string and return the port. I cleaned up and added error handling to
the code, but it's basically "alex"'s fix.
(extract_http_url): Rewrote this function to remove all the sscanf()
calls. It's much easier to just split on the path slash (if it's
present) and then strip the user name/password and port from the host
string. Less code, handles more cases!
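
The approach described above can be sketched roughly as follows. This is
an illustration only, not the code from the tree: split_http_url() and
struct url_parts are hypothetical names, the scheme is assumed to be
already removed by the caller, and IPv6 literals are not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the result; not tinyproxy's request struct. */
struct url_parts {
    char host[256];
    char path[1024];
    int port;
};

/*
 * Split "user:pass@host:port/path" into host, port and path.
 * Returns 0 on success, -1 on an over-long component or a bad port.
 */
static int split_http_url(const char *url, int default_port,
                          struct url_parts *out)
{
    const char *slash, *host_start;
    size_t host_len, i;
    char *colon, *end;
    long port;

    assert(url != NULL && out != NULL);

    /* Split on the path slash, if it is present. */
    slash = strchr(url, '/');
    host_len = slash ? (size_t)(slash - url) : strlen(url);
    if (slash && strlen(slash) >= sizeof(out->path))
        return -1;
    strcpy(out->path, slash ? slash : "/");

    /* Strip "user:password@" by starting after the last '@'. */
    host_start = url;
    for (i = 0; i < host_len; i++)
        if (url[i] == '@')
            host_start = url + i + 1;
    host_len -= (size_t)(host_start - url);
    if (host_len == 0 || host_len >= sizeof(out->host))
        return -1;
    memcpy(out->host, host_start, host_len);
    out->host[host_len] = '\0';

    /* Strip ":port" from the end of the host string. */
    out->port = default_port;
    colon = strrchr(out->host, ':');
    if (colon != NULL) {
        port = strtol(colon + 1, &end, 10);
        if (end == colon + 1 || *end != '\0' || port < 1 || port > 65535)
            return -1;
        out->port = (int)port;
        *colon = '\0';
    }
    return out->host[0] != '\0' ? 0 : -1;
}
```

Calling it with "alex:secret@example.com:8080/index.html" and a default
port of 80 fills in the host "example.com", the port 8080 and the path
"/index.html".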
this addition follow:
The patch implements a simple reverse proxy (with one funky extra
feature). It has all the regular features: mapping remote servers to local
namespace (ReversePath), disabling forward proxying (ReverseOnly) and HTTP
redirect rewriting (ReverseBaseURL).
The funky feature is this: You map Google to /google/ and the Google front
page opens up fine. Type in stuff and click "Google Search" and you'll get
an error from tinyproxy. Reason for this is that Google's form submits to
"/search" which unfortunately bypasses our /google/ mapping (if they'd
submit to "search" without the slash it would have worked ok). Turn on
ReverseMagic and it starts working....
ReverseMagic "hijacks" one cookie which it sends to the client browser.
This cookie contains the current reverse proxy path mapping (in the above
case /google/) so that even if the site uses absolute links the reverse
proxy still knows where to map the request.
And yes, it works. No, I've never seen this done before - I couldn't find
_any_ working OSS reverse proxies, and the commercial ones I've seen try
to parse the page and fix all links (in the above case changing "/search"
to "/google/search"). The problem with modifying the html is that it might
not be parsable (very common) or it might be encoded so that the proxy
can't read it (mod_gzip and the like).
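
A rough sketch of the remapping idea described above, for illustration
only. It assumes the mapping prefix (for example "/google/") has already
been read back from the hijacked cookie; remap_with_cookie() is a
hypothetical name and does not come from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Put an absolute request path back under the reverse mapping stored in
 * the cookie. cookie_prefix may be NULL when no cookie was sent.
 * Returns 0 on success, -1 on a bad path or if out is too small.
 */
static int remap_with_cookie(const char *path, const char *cookie_prefix,
                             char *out, size_t size)
{
    size_t plen;
    int n;

    assert(path != NULL && out != NULL);
    if (path[0] != '/')
        return -1;

    plen = cookie_prefix ? strlen(cookie_prefix) : 0;
    if (plen == 0 || strncmp(path, cookie_prefix, plen) == 0) {
        /* No cookie, or the request is already inside the mapping. */
        n = snprintf(out, size, "%s", path);
    } else if (cookie_prefix[plen - 1] == '/') {
        /* Avoid a doubled slash between prefix and path. */
        n = snprintf(out, size, "%s%s", cookie_prefix, path + 1);
    } else {
        n = snprintf(out, size, "%s%s", cookie_prefix, path);
    }
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With the cookie holding "/google/", a request for "/search" is remapped
to "/google/search", while a request for "/google/" is left unchanged.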
Hope you like that patch. One caveat - I haven't coded with C in like
three years so my code might be a bit messy.... There shouldn't be any
security problems though, but you never know. I did all the stuff out of my
memory without reading any RFCs, but I tested everything with Moz, Konq,
IE6, Links and Lynx and they all worked fine.
"ViaProxyName" directive. The "Via" HTTP header is _required_ by the
HTTP spec, so the code has been changed to always send the header.
However, including the proxy's host name could be considered a
security threat, so the "ViaProxyName" directive is used to set the
token sent in the "Via" header. If the directive is not enabled the
proxy's host name will be used.
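
The fallback described above might look something like this. It is a
minimal sketch: build_via_header() is a hypothetical helper, and the
"1.1" protocol version is hard-coded here as an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a Via header using the configured ViaProxyName token, falling
 * back to the proxy's host name when the directive is not set.
 * Returns the header length, or -1 if buf is too small.
 */
static int build_via_header(char *buf, size_t size,
                            const char *via_proxy_name,
                            const char *hostname)
{
    const char *token = hostname;
    int n;

    assert(buf != NULL && hostname != NULL);
    if (via_proxy_name != NULL && strlen(via_proxy_name) > 0)
        token = via_proxy_name;

    n = snprintf(buf, size, "Via: 1.1 %s\r\n", token);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```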
standard HTTP port (80 or 443) append the port string to the host
header; otherwise, leave the host string with only the host's domain
name.
Replaced all occurrences of constant 80 and 443 with defines HTTP_PORT
and HTTP_PORT_SSL.
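
As an illustration of the Host header rule and the two defines, a
minimal sketch could read as follows. build_host_header() is a
hypothetical name; only HTTP_PORT and HTTP_PORT_SSL come from the entry
above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HTTP_PORT     80
#define HTTP_PORT_SSL 443

/*
 * Write a Host header, appending ":port" only for non-standard ports.
 * Returns the header length, or -1 on an empty host or a short buffer.
 */
static int build_host_header(char *buf, size_t size,
                             const char *host, int port)
{
    int n;

    assert(buf != NULL && host != NULL);
    if (strlen(host) == 0)
        return -1;

    if (port == HTTP_PORT || port == HTTP_PORT_SSL)
        n = snprintf(buf, size, "Host: %s\r\n", host);
    else
        n = snprintf(buf, size, "Host: %s:%d\r\n", host, port);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```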
is used by the transparent proxy code. [Anatole Shaw]
(process_request): Fixed up the transparent proxy code so that
filtering can be done on the whole URL. [Anatole Shaw]
(pull_client_data): Added a bug fix for Internet Explorer (IE). IE
will leave an extra CR and LF after the data in an HTTP POST. The new
code will eat the extra bytes if they're present. Thanks to Yannick
Koehler for finding the bug and offering an explanation as to why it
was happening.
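
The check can be illustrated on an in-memory buffer. The real code works
on the client socket, so this is only a sketch of the idea, and
stray_crlf_len() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given len bytes received for a POST whose body is content_length
 * bytes long, return how many stray bytes to eat after the body:
 * 2 if a CR LF pair immediately follows the body, otherwise 0.
 */
static size_t stray_crlf_len(const char *buf, size_t len,
                             size_t content_length)
{
    assert(buf != NULL);

    if (len < content_length || len - content_length < 2)
        return 0;
    if (buf[content_length] == '\r' && buf[content_length + 1] == '\n')
        return 2;
    return 0;
}
```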
Changed all calls of connptr->remote_content_length to
connptr->content_length.server
username/password part from the host URI.
(extract_http_url), (extract_ssl_url): Use the new
strip_username_password function to remove any non-host information
from the URI.
since it's skipped by the caller before the URL is passed to this
function.
(process_request): Include code to handle proxy FTP requests as
well. This also led to a bit of a cleanup in the calling conventions
of the extract_http_url function. tinyproxy can handle both types of
resources by skipping the leading :// part.
tinyproxy. There is really no need for this code, since there are
perfectly good programs out there (like rinetd) which are designed for
TCP tunnelling. tinyproxy should be a good HTTP proxy, nothing more,
and nothing less; therefore, the tunnelling code is gone.
(get_all_headers): Instead of dropping duplicate headers when the
"double CGI" situation occurs, tinyproxy will now drop _all_ the
headers from the "inner" HTTP response.
and added the get_content_length() function.
The process_server_headers() function was rewritten to remove the
Connection header correctly, and also retrieve the Content-Length value.
This value is needed in the relay_connection() function since there are
some remote machines which do not properly close down the connection once
the body has been retrieved. Thanks to James Flemer for finding a test
case for this problem.
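
How the Content-Length value can bound the relay is sketched below. Both
helpers are hypothetical and are not the get_content_length() or
relay_connection() code itself; a negative length stands for "no usable
Content-Length".

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a Content-Length value; return -1 if it is missing or invalid. */
static long parse_content_length(const char *value)
{
    char *end;
    long n;

    if (value == NULL)
        return -1;
    errno = 0;
    n = strtol(value, &end, 10);
    if (end == value || errno == ERANGE || n < 0)
        return -1;
    while (*end == ' ' || *end == '\t')
        end++;
    return *end == '\0' ? n : -1;
}

/*
 * Decide how many of the available bytes belong to the body. Once the
 * remaining count reaches zero the relay can stop, even if the remote
 * machine never closes the connection.
 */
static size_t bytes_to_relay(long remaining, size_t available)
{
    if (remaining < 0)
        return available; /* unknown length: relay until close */
    return available < (size_t)remaining ? available : (size_t)remaining;
}
```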
itself to find out all the changes. Changed the process_client_header()
function to use the hashmap and vector modules. I've made this change to
better handle the Connection header. The Connection header, if it's
present, lists all the headers which should _not_ be transmitted any
further along. An HTTP/1.1 proxy must respect this.
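
The Connection header rule can be sketched without the hashmap and
vector modules. The helper below is hypothetical and only answers
whether a given header name is listed in the Connection value, compared
case-insensitively.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if header `name` appears as a token in the comma-separated
 * Connection header value, 0 otherwise.
 */
static int listed_in_connection(const char *connection_value,
                                const char *name)
{
    size_t name_len, len, i;
    const char *p, *start;

    assert(connection_value != NULL && name != NULL);
    name_len = strlen(name);
    if (name_len == 0)
        return 0;

    p = connection_value;
    while (*p != '\0') {
        /* Skip separators and leading whitespace. */
        while (*p == ',' || *p == ' ' || *p == '\t')
            p++;
        start = p;
        while (*p != '\0' && *p != ',')
            p++;
        len = (size_t)(p - start);
        /* Trim trailing whitespace from the token. */
        while (len > 0 && (start[len - 1] == ' ' || start[len - 1] == '\t'))
            len--;
        if (len != name_len)
            continue;
        for (i = 0; i < len; i++)
            if (tolower((unsigned char)start[i]) !=
                tolower((unsigned char)name[i]))
                break;
        if (i == len)
            return 1;
    }
    return 0;
}
```

A proxy would skip every client header for which this returns 1, and
would also drop the Connection header itself.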
Other changes are basically cosmetic.
16 KB.)
Added the TUNNEL_CONFIGURED() macro to help with testing for the tunnel
support code.
Create the write_message() function to encapsulate the code which sends
the information to the file descriptor.
Moved the tunnel code into its own function.
Ignore any blank lines when tinyproxy is expecting a request line.
Instead of sending the request line to the remote server in pieces,
tinyproxy now sends it in one go. This was done to fix a problem with
some sites like www.heise.de.
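
Sending the request line in one go amounts to formatting it into a
single buffer and writing that buffer once. A minimal sketch follows;
build_request_line() is a hypothetical name, and the "HTTP/1.0" version
string is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Assemble the complete request line in one buffer so that it can be
 * handed to a single write call instead of being sent in pieces.
 * Returns the line length, or -1 on empty input or a short buffer.
 */
static int build_request_line(char *buf, size_t size,
                              const char *method, const char *path)
{
    int n;

    assert(buf != NULL && method != NULL && path != NULL);
    if (strlen(method) == 0 || strlen(path) == 0)
        return -1;

    n = snprintf(buf, size, "%s %s HTTP/1.0\r\n", method, path);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```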
Changed all calls to connptr->ssl to connptr->connect_method.
Changed all calls to connptr->send_message to
connptr->send_response_message.
Moved the call to the Via header code inside the tests that check whether
tinyproxy is sending an error message (no headers need to be sent then).
emptied when either socket is closed. This should be better for the tunnel
connections.
Change the connect_to_upstream() function to better utilize the
establish_http_connection() function. Code re-use is cool. :)
since they're now used in other places.
Added support for a true upstream proxy connection. This involved some
rewriting of the handle_connection() function and some of the support
functions so that they do perform the domain filtering and anonymous
filtering while still connecting to the upstream proxy. I think the code
should be cleaned up further.
changed the parsing code from REGEX and uri.c to a simpler sscanf()
method. Also, include code to handle SSL connections, but that's not quite
working yet.
Changed any reference to log() to log_message().
Fixed a potential memory leak in process_method().
Removed redundant code and variables in relay_connection().