in networking, hitting a timeout requires that *nothing* happens during the
interval. whenever anything happens, the timeout is reset.
there's no need to do custom time calculations, it's perfectly fine to let
the kernel handle it using the select() syscall.
additionally the code added in 0b9a74c290
assures that read and write syscalls() don't block indefinitely and return
on the timeout too, so there's no need to switch sockets back and forth
between blocking/nonblocking.
introduced in 88153e944f.
when connect method is used (HTTPS), and e.g. a filtered domain requested,
there's no data on readfds, only on writefds.
this caused the response from the connection to hang until the timeout was
hit. in the past in such scenario always a "no entity" response
was produced in tinyproxy logs.
get_request_entity() is only called on error, for example if a client
doesn't pass a check_acl() check. in such a case it's possible that
the client fd isn't yet ready to read from.
using select() with a timeout timeval of {0,0} causes it to return
immediately and return 0 if there's no data ready to be read.
this resulted in immediate connection termination rather than returning
the 403 access denied error page to the client and a confusing
"no entity" message displayed in the proxy log.
* check return values of memory allocation and abort gracefully
in out-of-memory situations
* use sblist (linear dynamic array) instead of linked list
- this removes one pointer per filter rule
- removes need to manually allocate/free every single list item
(instead block allocation is used)
- simplifies code
* remove storage of (unused) input rule
- removes one char* pointer per filter rule
- removes storage of the raw bytes of each filter rule
* add line number to display on out-of-memory/invalid regex situation
* replace duplicate filter_domain()/filter_host() code with a single
function filter_run()
- reduces code size and management effort
with these improvements, >1 million regex rules can be loaded with
4 GB of RAM, whereas previously it crashed with about 950K.
the list for testing was assembled from
http://www.shallalist.de/Downloads/shallalist.tar.gzcloses#20
the timeout option set by the config file wasn't respected at all
so it could happen that connections became stale and were never released,
which eventually caused tinyproxy to hit the limit of open connections and
never accepting new ones.
addresses #274
unlike other functions called from the config parser code,
anonymous_insert() accesses the global config variable rather than
passing it as an argument. however the global variable is only set
after successful loading of the entire config.
we fix this by adding a conf argument to each anonymous_* function,
passing the global pointer in calls done from outside the config
parser.
fixes#292
this is required so we can elegantly swap out an old config for a
new one in the future and remove lots of boilerplate from config
initialization code.
unfortunately this is a quite intrusive change as the config struct
was accessed in numerous places, but frankly it should have been
done via a pointer right from the start.
right now, we simply point to a static struct in main.c, so there
shouldn't be any noticeable changes in behaviour.
it is quite easy to bring down a proxy server by forcing it to make
connections to one of its own ports, because this will result in an endless
loop spawning more and more connections, until all available fds are exhausted.
since there's a potentially infinite number of potential DNS/ip addresses
resolving to the proxy, it is impossible to detect an endless loop by simply
looking at the destination ip address and port.
what *is* possible though is to record the ip/port tuples assigned to outgoing
connections, and then compare them against new incoming connections. if they
match, the sender was the proxy itself and therefore needs to reject that
connection.
fixes#199.
tinyproxy used to do a full hostname resolution whenever a new client
connection happened, which could cause very long delays (as reported in #198).
there's only a single place/scenario that actually requires a hostname, and
that is when an Allow/Deny rule exists for a hostname or domain, rather than
a raw IP address. since it is very likely this feature is not very widely used,
it makes absolute sense to only do the costly resolution when it is unavoidable.
RFC 1929 specifies that the user/pass auth subnegotation repurposes the version
field for the version of that specification, which is 1, not 5.
however there's quite a good deal of software out there which got it wrong and
replies with version 5 to a successful authentication, so let's just accept both
forms - other socks5 client programs like curl do the same.
closes#172
just like the rest of the socks code, this was stolen from
proxychains-ng, of which i'm happen to be the maintainer of,
so it's not an issue (the licenses are identical, too).
using the "BasicAuth" keyword in tinyproxy.conf.
base64 code was written by myself and taken from my own library "libulz".
for this purpose it is relicensed under the usual terms of the tinyproxy
license.
original patch submitted in 2006 to debian mailing list:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=392848%29#12
this version was rebased to git and updated by Russ Dill <russ.dill@gmail.com>
in 2015 (the original patch used a different config file format).
as discussed in #40.
commit message by @rofl0r.
This should make hash processing generally faster.
There is a treadeoff between memory footprint and
speed of processing. 10 KB instead of 1.2 KB of
hash table per process should not be a huge problem
even on very limited current systems.
Who really needs to stick to 32 buckets could
recompile. We could also think about making
this configurable at some point.
Signed-off-by: Michael Adam <obnox@samba.org>
Effectively, the return code of fcntl was not checked
by not checking the return code of socket_nonblocking()
for the server fd.
Found by coverity.
Signed-off-by: Michael Adam <obnox@samba.org>
Effectively, the return code of fcntl was not checked
by not checking the return code of socket_nonblocking()
for the client fd.
Found by coverity.
Signed-off-by: Michael Adam <obnox@samba.org>
Use extract_url instead of the old extract_ssl_url:
extract_url is generic and handles ipv6 literal addresses correctly.
Signed-off-by: Michael Adam <obnox@samba.org>
There is in fact nothing http-specific any more about this function, hence
the rename. The input has been stripped of the <proto>:// header anyways.
This in preparation of fixing bug BB#106: ssl fails with literal ipv6 addrs.
Signed-off-by: Michael Adam <obnox@samba.org>
When removing the '[' and ']' characers from the ipv6 literal address, make sure
the pointer that is later free'd stays a malloced pointer by memmoving the
string one place left.
Signed-off-by: Michael Adam <obnox@samba.org>
Pass a pointer to a char pointer to do_transparent_proxy so the reassembled URL
will actually end up back in the caller where it is needed for filtering
decisions. This fixes the problem that a tinyproxy configured with the
transparent proxy functionality and "FilterURLs Yes" would filter on everything
but the domain.
Signed-off-by: daniel.egger@sphairon.com
Signed-off-by: Michael Adam <obnox@samba.org>
https://www.banu.com/bugzilla/show_bug.cgi?id=55
This is achieved by streamlining handle_connection, adding
a common cleanup-and-exit poing ("done") and a common
failure exit point ("fail") that reads any pending data
from the client fd first before trying to send back
data (error page or stats page).
The new function get_request_entity that is used here,
does not honour any content-length header. It just calls
select on the client-fd and gets any data that is there
to read.
Michael