tested with 32K acl rules, generated by
for x in `seq 128` ; do for y in `seq 255` ; do \
echo "Deny 10.$x.$y.0/24" ; done ; done
after loading the config (which is dogslow too), tinyproxy
required 9.5 seconds for the acl check on every request.
after switching the list implementation to sblist, a request
with the full acl check now takes only 0.025 seconds.
the time spent for loading the config file is identical for both
list implementations, roughly 30 seconds.
(in a previous test, 65K acl rules were generated, but every
connection required almost 2 minutes to crunch through the list...)
get_request_entity() is only called on error, for example if a client
doesn't pass a check_acl() check. in such a case it's possible that
the client fd isn't yet ready to read from.
using select() with a timeout timeval of {0,0} causes it to return
immediately and return 0 if there's no data ready to be read.
this resulted in immediate connection termination rather than returning
the 403 access denied error page to the client and a confusing
"no entity" message displayed in the proxy log.
the code wrongly processed the site_spec (here: domain) parameter
only when PT_TYPE == PT_NONE.
re-arranged code to process it correctly whenever passed.
additionally the mask is now also applied to the passed subnet/ip,
so a site_spec like 127.0.0.1/8 is converted into 127.0.0.0/8.
also the case where inet_aton fails now produces a proper error
message.
note that the code still doesn't process ipv6 addresses and mask.
to support it, we should use the existing code in acl.c and refactor
it so it can be used from both call sites.
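
the mask handling described above could look roughly like this. apply_mask() is a hypothetical name, the sketch is ipv4-only just like the commit, and it uses inet_pton() for portability where the text above refers to inet_aton().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* hypothetical helper: parse a dotted-quad address and clear its
   host bits according to the prefix length.
   returns 0 on success, -1 on a bad address or prefix. */
static int apply_mask(const char *ip, int prefix, struct in_addr *out)
{
	struct in_addr a;
	uint32_t mask;

	if (prefix < 0 || prefix > 32)
		return -1;
	if (inet_pton(AF_INET, ip, &a) != 1)
		return -1;	/* caller should print a proper error */
	mask = prefix ? htonl(0xffffffffu << (32 - prefix)) : 0;
	out->s_addr = a.s_addr & mask;
	return 0;
}
```

with this, a site_spec of 127.0.0.1/8 ends up stored as 127.0.0.0.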
closes #83
closes #165
previously, in order to detect and insert {variables} into error/stats
templates, tinyproxy iterated char-by-char over the input file, and would
try to parse anything inside {} pairs and treat it like a variable name.
this breaks CSS, and additionally it's dog slow as tinyproxy wrote every
single character to the client via a write syscall.
now we process line-by-line, and inspect all matches of the regex
\{[a-z]{1,32}\}. if the contents of a match are a known variable name,
substitution takes place. if not, the contents are passed as-is to
the client. also the chunks before and after matches are written in
a single syscall.
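
a minimal sketch of the per-line matching, using POSIX regex.h. lookup_var() and expand_line() are hypothetical names, and the sketch expands into a buffer instead of writing to the client socket.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* hypothetical lookup with a single hardcoded variable; a real
   implementation would consult the table of template variables. */
static const char *lookup_var(const char *name)
{
	if (!strcmp(name, "cause"))
		return "Access denied";
	return NULL;
}

/* expand known {variables} of `line` into `out`. unknown {names}
   and anything else in braces (e.g. CSS) is copied unchanged.
   returns 0 on success, -1 on regex failure or if out is too small. */
static int expand_line(const char *line, char *out, size_t outlen)
{
	regex_t re;
	regmatch_t m;
	const char *p = line;
	size_t used = 0;
	int n, ret = -1;

	if (outlen == 0 || regcomp(&re, "\\{[a-z]{1,32}\\}", REG_EXTENDED))
		return -1;
	while (regexec(&re, p, 1, &m, 0) == 0) {
		char name[33];
		size_t nlen = (size_t) (m.rm_eo - m.rm_so) - 2;
		const char *val;

		memcpy(name, p + m.rm_so + 1, nlen);
		name[nlen] = 0;
		val = lookup_var(name);
		if (val)	/* text before the match, then the value */
			n = snprintf(out + used, outlen - used, "%.*s%s",
				     (int) m.rm_so, p, val);
		else		/* unknown name: copy the match as-is */
			n = snprintf(out + used, outlen - used, "%.*s",
				     (int) m.rm_eo, p);
		if (n < 0 || (size_t) n >= outlen - used)
			goto done;
		used += (size_t) n;
		p += m.rm_eo;
	}
	/* remainder after the last match */
	n = snprintf(out + used, outlen - used, "%s", p);
	if (n >= 0 && (size_t) n < outlen - used)
		ret = 0;
done:
	regfree(&re);
	return ret;
}
```
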
closes #108
it's quite unexpected for an application running foreground in a
terminal to keep running when the terminal is closed.
also in such a case (if file logging is disabled) there's no way to
see what's happening to the proxy.
inet_ntoa() uses a static buffer and is therefore not threadsafe.
additionally it has been deprecated by POSIX.
by using inet_ntop() instead the code has been made ipv6 aware.
note that this codepath was only entered in the unlikely event that
no Host header was passed to the proxy, i.e. pre-HTTP/1.1.
* check return values of memory allocation and abort gracefully
in out-of-memory situations
* use sblist (linear dynamic array) instead of linked list
- this removes one pointer per filter rule
- removes need to manually allocate/free every single list item
(instead block allocation is used)
- simplifies code
* remove storage of (unused) input rule
- removes one char* pointer per filter rule
- removes storage of the raw bytes of each filter rule
* add line number to display on out-of-memory/invalid regex situation
* replace duplicate filter_domain()/filter_host() code with a single
function filter_run()
- reduces code size and management effort
with these improvements, >1 million regex rules can be loaded with
4 GB of RAM, whereas previously it crashed with about 950K.
the list for testing was assembled from
http://www.shallalist.de/Downloads/shallalist.tar.gz
closes #20
the timeout option set by the config file wasn't respected at all
so it could happen that connections became stale and were never released,
which eventually caused tinyproxy to hit the limit of open connections and
never accept new ones.
addresses #274
getsockname() requires addrlen to be set to the size of the sockaddr struct
passed as the addr, and a check whether the returned addrlen exceeds the
initially passed size (to determine whether the address returned is truncated).
with a request like "GET /\r\n\r\n" where length is 0 this caused the code
to assume success and use the values of the uninitialized sockaddr struct.
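
a minimal sketch of the calling convention described above. local_address() is a hypothetical wrapper, not the function touched by this commit.

```c
#include <string.h>
#include <sys/socket.h>

/* hypothetical wrapper: returns 0 and fills `ss` only if the call
   succeeded and the whole address fit, -1 otherwise. */
static int local_address(int fd, struct sockaddr_storage *ss)
{
	/* addrlen is an in/out parameter: it must hold the size of
	   the buffer before the call. */
	socklen_t len = sizeof *ss;

	memset(ss, 0, sizeof *ss);
	if (getsockname(fd, (struct sockaddr *) ss, &len) != 0)
		return -1;
	if (len > sizeof *ss)	/* returned address was truncated */
		return -1;
	return 0;
}
```
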
unlike other functions called from the config parser code,
anonymous_insert() accesses the global config variable rather than
passing it as an argument. however the global variable is only set
after successful loading of the entire config.
we fix this by adding a conf argument to each anonymous_* function,
passing the global pointer in calls done from outside the config
parser.
fixes #292
previously, default values were stored once into a static struct,
then on each reload item by item copied manually into a "new"
config struct.
this has proven to be error-prone, as additions in one of the 2
locations were not propagated to the second one, apart from
being simply a lot of gratuitous code.
we now simply load the default values directly into the config
struct to be used on each reload.
closes #283
as a side effect of not updating the config pointer when loading
the config file fails, the "FIXME" level comment to take appropriate
action in that case has been removed. the only issue remaining
when receiving a SIGHUP and encountering a malformed config file would
now be the case that output to syslog/logfile won't be resumed, if
initially so configured.
this is required so we can elegantly swap out an old config for a
new one in the future and remove lots of boilerplate from config
initialization code.
unfortunately this is a quite intrusive change as the config struct
was accessed in numerous places, but frankly it should have been
done via a pointer right from the start.
right now, we simply point to a static struct in main.c, so there
shouldn't be any noticeable changes in behaviour.
if daemon mode is used and neither logfile nor syslog options specified,
this is clearly a misconfiguration issue. don't try to be smart and work
around that, so less global state information is required.
also, this case is already checked for in main.c:334.
it is quite easy to bring down a proxy server by forcing it to make
connections to one of its own ports, because this will result in an endless
loop spawning more and more connections, until all available fds are exhausted.
since there's a potentially infinite number of DNS names/ip addresses
resolving to the proxy, it is impossible to detect an endless loop by simply
looking at the destination ip address and port.
what *is* possible though is to record the ip/port tuples assigned to outgoing
connections, and then compare them against new incoming connections. if they
match, the sender was the proxy itself and the connection therefore needs to
be rejected.
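
the bookkeeping can be sketched as follows. all names are hypothetical, the table is a fixed-size array, only ipv4 is handled, and the locking a threaded proxy would need is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64	/* arbitrary table size for this sketch */

struct endpoint {
	uint32_t ip;	/* ipv4 only, to keep the sketch short */
	uint16_t port;
	int used;
};

static struct endpoint outgoing[MAX_TRACKED];

/* remember the local ip/port of a connection the proxy opened.
   returns 0 on success, -1 if the table is full. */
static int loop_record(uint32_t ip, uint16_t port)
{
	size_t i;

	for (i = 0; i < MAX_TRACKED; i++) {
		if (outgoing[i].used)
			continue;
		outgoing[i].ip = ip;
		outgoing[i].port = port;
		outgoing[i].used = 1;
		return 0;
	}
	return -1;
}

/* returns 1 if the peer of an accepted connection is one of our
   own outgoing endpoints, i.e. the proxy connected to itself. */
static int loop_is_self(uint32_t peer_ip, uint16_t peer_port)
{
	size_t i;

	for (i = 0; i < MAX_TRACKED; i++)
		if (outgoing[i].used && outgoing[i].ip == peer_ip &&
		    outgoing[i].port == peer_port)
			return 1;
	return 0;
}

/* forget an endpoint once the outgoing connection is closed. */
static void loop_forget(uint32_t ip, uint16_t port)
{
	size_t i;

	for (i = 0; i < MAX_TRACKED; i++)
		if (outgoing[i].used && outgoing[i].ip == ip &&
		    outgoing[i].port == port)
			outgoing[i].used = 0;
}
```
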
fixes #199.
tinyproxy used to do a full hostname resolution whenever a new client
connection happened, which could cause very long delays (as reported in #198).
there's only a single place/scenario that actually requires a hostname, and
that is when an Allow/Deny rule exists for a hostname or domain, rather than
a raw IP address. since it is very likely this feature is not very widely used,
it makes absolute sense to only do the costly resolution when it is unavoidable.
since the write syscall is used instead of stdio, accesses have been
safe already, but it's better to use a mutex anyway to prevent out-
of-order writes.
if we don't handle these gracefully, pretty much every existing config
file will fail with an error, which is probably not very friendly.
the obsoleted config items can be made hard errors after the next
release.
the existing codebase used an elaborate and complex approach for
its parallelism:
5 different config file options, namely
- MaxClients
- MinSpareServers
- MaxSpareServers
- StartServers
- MaxRequestsPerChild
were used to steer how (and how many) parallel processes tinyproxy
would spin up at start, how many processes at each point needed to
be idle, etc.
it seems all preforked processes would listen on the server port
and compete with each other about who would get assigned the new
incoming connections.
since some data needs to be shared across those processes, a half-
baked "shared memory" implementation was provided for this purpose.
that implementation used to use files in the filesystem, and since
it had a big FIXME comment, the author was well aware of how hackish
that approach was.
this entire complexity is now removed. the main thread enters
a loop which polls on the listening fds, then spins up a new
thread per connection, until the maximum number of connections
(MaxClients) is hit. this is the only one of the 5 config options
left after this cleanup. since threads share the same address space,
the code necessary for shared memory access has been removed.
this means that the other 4 mentioned config options will now
produce a parse error when encountered.
currently each thread uses a hardcoded default of 256KB per thread
for the thread stack size, which is quite lavish and should be
sufficient for even the worst C libraries, but people may want
to tweak this value to the bare minimum, thus we may provide a new
config option for this purpose in the future.
i suspect that on heavily optimized C libraries such as musl, a
stack size of 8-16 KB per thread could be sufficient.
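
for illustration, starting a thread with an explicit stack size works roughly like this. spawn_worker() and handle_connection() are hypothetical names, and the handler is only a stub.

```c
#include <pthread.h>

#define STACK_SIZE (256 * 1024)	/* the 256KB default mentioned above */

/* stub standing in for the per-connection handler: it only marks
   the int passed via arg as done. */
static void *handle_connection(void *arg)
{
	*(int *) arg = 1;
	return NULL;
}

/* hypothetical helper: run fn(arg) in a new thread with a fixed
   stack size. returns 0 on success or an errno-style error code. */
static int spawn_worker(void *(*fn)(void *), void *arg, pthread_t *out)
{
	pthread_attr_t attr;
	int ret;

	ret = pthread_attr_init(&attr);
	if (ret)
		return ret;
	ret = pthread_attr_setstacksize(&attr, STACK_SIZE);
	if (!ret)
		ret = pthread_create(out, &attr, fn, arg);
	pthread_attr_destroy(&attr);
	return ret;
}
```

a future config option would only need to replace the STACK_SIZE constant; values below PTHREAD_STACK_MIN are rejected by pthread_attr_setstacksize().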
since the existing list implementation in vector.c did not provide
a way to remove a single item from an existing list, i added my
own list implementation from my libulz library which offers this
functionality, rather than trying to add an ad-hoc, and perhaps
buggy implementation to the vector_t list code. the sblist
code is contained in an 80 line C file and is as simple as it can get,
while offering good performance, and it has proven bugfree over years
of use in other projects.
The new code skips leading whitespace before removing trailing
whitespace and comments.
Without doing this, lines with leading whitespace are treated like empty
lines (i.e. they are ignored).
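
A minimal sketch of that order of operations follows. clean_line() is a hypothetical helper, and it assumes that every '#' starts a comment, which ignores any quoting rules the real parser may have.

```c
#include <ctype.h>
#include <string.h>

/* hypothetical helper: trims `line` in place and returns a pointer
   to its first non-blank character. */
static char *clean_line(char *line)
{
	char *end;

	/* 1. skip leading whitespace */
	while (isspace((unsigned char) *line))
		line++;
	/* 2. cut off a trailing comment, if any */
	end = strchr(line, '#');
	if (!end)
		end = line + strlen(line);
	/* 3. remove trailing whitespace */
	while (end > line && isspace((unsigned char) end[-1]))
		end--;
	*end = 0;
	return line;
}
```
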
it was reported that because the fdset was only initialized once,
tinyproxy would fail to properly listen on more than one interface.
closes #214
closes #127
RFC 1929 specifies that the user/pass auth subnegotiation repurposes the version
field for the version of that specification, which is 1, not 5.
however there's quite a good deal of software out there which got it wrong and
replies with version 5 to a successful authentication, so let's just accept both
forms - other socks5 client programs like curl do the same.
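
the relaxed check can be sketched like this (socks5_auth_ok() is a hypothetical name; the 2-byte reply layout of version followed by status, with status 0 meaning success, is from RFC 1929):

```c
/* hypothetical helper: `resp` is the 2-byte server reply to the
   username/password subnegotiation. returns 1 if auth succeeded. */
static int socks5_auth_ok(const unsigned char resp[2])
{
	/* 1 is correct per RFC 1929, 5 is sent by broken servers */
	if (resp[0] != 1 && resp[0] != 5)
		return 0;
	return resp[1] == 0;
}
```
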
closes #172
sbin/ is meant for programs only usable by root, but in tinyproxy's
case, regular users can and *should* use tinyproxy; meaning it is
preferable from a security PoV to use tinyproxy as regular user.
closes #15 for real.
the previous patch that was merged[0] was halfbaked and only removed
the warning part of the original patch from openwrt[1], but didn't
actually activate bind support. further it invoked UB by removing
the return value from the function, if transparent proxy support was
compiled in.
[0]: d97d486d53
[1]: 7c01da4a72
just like the rest of the socks code, this was stolen from
proxychains-ng, of which i happen to be the maintainer,
so it's not an issue (the licenses are identical, too).
tinyproxy uses a curious mechanism to log those early messages
that result from parsing the config file before the logging mechanism
has been properly set up yet by finishing parsing of the config file:
those early messages are written into a memory buffer and then
are printed later on. this slipped my attention when making it possible
to log to stdout in ccbbb81a.
using the "BasicAuth" keyword in tinyproxy.conf.
base64 code was written by myself and taken from my own library "libulz".
for this purpose it is relicensed under the usual terms of the tinyproxy
license.
original patch submitted in 2006 to debian mailing list:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=392848%29#12
this version was rebased to git and updated by Russ Dill <russ.dill@gmail.com>
in 2015 (the original patch used a different config file format).
as discussed in #40.
commit message by @rofl0r.
if using one of unsigned or signed char for the function prototype, one
gets nasty warnings when using it with the other type. the only proper
solution is to put void* into the prototype, and then specialize the pointer
inside the function using an automatic variable.
for exactly this reason, libc functions like read(), write(), etc use void*
too.
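
a small illustration of the pattern, using a made-up byte_sum() function rather than the one changed in this commit:

```c
#include <stddef.h>

/* made-up example: accepts char*, signed char* and unsigned char*
   buffers alike without casts or warnings at the call site. */
static unsigned long byte_sum(const void *data, size_t len)
{
	/* specialize the pointer with an automatic variable */
	const unsigned char *p = data;
	unsigned long sum = 0;

	while (len--)
		sum += *p++;
	return sum;
}
```
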
some users want to run tinyproxy on an as-needed basis in a terminal,
without setting it up permanently to run as a daemon/service.
in such a use case, it is very annoying that tinyproxy didn't have
an option to log to stdout, so the user has to keep a second terminal
open to `tail -f` the log.
additionally, this precluded usage with runit service supervisor,
which runs all services in foreground and creates logfiles from the
service's stdout/stderr.
since logging to stdout doesn't make sense when daemonized, now if
no logfile is specified and daemon mode is activated, a warning is
printed to stderr once, and nothing is logged.
the original idea was to fail with an error message, though some users
might actually want to run tinyproxy as a daemon with no logging at all.
some people want to run tinyproxy with minimal configuration from
the command line (and as non-root), but tinyproxy insists on writing
a pid file, which only makes sense for usage as a service, thereby
forcing the user to either run it as root so it can write to the
default location, or start editing the default config file to work
around it.
and if no pidfile is specified in the config, it frankly doesn't
make sense to force creation of one anyway.
this causes a build failure on several platforms using older versions
of autotools or GNU make.
make[2]: Entering directory `src'
Makefile:670: *** missing separator (did you mean TAB instead of 8 spaces?). Stop.
make[2]: Leaving directory `src'
fixes #72
This should make hash processing generally faster.
There is a tradeoff between memory footprint and
speed of processing. 10 KB instead of 1.2 KB of
hash table per process should not be a huge problem
even on very limited current systems.
Who really needs to stick to 32 buckets could
recompile. We could also think about making
this configurable at some point.
Signed-off-by: Michael Adam <obnox@samba.org>
This hash function distributes much better than the
original one. The effect is not as visible with
hashes taken modulo 32 as with a bigger modulus,
but it is there. And a larger number of buckets might
become possible in the future...
Reviewed-by: Michael Adam <obnox@samba.org>
I seem to have forgotten to compile with transparent support enabled...
This belongs to the fix for bug BB#63.
Signed-off-by: Michael Adam <obnox@samba.org>
This was accidentally used instead of the function parameter listen_addrs.
This still belongs to the fix for bug BB#63.
Signed-off-by: Michael Adam <obnox@samba.org>
check the return code of fcntl via socket_nonblocking
on the listen sockets in child_main()
Found by coverity.
Signed-off-by: Michael Adam <obnox@samba.org>
Effectively, the return code of fcntl was not checked
by not checking the return code of socket_nonblocking()
for the server fd.
Found by coverity.
Signed-off-by: Michael Adam <obnox@samba.org>
Effectively, the return code of fcntl was not checked
by not checking the return code of socket_nonblocking()
for the client fd.
Found by coverity.
Signed-off-by: Michael Adam <obnox@samba.org>
Use extract_url instead of the old extract_ssl_url:
extract_url is generic and handles ipv6 literal addresses correctly.
Signed-off-by: Michael Adam <obnox@samba.org>
There is in fact nothing http-specific any more about this function, hence
the rename. The input has been stripped of the <proto>:// header anyways.
This is in preparation for fixing bug BB#106: ssl fails with literal ipv6 addrs.
Signed-off-by: Michael Adam <obnox@samba.org>
When removing the '[' and ']' characters from the ipv6 literal address, make sure
the pointer that is later free'd stays a malloced pointer by memmoving the
string one place left.
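
A minimal sketch of the idea, with a hypothetical strip_brackets() helper: the string is shifted in place, so the pointer handed out by malloc() is the same one that is later passed to free().

```c
#include <string.h>

/* hypothetical helper: turns "[::1]" into "::1" in place.
   strings without surrounding brackets are left untouched. */
static void strip_brackets(char *host)
{
	size_t len = strlen(host);

	if (len >= 2 && host[0] == '[' && host[len - 1] == ']') {
		/* move the inner part one place left, drop the ']' */
		memmove(host, host + 1, len - 2);
		host[len - 2] = 0;
	}
}
```

Advancing the pointer past the '[' instead would leave the caller with an address that was never returned by malloc().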
Signed-off-by: Michael Adam <obnox@samba.org>
log entering opensock and successful return of getaddrinfo.
This allows detecting DNS timeouts by looking at the logs.
Signed-off-by: Michael Adam <obnox@samba.org>
This is achieved by not stopping at the first result of getaddrinfo
that we managed to listen on: Without "Listen" in the config, we
call getaddrinfo with NULL address. With AI_PASSIVE, this gives results
for both IPv4 and IPv6 wildcard addresses (if both are supported).
This lets tinyproxy listen on both IPv4 and IPv6 wildcard if the system
supports them.
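
As a rough sketch of that loop (listen_all() is a hypothetical function, not the code from this commit): every result is tried and all sockets that could be bound are kept. Setting IPV6_V6ONLY is an assumption of this sketch; without it, the IPv6 wildcard socket on Linux usually also claims the IPv4 port and the second bind fails.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* hypothetical function: listen on every wildcard address that
   getaddrinfo() reports for `port`. stores up to maxfds sockets in
   fds and returns their number, or -1 if the lookup fails. */
static int listen_all(const char *port, int *fds, int maxfds)
{
	struct addrinfo hints, *res, *rp;
	int n = 0;

	memset(&hints, 0, sizeof hints);
	hints.ai_family = AF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = AI_PASSIVE;	/* NULL node means wildcard */
	if (getaddrinfo(NULL, port, &hints, &res) != 0)
		return -1;

	/* do not stop at the first result that works */
	for (rp = res; rp && n < maxfds; rp = rp->ai_next) {
		int on = 1;
		int fd = socket(rp->ai_family, rp->ai_socktype,
				rp->ai_protocol);
		if (fd < 0)
			continue;
		setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
		if (rp->ai_family == AF_INET6)
			setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY,
				   &on, sizeof on);
		if (bind(fd, rp->ai_addr, rp->ai_addrlen) == 0 &&
		    listen(fd, 16) == 0)
			fds[n++] = fd;
		else
			close(fd);
	}
	freeaddrinfo(res);
	return n;
}
```
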
Signed-off-by: Michael Adam <obnox@samba.org>
This prepares listening on multiple sockets, which will be used to
fix listening on the wildcard (listen on both ipv6 and ipv4) and
help add the support for multiple Listen statements in the config.
Signed-off-by: Michael Adam <obnox@samba.org>
instead of using config.ipAddr internally.
This is in preparation to make it possible
to call it for multiple addresses.
Signed-off-by: Michael Adam <obnox@samba.org>
This changes listen_sock() to not return the
addrlen of the used address from getaddrinfo call
to the caller, stored in global addrlen in child.c.
This was only used to be able to allocate enough space for the
arguments to the later accept call depending on whether
IPv4 or IPv6 is used.
This removes the need to pass this info by always allocating
sizeof(struct sockaddr_storage) instead, which is enough
to carry both sockaddr_in and sockaddr_in6.
Signed-off-by: Michael Adam <obnox@samba.org>
Supplementary groups are inherited from the calling process. Drop all
supplementary groups if the "Group" configuration directive is set to
change to a different user. Otherwise the process may have more rights
than expected.
Reviewed-by: Michael Adam <obnox@samba.org>
Pass a pointer to a char pointer to do_transparent_proxy so the reassembled URL
will actually end up back in the caller where it is needed for filtering
decisions. This fixes the problem that a tinyproxy configured with the
transparent proxy functionality and "FilterURLs Yes" would filter on everything
but the domain.
Signed-off-by: daniel.egger@sphairon.com
Signed-off-by: Michael Adam <obnox@samba.org>
There are frequent questions "what does 'No proxy for ...' mean?"
on the mailing list and IRC. Be more specific. (No upstream proxy ...)
Correspondingly, log "Found upstream proxy ... for ..."