Commit Graph

977 Commits

Author SHA1 Message Date
rofl0r
4847d8cdb3 add_new_errorpage(): fix segfault accessing global config
another fallout of the config refactoring finished by
2e02dce0c3.

apparently no one using the ErrorFile directive used git master
during the last months, as there have been no reports about this issue.
2020-09-12 21:38:04 +01:00
rofl0r
df9074db6e vector.h: missing include <unistd.h> for ssize_t 2020-09-12 15:56:36 +01:00
rofl0r
9e40f8311f handle_connection(): print process_*_headers errno information 2020-09-10 21:13:31 +01:00
rofl0r
f1bd259e6e handle_connection: replace "goto fail" with func call
this allows to see in a backtrace from where the error was
triggered.
2020-09-10 14:48:39 +01:00
rofl0r
e94cbdb3a5 handle_connection(): factor out failure code
this allows us in a next step to replace goto fail with a call to that
function, so we can see in a backtrace from where the failure was
triggered.
2020-09-10 14:37:56 +01:00
rofl0r
b549ba5af3 remove bogus custom timeout handling code
in networking, hitting a timeout requires that *nothing* happens during the
interval. whenever anything happens, the timeout is reset.
there's no need to do custom time calculations, it's perfectly fine to let
the kernel handle it using the select() syscall.

additionally the code added in 0b9a74c290
ensures that read() and write() syscalls don't block indefinitely and return
on the timeout too, so there's no need to switch sockets back and forth
between blocking/nonblocking.
2020-09-09 12:37:23 +01:00
rofl0r
b4e3f1a896 fix negative timeout resulting in select() EINVAL 2020-09-09 11:59:40 +01:00
rofl0r
78cc5b72b1 get_request_entity: fix regression w/ CONNECT method
introduced in 88153e944f.
when connect method is used (HTTPS), and e.g. a filtered domain requested,
there's no data on readfds, only on writefds.

this caused the response from the connection to hang until the timeout was
hit. in the past, such a scenario always produced a "no entity" response
in tinyproxy logs.
2020-09-08 14:45:24 +01:00
rofl0r
58cfaf2659 make acl lookup 450x faster by using sblist
tested with 32K acl rules, generated by

    for x in `seq 128` ; do for y in `seq 255` ; do \
    echo "Deny 10.$x.$y.0/24" ; done ; done

after loading the config (which is dogslow too), tinyproxy
required 9.5 seconds for the acl check on every request.
after switching the list implementation to sblist, a request
with the full acl check now takes only 0.025 seconds.
the time spent for loading the config file is identical for both
list implementations, roughly 30 seconds.

(in a previous test, 65K acl rules were generated, but every
connection required almost 2 minutes to crunch through the list...)
2020-09-07 22:09:35 +01:00
rofl0r
ebc7f15ec7 acl: typedef access_list to acl_list_t
this allows to switch the underlying implementation easily.
2020-09-07 21:53:14 +01:00
rofl0r
efa5892011 check_acl: do full_inet_pton() only once per ip
if there's a long list of acl's, doing full_inet_pton() over
and over with the same IP isn't really efficient.
2020-09-07 20:57:16 +01:00
rofl0r
88153e944f get_request_entity: respect user-set timeout
get_request_entity() is only called on error, for example if a client
doesn't pass a check_acl() check. in such a case it's possible that
the client fd isn't yet ready to read from.
using select() with a timeout timeval of {0,0} causes it to return
immediately and return 0 if there's no data ready to be read.
this resulted in immediate connection termination rather than returning
the 403 access denied error page to the client and a confusing
"no entity" message displayed in the proxy log.
2020-09-07 20:49:07 +01:00
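The difference between a zero timeout and a user-set timeout described in 88153e944f can be sketched as follows. This is a minimal illustration, not tinyproxy's code: `read_with_timeout()` and its return convention are hypothetical names chosen for the example. A timeout of 0 makes select() return immediately when no data is pending, while a positive timeout lets the kernel wait for the client. The assert also guards against the negative timeout that b4e3f1a896 reports as causing EINVAL.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_sec seconds for fd to become
 * readable, then read from it.
 * Returns the number of bytes read, 0 on EOF, -1 on error, -2 on timeout. */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    /* a negative tv_sec would make select() fail with EINVAL */
    assert(timeout_sec >= 0);
    assert(fd >= 0 && fd < FD_SETSIZE);

    /* the fd_set and timeval must be set up again before every call,
     * because select() may modify both */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    if (n == 0)
        return -2;      /* nothing arrived within the interval */
    return read(fd, buf, len);
}
```

With `timeout_sec` set to 0 the call returns -2 at once on an idle descriptor, which corresponds to the immediate connection termination the commit message describes.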
rofl0r
487a062fcc change loglevel of start/stop/reload messages to NOTICE
this allows to see them when the verbose INFO loglevel is not desired.

closes #78
2020-09-07 16:59:37 +01:00
rofl0r
23b0c84653 upstream: fix ip/mask calculation for types other than none
the code wrongly processed the site_spec (here: domain) parameter
only when PT_TYPE == PT_NONE.
re-arranged code to process it correctly whenever passed.
additionally the mask is now also applied to the passed subnet/ip,
so a site_spec like 127.0.0.1/8 is converted into 127.0.0.0/8.
also the case where inet_aton fails now produces a proper error
message.

note that the code still doesn't process ipv6 addresses and mask.
to support it, we should use the existing code in acl.c and refactor
it so it can be used from both call sites.

closes #83
closes #165
2020-09-07 16:11:51 +01:00
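The mask application mentioned in 23b0c84653 (127.0.0.1/8 becoming 127.0.0.0/8) comes down to one bitwise AND. The sketch below assumes IPv4 addresses already converted to host byte order; `prefix_to_mask()` and `network_address()` are hypothetical names, not functions from tinyproxy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turn a prefix length (0..32) into a netmask
 * in host byte order. */
static uint32_t prefix_to_mask(unsigned prefix)
{
    assert(prefix <= 32);
    /* shifting a 32-bit value by 32 is undefined, so /0 is special-cased */
    return prefix ? 0xFFFFFFFFu << (32 - prefix) : 0;
}

/* Clear the host bits of ip (host byte order) for the given prefix. */
static uint32_t network_address(uint32_t ip, unsigned prefix)
{
    return ip & prefix_to_mask(prefix);
}
```

For example, `network_address(0x7F000001u, 8)` yields `0x7F000000u`, i.e. 127.0.0.1/8 is normalized to 127.0.0.0/8.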
rofl0r
a8848d4bd8 html-error: substitute template variables via a regex
previously, in order to detect and insert {variables} into error/stats
templates, tinyproxy iterated char-by-char over the input file, and would
try to parse anything inside {} pairs and treat it like a variable name.
this breaks CSS, and additionally it's dog slow as tinyproxy wrote every
single character to the client via a write syscall.
now we process line-by-line, and inspect all matches of the regex
\{[a-z]{1,32}\}. if the contents of a match are a known variable name,
substitution takes place. if not, the contents are passed as-is to
the client. also the chunks before and after matches are written in
a single syscall.

closes #108
2020-09-07 04:32:13 +01:00
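The line-by-line substitution described in a8848d4bd8 can be sketched with POSIX regcomp()/regexec(). This is an illustration under assumptions: the variable table, `lookup_var()`, `append()` and `expand_line()` are hypothetical, and the output goes into a buffer rather than to the client socket so the result is easy to inspect.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variable table standing in for the real template variables. */
static const char *lookup_var(const char *name, size_t len)
{
    static const struct { const char *name, *value; } vars[] = {
        { "cause",  "Access denied" },
        { "detail", "blocked by ACL" },
    };
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++)
        if (strlen(vars[i].name) == len && !memcmp(vars[i].name, name, len))
            return vars[i].value;
    return NULL;
}

/* Append n bytes to out, keeping it NUL-terminated; -1 if it would overflow. */
static int append(char *out, size_t outsz, size_t *pos, const char *s, size_t n)
{
    if (*pos + n >= outsz)
        return -1;
    memcpy(out + *pos, s, n);
    *pos += n;
    out[*pos] = 0;
    return 0;
}

/* Expand known {variables} in one line; unknown ones are copied as-is. */
static int expand_line(const char *line, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m;
    size_t pos = 0;
    int ret = 0;

    assert(outsz > 0);
    out[0] = 0;
    if (regcomp(&re, "\\{[a-z]{1,32}\\}", REG_EXTENDED))
        return -1;
    while (ret == 0 && regexec(&re, line, 1, &m, 0) == 0) {
        size_t mlen = (size_t)(m.rm_eo - m.rm_so);
        const char *val = lookup_var(line + m.rm_so + 1, mlen - 2);

        /* the chunk before the match is emitted in one piece */
        ret = append(out, outsz, &pos, line, (size_t)m.rm_so);
        if (ret == 0 && val)
            ret = append(out, outsz, &pos, val, strlen(val));
        else if (ret == 0)
            ret = append(out, outsz, &pos, line + m.rm_so, mlen);
        line += m.rm_eo;
    }
    if (ret == 0)
        ret = append(out, outsz, &pos, line, strlen(line));
    regfree(&re);
    return ret;
}
```

A CSS rule such as `body { color: red }` never matches the pattern because of the space after the brace, so it passes through untouched.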
[anp/hsw]
17ae1b512c Do not give error while storing invalid header 2020-09-07 01:12:50 +01:00
rofl0r
d0fae11760 config parser: increase possible line length limit
let's use POSIX LINE_MAX (usually 4KB) instead of 1KB.

closes #226
2020-09-07 01:07:00 +01:00
rofl0r
8c86e8b3ae allow SIGUSR1 to be used as an alternative to SIGHUP
this allows a tinyproxy session in terminal foreground mode to reload
its configuration without dropping active connections.
2020-09-06 23:11:22 +01:00
rofl0r
95b1a8ea06 main.c: remove set_signal_handler code duplication 2020-09-06 23:08:10 +01:00
rofl0r
8ba0ac4e86 do not catch SIGHUP in foreground-mode
it's quite unexpected for an application running foreground in a
terminal to keep running when the terminal is closed.
also in such a case (if file logging is disabled) there's no way to
see what's happening to the proxy.
2020-09-06 22:46:26 +01:00
rofl0r
0d71223a1d send_html_file(): also set empty variables to "(unknown)" 2020-09-06 20:06:59 +01:00
rofl0r
36c9b93cfe transparent: remove usage of inet_ntoa(), make IPv6 ready
inet_ntoa() uses a static buffer and is therefore not threadsafe.
additionally it has been deprecated by POSIX.

by using inet_ntop() instead the code has been made ipv6 aware.

note that this codepath was only entered in the unlikely event that
no Host header was passed to the proxy, i.e. pre-HTTP/1.1.
2020-09-06 16:22:11 +01:00
rofl0r
233ce6de3b filter: reduce memory usage, fix OOM crashes
* check return values of memory allocation and abort gracefully
  in out-of-memory situations

* use sblist (linear dynamic array) instead of linked list
  - this removes one pointer per filter rule
  - removes need to manually allocate/free every single list item
    (instead block allocation is used)
  - simplifies code

* remove storage of (unused) input rule
  - removes one char* pointer per filter rule
  - removes storage of the raw bytes of each filter rule

* add line number to display on out-of-memory/invalid regex situation

* replace duplicate filter_domain()/filter_host() code with a single
  function filter_run()
  - reduces code size and management effort

with these improvements, >1 million regex rules can be loaded with
4 GB of RAM, whereas previously it crashed with about 950K.

the list for testing was assembled from
http://www.shallalist.de/Downloads/shallalist.tar.gz

closes #20
2020-09-05 19:42:34 +01:00
Nicolai Søborg
281488a729 Change loglevel for "Maximum number of connections reached"
I was hit by this and did not see anything in the log; connections were just hanging.
I think warning is a better log level.
2020-09-01 15:07:03 +01:00
rofl0r
335477b16e upstream: allow port 0 to be specified
this is useful to use upstream directive to null-route a specific target
domain.

e.g.
upstream http 0.0.0.0:0 ".adserver.com"
2020-08-19 12:01:20 +01:00
rofl0r
0b9a74c290 enforce socket timeout on new sockets via setsockopt()
the timeout option set by the config file wasn't respected at all
so it could happen that connections became stale and were never released,
which eventually caused tinyproxy to hit the limit of open connections and
never accept new ones.

addresses #274
2020-07-15 09:59:25 +01:00
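Enforcing a timeout through setsockopt(), as 0b9a74c290 describes, might look like the sketch below. It assumes the SO_RCVTIMEO and SO_SNDTIMEO socket options are the ones used, which the commit message does not spell out; `set_socket_timeout()` is a hypothetical name.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: make blocking reads and writes on fd give up after
 * the given number of seconds instead of waiting forever.
 * Returns 0 on success, -1 if either option could not be set. */
static int set_socket_timeout(int fd, int seconds)
{
    struct timeval tv;

    assert(seconds >= 0);
    tv.tv_sec = seconds;
    tv.tv_usec = 0;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) < 0)
        return -1;
    return 0;
}
```

Once set, a read or write that makes no progress within the interval fails instead of blocking, so a stale connection eventually releases its slot.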
xiejianjun
db4bd162a3 fix check_acl compilation with --enable-debug
regression introduced in f6d4da5d81.
this has been overlooked due to the assert macro being optimized out in
non-debug builds.
2020-07-06 11:37:35 +01:00
rofl0r
d98aabf47f transparent: fix invalid memory access
getsockname() requires addrlen to be set to the size of the sockaddr struct
passed as the addr, and a check whether the returned addrlen exceeds the
initially passed size (to determine whether the address returned is truncated).

with a request like "GET /\r\n\r\n" where length is 0 this caused the code
to assume success and use the values of the uninitialized sockaddr struct.
2020-03-18 12:31:15 +00:00
rofl0r
3230ce0bc2 anonymous: fix segfault loading config item
unlike other functions called from the config parser code,
anonymous_insert() accesses the global config variable rather than
passing it as an argument. however the global variable is only set
after successful loading of the entire config.

we fix this by adding a conf argument to each anonymous_* function,
passing the global pointer in calls done from outside the config
parser.

fixes #292
2020-03-16 13:19:39 +00:00
rofl0r
2e02dce0c3 conf: use 2 swappable conf slots, so old config can stay valid
... in case reloading of it after SIGHUP fails, the old config can
continue working.

(apart from the logging-related issue mentioned in 27d96df999 )
2020-01-15 17:03:47 +00:00
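The two-slot scheme from 2e02dce0c3 can be sketched as follows: the new config is parsed into the inactive slot, and the global pointer is switched only when loading succeeds. The reduced `struct conf`, `reload_config()` and the two stand-in loaders are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced config; the real struct has many more fields. */
struct conf {
    int port;
};

static struct conf slots[2];
static struct conf *config;   /* active config, NULL until the first good load */

/* Parse into the inactive slot and publish it only on success. */
static int reload_config(int (*load)(struct conf *))
{
    struct conf *next = (config == &slots[0]) ? &slots[1] : &slots[0];

    assert(load != NULL);
    if (load(next) != 0)
        return -1;            /* config still points at the old, valid slot */
    config = next;
    return 0;
}

/* Stand-in loaders for demonstration: one succeeds, one fails. */
static int load_good(struct conf *c) { c->port = 8888; return 0; }
static int load_bad(struct conf *c)  { c->port = -1;   return -1; }
```

Because the pointer is only assigned after a successful load, code running during parsing must not rely on the global pointer, which is the kind of fallout fixed in 3230ce0bc2 and 4847d8cdb3.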
rofl0r
5dd514af93 conf: fix loading of default values
previously, default values were stored once into a static struct,
then on each reload item by item copied manually into a "new"
config struct.
this has proven to be error-prone, as additions in one of the 2
locations were not propagated to the second one, apart from
being simply a lot of gratuitous code.

we now simply load the default values directly into the config
struct to be used on each reload.

closes #283
2020-01-15 16:57:03 +00:00
rofl0r
27d96df999 remove duplicate code calling reload_config_file()
as a side effect of not updating the config pointer when loading
the config file fails, the "FIXME" level comment to take appropriate
action in that case has been removed. the only issue remaining
when receiving a SIGHUP and encountering a malformed config file would
now be the case that output to syslog/logfile won't be resumed, if
initially so configured.
2020-01-15 16:35:43 +00:00
rofl0r
c63d5d26b4 access config via a pointer, not a hardcoded struct address
this is required so we can elegantly swap out an old config for a
new one in the future and remove lots of boilerplate from config
initialization code.

unfortunately this is a quite intrusive change as the config struct
was accessed in numerous places, but frankly it should have been
done via a pointer right from the start.

right now, we simply point to a static struct in main.c, so there
shouldn't be any noticeable changes in behaviour.
2020-01-15 16:09:41 +00:00
rofl0r
bffa705005 remove config file name item from conf struct
since this is set via command line, we can deal with it easily
from where it is actually needed.
2020-01-15 15:42:24 +00:00
rofl0r
180c0664aa remove godaemon member from config structure
since this option can't be set via config file, it makes sense
to factor it out and use it only where strictly needed, e.g. in
startup code.
2020-01-15 15:26:40 +00:00
rofl0r
eb2104e1ff log: remove special case code for daemonized mode without logfile
if daemon mode is used and neither logfile nor syslog options specified,
this is clearly a misconfiguration issue. don't try to be smart and work
around that, so less global state information is required.
also, this case is already checked for in main.c:334.
2020-01-15 15:22:43 +00:00
rofl0r
4fb2c14039 syslog: always use LOG_USER facility
LOG_DAEMON isn't specified in POSIX and the gratuitously different
treatment is in the way of a planned cleanup.
2020-01-15 15:09:37 +00:00
rofl0r
40afaeb637 move commandline parsing to main() 2020-01-15 14:45:23 +00:00
rofl0r
25205fd1f3 move initialize_config_defaults to conf.c 2020-01-15 14:17:13 +00:00
rofl0r
cd005a94ce implement detection and denial of endless connection loops
it is quite easy to bring down a proxy server by forcing it to make
connections to one of its own ports, because this will result in an endless
loop spawning more and more connections, until all available fds are exhausted.
since there's a potentially infinite number of DNS names/ip addresses
resolving to the proxy, it is impossible to detect an endless loop by simply
looking at the destination ip address and port.

what *is* possible though is to record the ip/port tuples assigned to outgoing
connections, and then compare them against new incoming connections. if they
match, the sender was the proxy itself and therefore needs to reject that
connection.

fixes #199.
2019-12-21 00:43:45 +00:00
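The loop detection idea in cd005a94ce, recording the local ip/port of each outgoing connection and comparing it against the peer address of each incoming one, can be sketched like this. All names are hypothetical, the table is a fixed-size array, only IPv4 is covered, and there is no locking, so a threaded server would need a mutex around these calls. In practice the recorded tuple would presumably come from getsockname() on the outgoing socket and the checked tuple from accept().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct endpoint {
    uint32_t ip;     /* IPv4 address */
    uint16_t port;
};

#define MAX_OUTGOING 64
static struct endpoint outgoing[MAX_OUTGOING];
static size_t n_outgoing;

/* Remember the local address of a connection the proxy opened itself.
 * Returns 0 on success, -1 if the table is full. */
static int record_outgoing(uint32_t ip, uint16_t port)
{
    assert(n_outgoing <= MAX_OUTGOING);
    if (n_outgoing == MAX_OUTGOING)
        return -1;
    outgoing[n_outgoing].ip = ip;
    outgoing[n_outgoing].port = port;
    n_outgoing++;
    return 0;
}

/* Drop the entry once the outgoing connection is closed. */
static void forget_outgoing(uint32_t ip, uint16_t port)
{
    size_t i;

    for (i = 0; i < n_outgoing; i++) {
        if (outgoing[i].ip == ip && outgoing[i].port == port) {
            /* order does not matter, so move the last entry into the gap */
            outgoing[i] = outgoing[--n_outgoing];
            return;
        }
    }
}

/* Returns 1 if an incoming peer address matches a recorded outgoing one,
 * i.e. the sender is the proxy itself. */
static int is_own_connection(uint32_t ip, uint16_t port)
{
    size_t i;

    for (i = 0; i < n_outgoing; i++)
        if (outgoing[i].ip == ip && outgoing[i].port == port)
            return 1;
    return 0;
}
```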
rofl0r
f6d4da5d81 do hostname resolution only when it is absolutely necessary for ACL check
tinyproxy used to do a full hostname resolution whenever a new client
connection happened, which could cause very long delays (as reported in #198).

there's only a single place/scenario that actually requires a hostname, and
that is when an Allow/Deny rule exists for a hostname or domain, rather than
a raw IP address. since it is very likely this feature is not very widely used,
it makes absolute sense to only do the costly resolution when it is unavoidable.
2019-12-21 00:43:45 +00:00
rofl0r
82e10935d2 move sockaddr_union to sock.h 2019-12-21 00:43:45 +00:00
rofl0r
fa2ad0cf9a log.c: protect logging facility with a mutex
since the write syscall is used instead of stdio, accesses have been
safe already, but it's better to use a mutex anyway to prevent out-
of-order writes.
2019-12-21 00:43:45 +00:00
rofl0r
b09d8d927d conf.c: merely warn on encountering recently obsoleted config items
if we don't handle these gracefully, pretty much every existing config
file will fail with an error, which is probably not very friendly.

the obsoleted config items can be made hard errors after the next
release.
2019-12-21 00:43:45 +00:00
rofl0r
1186c297b4 conf.c: pass lineno to handler funcs 2019-12-21 00:43:45 +00:00
rofl0r
b935dc85c3 simplify codebase by using one thread/conn, instead of preforked procs
the existing codebase used an elaborate and complex approach for
its parallelism:

5 different config file options, namely

- MaxClients
- MinSpareServers
- MaxSpareServers
- StartServers
- MaxRequestsPerChild

were used to steer how (and how many) parallel processes tinyproxy
would spin up at start, how many processes at each point needed to
be idle, etc.
it seems all preforked processes would listen on the server port
and compete with each other about who would get assigned the new
incoming connections.
since some data needs to be shared across those processes, a half-
baked "shared memory" implementation was provided for this purpose.
that implementation used to use files in the filesystem, and since
it had a big FIXME comment, the author was well aware of how hackish
that approach was.

this entire complexity is now removed. the main thread enters
a loop which polls on the listening fds, then spins up a new
thread per connection, until the maximum number of connections
(MaxClients) is hit. this is the only one of the 5 config options
left after this cleanup. since threads share the same address space,
the code necessary for shared memory access has been removed.
this means that the other 4 mentioned config options will now
produce a parse error, when encountered.

currently each thread uses a hardcoded default of 256KB per thread
for the thread stack size, which is quite lavish and should be
sufficient for even the worst C libraries, but people may want
to tweak this value to the bare minimum, thus we may provide a new
config option for this purpose in the future.
i suspect that on heavily optimized C libraries such as musl, a
stack size of 8-16 KB per thread could be sufficient.

since the existing list implementation in vector.c did not provide
a way to remove a single item from an existing list, i added my
own list implementation from my libulz library which offers this
functionality, rather than trying to add an ad-hoc, and perhaps
buggy implementation to the vector_t list code. the sblist
code is contained in an 80 line C file and as simple as it can get,
while offering good performance and is proven bugfree due to years
of use in other projects.
2019-12-21 00:43:45 +00:00
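Spawning one thread per connection with an explicit stack size, as b935dc85c3 describes, can be sketched with the pthread attribute API. The 256KB figure is taken from the commit message; `spawn_conn_thread()` and the trivial `handle_conn()` handler are hypothetical stand-ins, and the sketch leaves out the accept loop and the MaxClients limit.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CONN_STACK_SIZE (256 * 1024)   /* per-thread stack size from the commit text */

/* Stand-in connection handler: only counts that it ran. */
static void *handle_conn(void *arg)
{
    int *served = arg;

    ++*served;
    return NULL;
}

/* Start fn(arg) in a new thread with a fixed stack size.
 * Returns 0 on success or a pthread error number. */
static int spawn_conn_thread(pthread_t *thr, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int ret;

    assert(thr != NULL && fn != NULL);
    ret = pthread_attr_init(&attr);
    if (ret != 0)
        return ret;
    ret = pthread_attr_setstacksize(&attr, CONN_STACK_SIZE);
    if (ret == 0)
        ret = pthread_create(thr, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return ret;
}
```

Since all threads share one address space, per-connection state can be passed through `arg` directly, which is why the shared memory code became unnecessary.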
Martin Kutschker
69c86b987b Use gai_strerror() to report errors of getaddrinfo() and getnameinfo() 2019-11-27 20:31:48 +00:00
rofl0r
734ba1d970 fix usage of stathost in combination with basic auth
http protocol requires different treatment of proxy auth vs server auth.

fixes #246
2019-06-14 01:18:19 +01:00
Janosch Hoffmann
e666e4a35b filter file: Don't ignore lines with leading whitespace (#239)
The new code skips leading whitespaces before removing trailing
whitespaces and comments.
Without doing this, lines with leading whitespace are treated like empty
lines (i.e. they are ignored).
2019-05-05 19:13:38 +01:00
rofl0r
b131f45cbb child.c: properly initialize fdset for each select() call (#216)
it was reported that because the fdset was only initialized once,
tinyproxy would fail to properly listen on more than one interface.

closes #214
closes #127
2018-12-15 17:09:04 +00:00