221 Commits

Author SHA1 Message Date
rofl0r
da1bc1425d tune error messages to show select or poll depending on what is used 2020-09-17 21:03:51 +01:00
rofl0r
683a354196 remove vector remains 2020-09-16 02:39:09 +01:00
rofl0r
e929e81a55 add_header: use sblist
note that the old code inserted added headers at the beginning of the
list, reasoning unknown. this seems counter-intuitive as the headers
would end up in the request in the reverse order they were added,
but this was irrelevant, as the headers were originally first put
into the hashmap hashofheaders before sending it to the client.
since the hashmap didn't preserve ordering, the headers would appear
in random order anyway.
2020-09-16 02:39:09 +01:00
rofl0r
10cdee3bc5 prepare transition to poll()
usage of select() is inefficient (because a huge fd_set array has to
be initialized on each call) and insecure (because an fd >= FD_SETSIZE
will cause out-of-bounds accesses using the FD_*SET macros, and a system
can be set up to allow more than that number of fds using ulimit).
for the moment we prepared a poll-like wrapper that still runs select()
to test for regressions, and so we have fallback code for systems without
poll().
2020-09-15 23:12:00 +01:00
rofl0r
0c8275a90e refactor conns.[ch], put conn_s into child struct
this allows accessing the conn member from the main thread handling
the children, and simplifies the code.
2020-09-15 23:12:00 +01:00
rofl0r
155bfbbe87 replace leftover users of hashmap with htab
also fixes a bug where the ErrorFile directive would create a
new hashmap on every added item, effectively allowing only
the use of the last specified errornumber, and producing memory
leaks on each config reload.
2020-09-15 23:12:00 +01:00
rofl0r
34a8b28414 save headers in an ordered dictionary
due to the usage of a hashmap to store headers, when relaying them
to the other side the order was not preserved.
even though correct from a standards point-of-view, this caused
issues with various programs, and it allowed fingerprinting the use
of tinyproxy.

to implement this, i imported the MIT-licensed hsearch.[ch] from
https://github.com/rofl0r/htab which was originally taken from
musl libc. it's a simple and efficient hashtable implementation
with far better performance characteristics than the one previously
used by tinyproxy. additionally it has an API much more well-suited
for this purpose.

orderedmap.[ch] was implemented from scratch to address this issue.
behind the scenes it uses an sblist to store string values, and a htab
to store keys and the indices into the sblist.
this allows us to iterate linearly over the sblist and then find the
corresponding key in the hash table, so the headers can be reproduced
in the order they were received.

closes #73
2020-09-15 23:11:59 +01:00
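
The ordered-dictionary layout described above can be sketched roughly as follows. This is not the orderedmap.[ch] code: it swaps sblist and htab for fixed-size arrays and a small open-addressing table, keeps a copy of each key pointer next to its value for simplicity, borrows the string pointers instead of copying them, and compares keys case-sensitively. All names (`struct omap`, `om_put`, `om_get`, `OM_SLOTS`) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define OM_SLOTS 64   /* assumption: fixed power-of-2 capacity, no resizing */

struct omap {
	const char *keys[OM_SLOTS];    /* hash slots: key ... */
	size_t idx[OM_SLOTS];          /* ... and its index into values[] */
	const char *values[OM_SLOTS];  /* values in insertion order */
	const char *order[OM_SLOTS];   /* key belonging to values[i] */
	size_t count;
};

static size_t om_hash(const char *s)
{
	size_t h = 5381;
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* returns the slot holding key, or the first empty slot on its probe path */
static size_t om_slot(const struct omap *m, const char *key)
{
	size_t i = om_hash(key) & (OM_SLOTS - 1);
	while (m->keys[i] && strcmp(m->keys[i], key) != 0)
		i = (i + 1) & (OM_SLOTS - 1);
	return i;
}

/* appends key/value; an existing key keeps its position and only has its
   value replaced. returns 0 on success, -1 if the table is full. */
static int om_put(struct omap *m, const char *key, const char *value)
{
	size_t s = om_slot(m, key);
	if (m->keys[s]) {
		m->values[m->idx[s]] = value;
		return 0;
	}
	if (m->count >= OM_SLOTS / 2)   /* keep empty slots so probing terminates */
		return -1;
	m->keys[s] = key;
	m->idx[s] = m->count;
	m->values[m->count] = value;
	m->order[m->count] = key;
	m->count++;
	return 0;
}

/* returns the value stored for key, or NULL if it is absent */
static const char *om_get(const struct omap *m, const char *key)
{
	size_t s = om_slot(m, key);
	return m->keys[s] ? m->values[m->idx[s]] : NULL;
}
```

A zero-initialized `struct omap` is an empty map. Iterating `i` from 0 to `count - 1` over `order[i]` and `values[i]` yields the entries in the order they were added, while `om_get()` still offers hashed lookup by key.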
rofl0r
c64ac9edbe fix get_request_entity()
get_request_entity()'s purpose is to drain remaining unread bytes
in the request read pipe before handing out an error page,
and somewhat surprisingly, also when a connection to the stathost is
made.

in the stathost case tinyproxy just skipped proper processing and
jumped to the error handler code, remembering in a variable whether a
connection to the stathost was desired, then doing
things a bit differently depending on whether it's set.

i tried to fix issues with get_request_entity in
88153e944f (which is basically the
right fix for the issue it tried to solve, but incomplete),
and resulting from there in 78cc5b72b1.
the latter fix wasn't quite right since we're not supposed to check
whether the socket is ready for writing, and having a return value
of 2 instead of 1 resulted in some of the if statements not
kicking in when they should have.
this also resulted in the stathost page no longer working.

after in-depth study of the issue i realized that we only need to
call get_request_entity() when the headers aren't completely read,
in addition to setting the proper connection timeout as
88153e944f already implemented.
the changes of 78cc5b72b1 have been
reverted.
2020-09-13 00:37:19 +01:00
rofl0r
9e40f8311f handle_connection(): print process_*_headers errno information 2020-09-10 21:13:31 +01:00
rofl0r
f1bd259e6e handle_connection: replace "goto fail" with func call
this makes it possible to see in a backtrace from where the error was
triggered.
2020-09-10 14:48:39 +01:00
rofl0r
e94cbdb3a5 handle_connection(): factor out failure code
this allows us in a next step to replace goto fail with a call to that
function, so we can see in a backtrace from where the failure was
triggered.
2020-09-10 14:37:56 +01:00
rofl0r
b549ba5af3 remove bogus custom timeout handling code
in networking, hitting a timeout requires that *nothing* happens during the
interval. whenever anything happens, the timeout is reset.
there's no need to do custom time calculations, it's perfectly fine to let
the kernel handle it using the select() syscall.

additionally the code added in 0b9a74c290
ensures that the read() and write() syscalls don't block indefinitely and return
on the timeout too, so there's no need to switch sockets back and forth
between blocking/nonblocking.
2020-09-09 12:37:23 +01:00
rofl0r
b4e3f1a896 fix negative timeout resulting in select() EINVAL 2020-09-09 11:59:40 +01:00
rofl0r
78cc5b72b1 get_request_entity: fix regression w/ CONNECT method
introduced in 88153e944f.
when connect method is used (HTTPS), and e.g. a filtered domain requested,
there's no data on readfds, only on writefds.

this caused the response from the connection to hang until the timeout was
hit. in the past, such a scenario always produced a "no entity" response
in the tinyproxy logs.
2020-09-08 14:45:24 +01:00
rofl0r
88153e944f get_request_entity: respect user-set timeout
get_request_entity() is only called on error, for example if a client
doesn't pass a check_acl() check. in such a case it's possible that
the client fd isn't yet ready to read from.
using select() with a timeout timeval of {0,0} causes it to return
immediately and return 0 if there's no data ready to be read.
this resulted in immediate connection termination, rather than the
403 access denied error page being returned to the client, and in a
confusing "no entity" message in the proxy log.
2020-09-07 20:49:07 +01:00
[anp/hsw]
17ae1b512c Do not give error while storing invalid header 2020-09-07 01:12:50 +01:00
rofl0r
233ce6de3b filter: reduce memory usage, fix OOM crashes
* check return values of memory allocation and abort gracefully
  in out-of-memory situations

* use sblist (linear dynamic array) instead of linked list
  - this removes one pointer per filter rule
  - removes need to manually allocate/free every single list item
    (instead block allocation is used)
  - simplifies code

* remove storage of (unused) input rule
  - removes one char* pointer per filter rule
  - removes storage of the raw bytes of each filter rule

* add line number to display on out-of-memory/invalid regex situation

* replace duplicate filter_domain()/filter_host() code with a single
  function filter_run()
  - reduces code size and management effort

with these improvements, >1 million regex rules can be loaded with
4 GB of RAM, whereas previously it crashed with about 950K.

the list for testing was assembled from
http://www.shallalist.de/Downloads/shallalist.tar.gz

closes #20
2020-09-05 19:42:34 +01:00
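
To illustrate the "linear dynamic array instead of linked list" point above, here is a rough sketch of a growable pointer array with checked allocation. It is not the sblist API; `struct rule_list`, `rule_list_add` and `rule_list_free` are hypothetical names, and the growth policy (start at 32 slots, then double) is an assumption.

```c
#include <stdlib.h>

/* hypothetical growable array: one block allocation for many items,
   instead of one malloc() plus one next-pointer per list node. */
struct rule_list {
	void **items;
	size_t count, capacity;
};

/* appends item; returns 0 on success, -1 on out-of-memory.
   on failure the list is left intact so the caller can abort gracefully. */
static int rule_list_add(struct rule_list *l, void *item)
{
	if (l->count == l->capacity) {
		size_t ncap = l->capacity ? l->capacity * 2 : 32;
		void **tmp = realloc(l->items, ncap * sizeof *tmp);
		if (!tmp)
			return -1;
		l->items = tmp;
		l->capacity = ncap;
	}
	l->items[l->count++] = item;
	return 0;
}

/* releases the array itself; the items are owned by the caller */
static void rule_list_free(struct rule_list *l)
{
	free(l->items);
	l->items = NULL;
	l->count = l->capacity = 0;
}
```

A zero-initialized `struct rule_list` is an empty list, so no separate constructor is needed in this sketch.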
rofl0r
0b9a74c290 enforce socket timeout on new sockets via setsockopt()
the timeout option set by the config file wasn't respected at all
so it could happen that connections became stale and were never released,
which eventually caused tinyproxy to hit the limit of open connections and
never accept new ones.

addresses #274
2020-07-15 09:59:25 +01:00
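
A minimal sketch of what enforcing a timeout via setsockopt() can look like, using the standard SO_RCVTIMEO and SO_SNDTIMEO socket options. The helper name `set_socket_timeout` is hypothetical, and whether tinyproxy sets both options in exactly this way is an assumption.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* hypothetical helper: after this call, a blocking read or write on fd
   that makes no progress for timeout_sec seconds fails with
   EAGAIN/EWOULDBLOCK instead of blocking forever.
   returns 0 on success, -1 on error. */
static int set_socket_timeout(int fd, int timeout_sec)
{
	struct timeval tv;
	tv.tv_sec = timeout_sec;
	tv.tv_usec = 0;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0)
		return -1;
	return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
}
```

Calling such a helper right after a socket is created or accepted would apply the configured timeout to every later read and write on it.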
rofl0r
3230ce0bc2 anonymous: fix segfault loading config item
unlike other functions called from the config parser code,
anonymous_insert() accesses the global config variable rather than
passing it as an argument. however the global variable is only set
after successful loading of the entire config.

we fix this by adding a conf argument to each anonymous_* function,
passing the global pointer in calls done from outside the config
parser.

fixes #292
2020-03-16 13:19:39 +00:00
rofl0r
c63d5d26b4 access config via a pointer, not a hardcoded struct address
this is required so we can elegantly swap out an old config for a
new one in the future and remove lots of boilerplate from config
initialization code.

unfortunately this is a quite intrusive change as the config struct
was accessed in numerous places, but frankly it should have been
done via a pointer right from the start.

right now, we simply point to a static struct in main.c, so there
shouldn't be any noticeable changes in behaviour.
2020-01-15 16:09:41 +00:00
rofl0r
cd005a94ce implement detection and denial of endless connection loops
it is quite easy to bring down a proxy server by forcing it to make
connections to one of its own ports, because this will result in an endless
loop spawning more and more connections, until all available fds are exhausted.
since there's a potentially infinite number of DNS names/ip addresses
resolving to the proxy, it is impossible to detect an endless loop by simply
looking at the destination ip address and port.

what *is* possible though is to record the ip/port tuples assigned to outgoing
connections, and then compare them against new incoming connections. if they
match, the sender was the proxy itself and the connection therefore needs to
be rejected.

fixes #199.
2019-12-21 00:43:45 +00:00
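
The record-and-compare approach described above might be sketched like this. It is a simplification, not the tinyproxy implementation: IPv4 only, a fixed-size table, and no removal of records when outgoing connections close. `loop_record_add`, `connection_loops` and `MAX_OUT` are hypothetical names.

```c
#include <stddef.h>
#include <netinet/in.h>

#define MAX_OUT 256   /* assumption: small fixed-size table */

static struct sockaddr_in outgoing[MAX_OUT];
static size_t n_outgoing;

/* records the local ip/port of an outgoing connection, e.g. as obtained
   from getsockname() after connect(). returns 0, or -1 if the table is full. */
static int loop_record_add(const struct sockaddr_in *local)
{
	if (n_outgoing >= MAX_OUT)
		return -1;
	outgoing[n_outgoing++] = *local;
	return 0;
}

/* checks the peer ip/port of a new incoming connection, e.g. as filled in
   by accept(). returns 1 if it matches one of our own outgoing connections,
   meaning the proxy is talking to itself, else 0. */
static int connection_loops(const struct sockaddr_in *peer)
{
	size_t i;
	for (i = 0; i < n_outgoing; i++)
		if (outgoing[i].sin_port == peer->sin_port &&
		    outgoing[i].sin_addr.s_addr == peer->sin_addr.s_addr)
			return 1;
	return 0;
}
```

The check works because the source address the listening side sees for a looped connection is exactly the local address the outgoing socket was assigned, regardless of which hostname or destination ip was used to reach the proxy.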
rofl0r
f6d4da5d81 do hostname resolution only when it is absolutely necessary for ACL check
tinyproxy used to do a full hostname resolution whenever a new client
connection happened, which could cause very long delays (as reported in #198).

there's only a single place/scenario that actually requires a hostname, and
that is when an Allow/Deny rule exists for a hostname or domain, rather than
a raw IP address. since it is very likely this feature is not very widely used,
it makes absolute sense to only do the costly resolution when it is unavoidable.
2019-12-21 00:43:45 +00:00
rofl0r
734ba1d970 fix usage of stathost in combination with basic auth
http protocol requires different treatment of proxy auth vs server auth.

fixes #246
2019-06-14 01:18:19 +01:00
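
As a small illustration of the proxy auth vs server auth distinction: HTTP uses status 407 with the Proxy-Authenticate/Proxy-Authorization headers when a proxy asks for credentials, and status 401 with WWW-Authenticate/Authorization when an origin server does. The struct and function below are hypothetical, and mapping the stathost page to the server case is an assumption about this fix.

```c
/* hypothetical description of one authentication flavour */
struct auth_headers {
	int status;               /* response status code */
	const char *challenge;    /* response header carrying the challenge */
	const char *credentials;  /* request header carrying the credentials */
};

/* acting_as_proxy != 0: request is relayed to another server.
   acting_as_proxy == 0: we answer as the server ourselves. */
static struct auth_headers auth_headers_for(int acting_as_proxy)
{
	struct auth_headers proxy  = { 407, "Proxy-Authenticate", "Proxy-Authorization" };
	struct auth_headers server = { 401, "WWW-Authenticate", "Authorization" };
	return acting_as_proxy ? proxy : server;
}
```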
rofl0r
c651664720 fix socks5 upstream user/pass subnegotiation check
RFC 1929 specifies that the user/pass auth subnegotiation repurposes the version
field for the version of that specification, which is 1, not 5.
however there's quite a good deal of software out there which got it wrong and
replies with version 5 to a successful authentication, so let's just accept both
forms - other socks5 client programs like curl do the same.

closes #172
2018-05-29 21:59:11 +02:00
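
A sketch of the lenient check described above. Per RFC 1929 the server's reply to the username/password subnegotiation is two bytes, a version and a status, where status 0 means success. The function name `socks5_userpass_ok` is hypothetical.

```c
/* returns 1 if the 2-byte user/pass subnegotiation reply signals success.
   reply[0]: version, 1 per RFC 1929; 5 is tolerated because some servers
             wrongly echo the SOCKS protocol version here.
   reply[1]: status, 0 means success. */
static int socks5_userpass_ok(const unsigned char reply[2])
{
	if (reply[0] != 1 && reply[0] != 5)
		return 0;
	return reply[1] == 0;
}
```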
rofl0r
b8c6a2127d implement user/password auth for socks5 upstream proxy
just like the rest of the socks code, this was stolen from
proxychains-ng, of which i happen to be the maintainer,
so it's not an issue (the licenses are identical, too).
2018-02-27 20:13:07 +00:00
rofl0r
39132b9787 rename members of proxy_type enum to have a common prefix
and add a NONE member.
2018-02-25 23:52:23 +00:00
rofl0r
bf76aeeba1 implement HTTP basic auth for upstream proxies
loosely based on @valenbg1's code from PR #38

closes #38
closes #96
2018-02-25 15:13:45 +00:00
rofl0r
bd04ed00d8 Basic Auth: send correct response codes and headers acc. to rfc7235
as reported by @natedogith1
2018-02-06 16:57:02 +00:00
rofl0r
8db511b9bf add support for basic HTTP authentication
using the "BasicAuth" keyword in tinyproxy.conf.

base64 code was written by myself and taken from my own library "libulz".
for this purpose it is relicensed under the usual terms of the tinyproxy
license.
2018-02-06 16:57:02 +00:00
rofl0r
7a3fd81a8d fix types used in SOCKS4/5 support code
the line

    len = buff[0]; /* max = 255 */

could lead to a negative length if the value in buff[0] is > 127.
2018-02-06 16:11:39 +00:00
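
The sign problem can be reproduced in isolation. The sketch below assumes the buffer was declared with a signed character type, as plain `char` is on many platforms; the two function names are hypothetical.

```c
/* reading a length byte through a signed char type: values above 127
   come out negative after conversion to int. */
static int len_from_signed(const signed char *buff)
{
	return buff[0];
}

/* reading the same byte through unsigned char keeps it in 0..255 */
static int len_from_unsigned(const unsigned char *buff)
{
	return buff[0];
}
```

With a byte value of 200, `len_from_unsigned()` returns 200 while `len_from_signed()` returns -56 on two's complement machines, which is the kind of negative length the commit guards against.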
Gonzalo Tornaria
8906b0734e add SOCKS upstream proxy support (socks4/socks5)
original patch submitted in 2006 to debian mailing list:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=392848%29#12

this version was rebased to git and updated by Russ Dill <russ.dill@gmail.com>
in 2015 (the original patch used a different config file format).

as discussed in #40.

commit message by @rofl0r.
2018-02-06 16:11:39 +00:00
Stephan Leemburg
c5da1cc934 Continue with forward proxy if ReverseOnly is not true and no mapping available (#35)
allow non-reverse mappings if reverseonly is not enabled
2016-09-10 19:22:45 +02:00
Michael Adam
800c3a250c BB#110 Increase number of hash buckets from 32 to 256.
This should make hash processing generally faster.

There is a tradeoff between memory footprint and
speed of processing. 10 KB instead of 1.2 KB of
hash table per process should not be a huge problem
even on very limited current systems.

Anyone who really needs to stick to 32 buckets can
recompile. We could also think about making
this configurable at some point.

Signed-off-by: Michael Adam <obnox@samba.org>
2014-12-13 01:41:56 +01:00
Michael Adam
545463c75d BB#110 limit the number of headers per request to prevent DoS
Based on patch provided by gpernot@praksys.org on bugzilla.

Signed-off-by: Michael Adam <obnox@samba.org>
2014-12-13 01:28:07 +01:00
Michael Adam
76bd008cf9 reqs: fix typo in a debug message in get_request_entity()
Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-23 11:59:47 +01:00
Michael Adam
3710accf72 reqs: Fix CID 1130969 (part 3) - unchecked return value from library.
Check the return value of socket_blocking (fcntl) at the
end of relay_connection() for client socket.

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-22 21:56:39 +01:00
Michael Adam
e07c363df2 reqs: Fix CID 1130969 (part 2) - unchecked return value from library.
Check the return value of socket_blocking (fcntl) at the
end of relay_connection().

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-22 21:44:12 +01:00
Michael Adam
c82840bfcb reqs: Fix CID 1130972 - remove logically dead code.
url == NULL is caught above.

Found by coverity.

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-22 18:58:19 +01:00
Michael Adam
0a99803425 reqs: Fix CID 1130967 - unchecked return value from library.
Check the return code of fcntl via socket_blocking
in pull_client_data().

Found by coverity.

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-22 18:49:45 +01:00
Michael Adam
9efa5799f0 reqs: Fix CID 1130968 - unchecked return value from library
Check the return code of fcntl via socket_nonblocking
in pull_client_data()

Found by coverity.

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-22 18:49:45 +01:00
Michael Adam
c27b6d15e2 reqs: rename a variable.
ret will be used in enclosing scope,
so rename this special variable.

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-22 18:49:45 +01:00
Michael Adam
68bd0b61b5 reqs: fix CID 1130969 - unchecked return code from library
Effectively, the return code of fcntl was not checked
by not checking the return code of socket_nonblocking()
for the server fd.

Found by coverity.

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-22 17:35:59 +01:00
Michael Adam
2004abc1e3 reqs: fix CID 1130970 - unchecked return code from library
Effectively, the return code of fcntl was not checked
by not checking the return code of socket_nonblocking()
for the client fd.

Found by coverity.

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-22 17:35:54 +01:00
Michael Adam
0f18e4fc3a BB#106: remove now unused extract_ssl_url.
Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-16 15:26:06 +01:00
Michael Adam
9f43cfd488 BB#106: fix CONNECT requests with IPv6 literal addresses as host.
Use extract_url instead of the old extract_ssl_url:
extract_url is generic and handles ipv6 literal addresses correctly.

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-16 15:25:44 +01:00
Michael Adam
98f77ef8c7 BB#106: add default_port argument to extract_http_url and rename it to extract_url
There is in fact nothing http-specific any more about this function, hence
the rename. The input has been stripped of the <proto>:// header anyways.

This is in preparation for fixing bug BB#106: ssl fails with literal ipv6 addrs.

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-16 15:09:48 +01:00
Michael Adam
69c348ce6d req: move a variable into the scope where it is used in extract_http_url()
Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-16 13:10:03 +01:00
Michael Adam
bb2e894e0d BB#116: fix invalid free when connecting to ipv6 literal address
When removing the '[' and ']' characters from the ipv6 literal address, make sure
the pointer that is later free'd stays a malloced pointer by memmoving the
string one place left.

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-16 13:07:19 +01:00
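
A minimal sketch of the technique described above: the brackets are removed by shifting the string contents, so the pointer originally returned by malloc() is the one that later gets passed to free(). The function name `strip_ipv6_brackets` is hypothetical.

```c
#include <string.h>

/* strips a surrounding "[...]" from host in place.
   the contents are moved one place to the left instead of advancing the
   pointer, so host stays a valid argument for free(). */
static void strip_ipv6_brackets(char *host)
{
	size_t len = strlen(host);
	if (len >= 2 && host[0] == '[' && host[len - 1] == ']') {
		memmove(host, host + 1, len - 2);   /* regions overlap: memmove, not memcpy */
		host[len - 2] = '\0';
	}
}
```

The invalid free arises with the alternative of doing `host++` to skip the '[': the incremented pointer no longer matches what the allocator handed out.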
Mukund Sivaraman
7378c97524 Surround IPv6 literals with [] in Host: headers 2011-02-07 18:00:39 +05:30
Mukund Sivaraman
2d02e2211e Handle IPv6 literals in URLs correctly 2011-02-04 20:28:48 +05:30