github continues to deprecate actions and idioms in their CI system.
hopefully these changes will last for a while and maintaining a simple
CI task doesn't turn into a neverending story.
given the catastrophic way TALOS Intelligence "communicated" with upstream
(i.e. by probably sending a single mail to an unused email address),
it's probably best to explicitly document how to approach upstream
when a security issue is discovered.
https://talosintelligence.com/vulnerability_reports/TALOS-2023-1889
this bug was brought to my attention today by the debian tinyproxy
package maintainer. the above link states that the issue was known
since last year and that maintainers have been contacted, but if
that is even true then it probably was done via a private email
to a potentially outdated email address of one of the maintainers,
not through the channels described clearly on the tinyproxy homepage:
> Feel free to report a new bug or suggest features via github issues.
> Tinyproxy developers hang out in #tinyproxy on irc.libera.chat.
no github issue was filed, and nobody mentioned a vulnerability on
the mentioned IRC chat. if the issue had been reported on github or
IRC, the bug would have been fixed within a day.
since accept() uses the socklen parameter as in/out, after processing
an IPv4 the socklen fed to it waiting for the next client was only
the length of sockaddr_in, so if a connection from an IPv6 came in
the client sockaddr was only partially filled in.
this caused wrongly printed ipv6 addresses in log, and failure to
match them correctly against the acl.
closes#495
* tinyproxy.conf.5: document config strings that require double quotes
String config values matched by the STR regex must be enclosed in double
quotes
Edit descriptions for brevity
conf.c: move boolean arguments comment before BOOL group
addresses #491
* Revert conf.c: move boolean arguments comment before BOOL group
* Added support to configure IPv6 upstream proxy servers using bracket syntax.
* Added regular expression for IPv6 scope identifier to re for IPv6 address.
it's not possible to use a https url in a ReversePath directive, without
removing the security provided by https, and would require adding a
dependency on a TLS library like openssl and a lot of code complexity
to fetch the requested resource via https and relay it back to the client.
in case the reversepath directive kicked in, but the protocol wasn't
recognized, and support for transparent proxying built-in, the code
wrongfully tried to turn the request into a trans request, leading
to a bogus rewritten url like http://localhost:8888https://www.endpoint.com
and an error message that we're trying to connect to the machine the
proxy runs on.
now instead use the generic code that signals an invalid protocol/url
was used.
closes#419
while at it, the function doing it was renamed from the misleading
ssl name to what it actually does.
also inlined the strings that were previously defined as macros.
addressing #152
read_request_line() is exercised on the client's fd, and it fails
when the client closed the connection. therefore it's wrong
to send an error message to the client in this situation.
additionally, the error message states that the server closed
the connection.
might fix#383
as suggested in #212, it seems the majority of people don't understand
that input was expected to be in regex format and people were using
filter lists containing plain hostnames, e.g. `www.google.com`.
apart from that, using fnmatch() for matching is actually a lot less
computationally expensive and allows to use big blacklists without
incurring a huge performance hit.
the config file now understands a new option `FilterType` which can
be one of `bre`, `ere` and `fnmatch`.
The `FilterExtended` option was deprecated in favor of it.
It still works, but will be removed in the release after the next.
Currently a 404 is returned for a misconfigured or unavailable upstream
server. Since that's a server error it should be a 5xx instead; a 404
is confusing when used as a forward proxy and might even be harmful when
used as a reverse proxy.
It is debatable if another 5xx code might be better; the misconfigured
situation might better be a 500 whereas the connection issue could be
a 503 instead (as used eg. in haproxy).
this fixes OPTIONS requests sent from apache SVN client using their
native HTTP proxy support.
closes#421
tested with `svn info http://svnmir.bme.freebsd.org/ports/`
the fix in 0b9a74c290 was incomplete, as it
applied the socket timeout only to the socket received from accept(), but
not to sockets created for outgoing connections.
introduced in 979c737f9b.
when refactoring the "site-spec" parsing code i failed to realize that
the code dealing with acl allow/deny directives didn't provide the
option to specify netmasks in dotted ipv4 notation, unlike the code
in the upstream parser. since both scenarios now use the same parsing,
both dotted notation and CIDR slash-notation are possible.
while at it, removed the len parameter from fill_netmask_array() which
provided the illusion the array length could be of variable size.
fixes#394
as it's unproductive to be getting the same bug report for old tinyproxy versions over and over, and people not even stating which version they're using, this new issue template makes people
aware of what information to include when filing an issue request.
using a socks4 tor upstream with an .onion url resulted in
gethostbyname() returning NULL and a subsequent segfault.
not only did the code not check the return value of gethostbyname(),
that resolver API itself isn't threadsafe.
as pure SOCKS4 supports only IPv4 addresses, and the main SOCKS4
user to this date is tor, we just use SOCKS4a unconditionally and
pass the hostname to the proxy without trying to do any local name
resolving.
i suspect in 2021 almost all SOCKS4 proxy servers in existence use
SOCKS4a extension, but should i be wrong on this, i prefer issue
reports to show up and implement plain SOCKS4 fallback only when
i see it is actually used in practice.
for some reason, getting this macro is really hard across platforms,
requiring either different feature test macros or even the right order
of included headers, and its usage caused several build failures in the
past. fix it once and for all by just using 1024 as max line length if
the macro can't be retrieved.
closes#382
the flag was added in 753010f571 without
explanation, and according to my research it is used to make the linker
report undefined symbols when linking a shared library.
since we don't build any shared libs, this isn't needed at all, but
reportedly causes issues with cygwin (#382).
since there are numerous changes in HTTP/1.1, the proxyserver will
stick to using HTTP/1.0 for internal usage, however when a connection
is requested with HTTP/1.x from now on we will duplicate the minor revision
the client requested, because apparently some servers refuse to accept
HTTP/1.0
addresses #152.
the acl.c code parsing a site-spec has been factored out into a
new TU: hostspec. it was superior to the parsing code in
upstream.c in that it properly deals with both ipv4 and ipv6.
both upstream and acl now use the new code for parsing, and upstream
also for checking for a match.
acl.c still uses the old matching code as it has a lot of special case
code for specifications containing a hostname, and in case such
a spec is encountered, tries to do reverse name lookup to see if
a numeric ip matches that spec.
removing that code could break existing usecases, however since
that was never implemented for upstream nobody will miss it there.
if for example:
ReversePath = "/foo/"
and user requests "http://tinyproxy/foo" the common behaviour for HTTP
servers is to send a http 301 redirect to the correct url.
we now do the same.
as pointed out by @craigbarnes [0], using the latest fix for
the tombstone issue, it's possible to provoke a situation
that causes an endless loop when all free slots in the table
are filled up with tombstones and htab_find() is called.
therefore we need to account for those as well when deciding
if there's a need to call resize() so there's never more than
75% of the table used by either dead or live items.
the resize() serves as a rehash which gets rid of all deleted
entries, and it might cause the table size to shrink if
htab_insert() is called after a lot of items have been removed.
[0]: https://github.com/rofl0r/htab/issues/1#issuecomment-800094442
testcase:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hsearch.h"
#define HTAB_OOM_TEST
#include "hsearch.c"
static char *xstrdup(const char *str)
{
char *dup = strdup(str);
assert(dup);
return dup;
}
void utoa(unsigned number, char* buffer) {
int lentest, len = 0, i, start = 0;
lentest = number;
do {
len++;
lentest /= 10;
} while(lentest);
buffer[start+len] = 0;
do {
i = number % 10;
buffer[start+len - 1] = '0' + i;
number -= i;
len -= 1;
number /= 10;
} while (number);
}
#define TESTSIZE 8
#define KEEP 1
static char* notorious[TESTSIZE];
static void prep() {
srand(0);
char buf[16];
size_t filled = 0;
while(filled < TESTSIZE) {
utoa(rand(), buf);
size_t idx = keyhash(buf) & (TESTSIZE-1);
if(!notorious[idx]) {
notorious[idx] = xstrdup(buf);
++filled;
}
}
}
int main(void)
{
struct htab *h = htab_create(TESTSIZE);
size_t i;
assert(h);
prep();
for(i=0; i<TESTSIZE; ++i) {
char *key = notorious[i];
printf("[%zu] = \"%s\"\n", i, key);
int r = htab_insert(h, key, HTV_N(42));
if(!r == 1) {
printf("element %zu couldn't be inserted\n", i);
break;
}
assert(r == 1);
// Ensure newly inserted entry can be found
assert(htab_find(h, key));
if(i >= KEEP) htab_delete(h, key);
}
htab_find(h, "looooop");
return 0;
}
we already required an extra argument inside the headers sent
for 401 and 407 error responses, move those to sent_http_error_message()
and refactor send_http_headers() to always take the extra argument.
in calling sites where the extra arg isn't needed, use "".
we can't just set an item's key to zero and be done with a deletion,
because this will break the item search chain.
a deleted item requires a special marker, also known as tombstone.
when searching for an item, all slots with a tombstone need to treated
as if they were in use, but when inserting an item such a slot needs
to be filled with the new item.
a common procedure is to rehash the table when the number of deleted
items crosses a certain threshold, though for simplicity we leave this
task to the resize() function which does the same thing anyway when
the hashtable grows.
this allows to fix the issue quite elegantly and with almost no
additional overhead, so we don't penalize applications that do very
few deletions.
Try all the addresses specified with Bind in order. This is necessary
e.g. for maintaining IPv4+6 connectivity while still being restricted to
one interface.
the INT regex macro supported a 0x prefix (used e.g. for port numbers),
however following that, only digits were accepted, and not the full
range of hexdigits. it's unlikely this was used, so remove it.
note that the () expression is kept, so we don't have to adjust match
number indices all over the place.
git describe prefixes the sha1 commit hash with -g, which is exactly what
we're after. this change gets rid of the confusing "g" in the commit hash
and allows tag names that include "-".
it's been reported[0] that RHEL7 fails to properly set the length
parameter of the getsockname() call to the length of the required
struct sockaddr type, and always returns the length passed if it
is big enough.
the SOCKADDR_UNION_* macros originate from my microsocks[1] project,
and facilitate handling of the sockaddr mess without nasty casts.
[0]: https://github.com/tinyproxy/tinyproxy/issues/45#issuecomment-694594990
[1]: https://github.com/rofl0r/microsocks
it turned out that close()ing an fd behind the back of a thread
doesn't actually cause blocking operations to get a read/write event,
because the fd will stay valid to in-progress operations.
even though the existing IPV6 regex caught (almost?) all invalid
ipv6 addresses, it did so with a huge performance penalty.
parsing a file with 32K allow or deny statement took 30 secs in
a test setup, after this change less than 3.
the new regex is sufficient to recognize all valid ipv6 addresses,
and hands down the responsibility to detect corner cases to the
system's inet_pton() function, which is e.g. called from insert_acl(),
which now causes a warning to be printed in the log if a seemingly
valid address is in fact invalid.
the new regex has been tested with 486 testcases from
http://download.dartware.com/thirdparty/test-ipv6-regex.pl
and accepts all valid ones and rejects most of the invalid ones.
note that the IPV4 regex already did a similar thing and checked only
whether the ip looks like [0-9]+.[0-9]+.[0-9]+.[0-9]+ without pedantry.
move it to before disabling logging, so a message with the correct
timestamp is printed if logging was already enabled.
also add a message when loading finished, so one can see from the
timestamp how long it took.
note that this only works on a real config reload triggered by
SIGHUP/SIGUSR1, because on startup we don't know yet where to log to.
note that the old code inserted added headers at the beginning of the
list, reasoning unknown. this seems counter-intuitive as the headers
would end up in the request in the reverse order they were added,
but this was irrelevant, as the headers were originally first put
into the hashmap hashofheaders before sending it to the client.
since the hashmap didn't preserve ordering, the headers would appear
in random order anyway.
usage of select() is inefficient (because a huge fd_set array has to
be initialized on each call) and insecure (because an fd >= FD_SETSIZE
will cause out-of-bounds accesses using the FD_*SET macros, and a system
can be set up to allow more than that number of fds using ulimit).
for the moment we prepared a poll-like wrapper that still runs select()
to test for regressions, and so we have fallback code for systems without
poll().
also fixes a bug where the ErrorFile directive would create a
new hashmap on every added item, effectively allowing only
the use of the last specified errornumber, and producing memory
leaks on each config reload.
due to the usage of a hashmap to store headers, when relaying them
to the other side the order was not prevented.
even though correct from a standards point-of-view, this caused
issues with various programs, and it allows to fingerprint the use
of tinyproxy.
to implement this, i imported the MIT-licensed hsearch.[ch] from
https://github.com/rofl0r/htab which was originally taken from
musl libc. it's a simple and efficient hashtable implementation
with far better performance characteristic than the one previously
used by tinyproxy. additionally it has an API much more well-suited
for this purpose.
orderedmap.[ch] was implemented from scratch to address this issue.
behind the scenes it uses an sblist to store string values, and a htab
to store keys and the indices into the sblist.
this allows us to iterate linearly over the sblist and then find the
corresponding key in the hash table, so the headers can be reproduced
in the order they were received.
closes#73
- we need to free the config after it has been succesfully loaded,
not unconditionally before reloading.
- we also need to free them before exiting from the main program
to have clean valgrind output.
1) force a config reload after some initial tests.
this will allow to identify memleaks using the valgrind test,
as this will free all structures allocated for the config, and
recreate them.
2) test ErrorFile directive by adding several of them.
this should help catch regressions such as the one fixed in
4847d8cdb3.
it will also test memleaks in the related code paths.
3) test some scenarios that should produce errors and use the
configured ErrorFile directives.
get_request_entity()'s purpose is to drain remaining unread bytes
in the request read pipe before handing out an error page,
and kinda surprisingly, also when connection to the stathost is
done.
in the stathost case tinyproxy just skipped proper processing and
jumped to the error handler code, and remembering whether a
connection to the stathost was desired in a variable, then doing
things a bit differently depending on whether it's set.
i tried to fix issues with get_request_entity in
88153e944f (which is basically the
right fix for the issue it tried to solve, but incomplete),
and resulting from there in 78cc5b72b1.
the latter fix wasn't quite right since we're not supposed to check
whether the socket is ready for writing, and having a return value
of 2 instead of 1 got resulted in some of the if statements not
kicking in when they should have.
this also resulted in the stathost page no longer working.
after in-depth study of the issue i realized that we only need to
call get_request_entity() when the headers aren't completely read,
additional to setting the proper connection timeout as
88153e944f already implemented.
the changes of 78cc5b72b1 have been
reverted.
another fallout of the config refactoring finished by
2e02dce0c3.
apparently no one using the ErrorFile directive used git master
during the last months, as there have been no reports about this issue.
in networking, hitting a timeout requires that *nothing* happens during the
interval. whenever anything happens, the timeout is reset.
there's no need to do custom time calculations, it's perfectly fine to let
the kernel handle it using the select() syscall.
additionally the code added in 0b9a74c290
assures that read and write syscalls() don't block indefinitely and return
on the timeout too, so there's no need to switch sockets back and forth
between blocking/nonblocking.
introduced in 88153e944f.
when connect method is used (HTTPS), and e.g. a filtered domain requested,
there's no data on readfds, only on writefds.
this caused the response from the connection to hang until the timeout was
hit. in the past in such scenario always a "no entity" response
was produced in tinyproxy logs.
tested with 32K acl rules, generated by
for x in `seq 128` ; do for y in `seq 255` ; do \
echo "Deny 10.$x.$y.0/24" ; done ; done
after loading the config (which is dogslow too), tinyproxy
required 9.5 seconds for the acl check on every request.
after switching the list implementation to sblist, a request
with the full acl check now takes only 0.025 seconds.
the time spent for loading the config file is identical for both
list implementations, roughly 30 seconds.
(in a previous test, 65K acl rules were generated, but every
connection required almost 2 minutes to crunch through the list...)
get_request_entity() is only called on error, for example if a client
doesn't pass a check_acl() check. in such a case it's possible that
the client fd isn't yet ready to read from.
using select() with a timeout timeval of {0,0} causes it to return
immediately and return 0 if there's no data ready to be read.
this resulted in immediate connection termination rather than returning
the 403 access denied error page to the client and a confusing
"no entity" message displayed in the proxy log.
the code wrongly processed the site_spec (here: domain) parameter
only when PT_TYPE == PT_NONE.
re-arranged code to process it correctly whenever passed.
additionally the mask is now also applied to the passed subnet/ip,
so a site_spec like 127.0.0.1/8 is converted into 127.0.0.0/8.
also the case where inet_aton fails now produces a proper error
message.
note that the code still doesn't process ipv6 addresses and mask.
to support it, we should use the existing code in acl.c and refactor
it so it can be used from both call sites.
closes#83closes#165
previously, in order to detect and insert {variables} into error/stats
templates, tinyproxy iterated char-by-char over the input file, and would
try to parse anything inside {} pairs and treat it like a variable name.
this breaks CSS, and additionally it's dog slow as tinyproxy wrote every
single character to the client via a write syscall.
now we process line-by-line, and inspect all matches of the regex
\{[a-z]{1,32}\}. if the contents of the regex are a known variable name,
substitution is taking place. if not, the contents are passed as-is to
the client. also the chunks before and after matches are written in
a single syscall.
closes#108
it's quite unexpected for an application running foreground in a
terminal to keep running when the terminal is closed.
also in such a case (if file logging is disabled) there's no way to
see what's happening to the proxy.
fixes an error in travis, which makes a shallow clone of 50 commits.
if the last tag is older than 50 commits, we get:
"fatal: No names found, cannot describe anything."
this caused a premature exit due to an assert error in safe_write()
on this line: assert (count > 0);
because the version variable in tinyproxy was empty.
inet_ntoa() uses a static buffer and is therefore not threadsafe.
additionally it has been deprecated by POSIX.
by using inet_ntop() instead the code has been made ipv6 aware.
note that this codepath was only entered in the unlikely event that
no hosts header was being passed to the proxy, i.e. pre-HTTP/1.1.
* check return values of memory allocation and abort gracefully
in out-of-memory situations
* use sblist (linear dynamic array) instead of linked list
- this removes one pointer per filter rule
- removes need to manually allocate/free every single list item
(instead block allocation is used)
- simplifies code
* remove storage of (unused) input rule
- removes one char* pointer per filter rule
- removes storage of the raw bytes of each filter rule
* add line number to display on out-of-memory/invalid regex situation
* replace duplicate filter_domain()/filter_host() code with a single
function filter_run()
- reduces code size and management effort
with these improvements, >1 million regex rules can be loaded with
4 GB of RAM, whereas previously it crashed with about 950K.
the list for testing was assembled from
http://www.shallalist.de/Downloads/shallalist.tar.gzcloses#20
the file docs/filter-howto.txt was removed, as it contained misleading
information since it was first checked in.
it suggests the syntax for filter rules is fnmatch()-like, when in
fact they need to be specified as posix regular expressions.
additionally it contained a lot of utterly unrelated and irrelevant/
outdated text.
a few examples with the correct syntax have now been added to
tinyproxy.conf.5 manpage.
closes#212
it turned out that the upstream section in tinyproxy.conf.5 wasn't rendered
properly, because in asciidoc items following a list item are always explicitly
appended to the last list item.
after several hours of finding a workaround, it was decided to change the
manpage generator to pod2man instead.
as pod2man ships together with any perl base install, it should be available
on almost every UNIX system, unlike asciidoc which requires installation
of a huge set of dependencies (more than 1.3 GB on Ubuntu 16.04), and the
replacement asciidoctor requires a ruby installation plus a "gem" (which is
by far better than asciidoc, but still more effort than using the already
available pod2man).
tinyproxy's hard requirement of a2x (asciidoctor) for building from source
caused rivers of tears (and dozens of support emails/issues) in the past, but
finally we get rid of it. a tool such as a2x with its XML based bloat-
technology isn't really suited to go along with a supposedly lightweight
C program.
if it ever turns out that even pod2man is too heavy a dependency, we could
still write our own replacement in less than 50 lines of awk, as the pod
syntax is very low level and easy to parse.
using --disable-manpage-support it's finally possibly to disable
the formerly obligatory use of a2x to generate the manpage
documentation.
this is the final solution to the decade old problem that users need
to install the enormous asciidoc package to compile TINYproxy from
source, or otherwise get a build error, even though the vast majority
is only interested in the program itself.
solution was inspired by PR #179.
closes#179closes#111
note that since 1.10.0 release the generated release tarball includes
the generated manpages too; in which case neither the use of a2x
nor --disable-manpage-support is required.
xsltproc was once[1] used to generate AUTHORS from xml input, but
fortunately this is no longer the case.
[1]: in a time when everybody thought XML would be a Good Idea (TM)
distcheck chokes on man5/8 files still in the file tree, while the input
files (.txt) are not. these are generated by the configure script and
it would require quite some effort to get this test working.
as it is non-essential, we simply disable it.
according to https://www.gnu.org/prep/standards/html_node/Standard-Targets.html#Standard-Targets
`maintainer-clean` is the proper make target for files that are distributed
in a release tarball:
> The ‘maintainer-clean’ target is intended to be used by a maintainer of the
> package, not by ordinary users.
> You may need special tools to reconstruct some of the files that
> ‘make maintainer-clean’ deletes.
this prevents users without a2x or asciidoctor from losing their ability to
recompile tinyproxy after `make clean`, but it also means that users wanting
to regenerate the documentation need to run `make maintainer-clean`.
otherwise object files will not be rebuilt with the new configure options.
this will prevent cases like db4bd162a3
where it turned out there was a build error with --enable-debug since several
git revisions.
asciidoctor is a modern replacement for asciidoc and much more lightweight,
issuing "apt-get install asciidoc" on ubuntu 16.04 results in an attempt to
install more than 1.3 GB of dependencies.
the timeout option set by the config file wasn't respected at all
so it could happen that connections became stale and were never released,
which eventually caused tinyproxy to hit the limit of open connections and
never accepting new ones.
addresses #274
getsockname() requires addrlen to be set to the size of the sockaddr struct
passed as the addr, and a check whether the returned addrlen exceeds the
initially passed size (to determine whether the address returned is truncated).
with a request like "GET /\r\n\r\n" where length is 0 this caused the code
to assume success and use the values of the uninitialized sockaddr struct.
unlike other functions called from the config parser code,
anonymous_insert() accesses the global config variable rather than
passing it as an argument. however the global variable is only set
after successful loading of the entire config.
we fix this by adding a conf argument to each anonymous_* function,
passing the global pointer in calls done from outside the config
parser.
fixes#292
previously, default values were stored once into a static struct,
then on each reload item by item copied manually into a "new"
config struct.
this has proven to be errorprone, as additions in one of the 2
locations were not propagated to the second one, apart from
being simply a lot of gratuitous code.
we now simply load the default values directly into the config
struct to be used on each reload.
closes#283
as a side effect of not updating the config pointer when loading
the config file fails, the "FIXME" level comment to take appropriate
action in that case has been removed. the only issue remaining
when receiving a SIGHUP and encountering a malformed config file would
now be the case that output to syslog/logfile won't be resumed, if
initially so configured.
this is required so we can elegantly swap out an old config for a
new one in the future and remove lots of boilerplate from config
initialization code.
unfortunately this is a quite intrusive change as the config struct
was accessed in numerous places, but frankly it should have been
done via a pointer right from the start.
right now, we simply point to a static struct in main.c, so there
shouldn't be any noticeable changes in behaviour.
if daemon mode is used and neither logfile nor syslog options specified,
this is clearly a misconfiguration issue. don't try to be smart and work
around that, so less global state information is required.
also, this case is already checked for in main.c:334.
it is quite easy to bring down a proxy server by forcing it to make
connections to one of its own ports, because this will result in an endless
loop spawning more and more connections, until all available fds are exhausted.
since there's a potentially infinite number of potential DNS/ip addresses
resolving to the proxy, it is impossible to detect an endless loop by simply
looking at the destination ip address and port.
what *is* possible though is to record the ip/port tuples assigned to outgoing
connections, and then compare them against new incoming connections. if they
match, the sender was the proxy itself and therefore needs to reject that
connection.
fixes#199.
tinyproxy used to do a full hostname resolution whenever a new client
connection happened, which could cause very long delays (as reported in #198).
there's only a single place/scenario that actually requires a hostname, and
that is when an Allow/Deny rule exists for a hostname or domain, rather than
a raw IP address. since it is very likely this feature is not very widely used,
it makes absolute sense to only do the costly resolution when it is unavoidable.
since the write syscall is used instead of stdio, accesses have been
safe already, but it's better to use a mutex anyway to prevent out-
of-order writes.
if we don't handle these gracefully, pretty much every existing config
file will fail with an error, which is probably not very friendly.
the obsoleted config items can be made hard errors after the next
release.
the existing codebase used an elaborate and complex approach for
its parallelism:
5 different config file options, namely
- MaxClients
- MinSpareServers
- MaxSpareServers
- StartServers
- MaxRequestsPerChild
were used to steer how (and how many) parallel processes tinyproxy
would spin up at start, how many processes at each point needed to
be idle, etc.
it seems all preforked processes would listen on the server port
and compete with each other about who would get assigned the new
incoming connections.
since some data needs to be shared across those processes, a half-
baked "shared memory" implementation was provided for this purpose.
that implementation used to use files in the filesystem, and since
it had a big FIXME comment, the author was well aware of how hackish
that approach was.
this entire complexity is now removed. the main thread enters
a loop which polls on the listening fds, then spins up a new
thread per connection, until the maximum number of connections
(MaxClients) is hit. this is the only of the 5 config options
left after this cleanup. since threads share the same address space,
the code necessary for shared memory access has been removed.
this means that the other 4 mentioned config option will now
produce a parse error, when encountered.
currently each thread uses a hardcoded default of 256KB per thread
for the thread stack size, which is quite lavish and should be
sufficient for even the worst C libraries, but people may want
to tweak this value to the bare minimum, thus we may provide a new
config option for this purpose in the future.
i suspect that on heavily optimized C libraries such a musl, a
stack size of 8-16 KB per thread could be sufficient.
since the existing list implementation in vector.c did not provide
a way to remove a single item from an existing list, i added my
own list implementation from my libulz library which offers this
functionality, rather than trying to add an ad-hoc, and perhaps
buggy implementation to the vector_t list code. the sblist
code is contained in an 80 line C file and as simple as it can get,
while offering good performance and is proven bugfree due to years
of use in other projects.
The new code skips leading whitespaces before removing trailing
whitespaces and comments.
Without doing this, lines with leading whitespace are treated like empty
lines (i.e. they are ignored).
it was reported that because the fdset was only initialized once,
tinyproxy would fail to properly listen on more than one interface.
closes#214closes#127
If this is a git checkout, and git is available, then git describe is
used. Otherwise, the new checked in VERSION file is taken for the version.
This mechanism uses a version.sh script inspired by
http://git.musl-libc.org/cgit/musl/tree/tools/version.sh
Signed-off-by: Michael Adam <obnox@samba.org>
RFC 1929 specifies that the user/pass auth subnegotation repurposes the version
field for the version of that specification, which is 1, not 5.
however there's quite a good deal of software out there which got it wrong and
replies with version 5 to a successful authentication, so let's just accept both
forms - other socks5 client programs like curl do the same.
closes#172
sbin/ is meant for programs only usable by root, but in tinyproxy's
case, regular users can and *should* use tinyproxy; meaning it is
preferable from a security PoV to use tinyproxy as regular user.
closes#15 for real.
the previous patch that was merged[0] was halfbaked and only removed
the warning part of the original patch from openwrt[1], but didn't
actually activate bind support. further it invoked UB by removing
the return value from the function, if transparent proxy support was
compiled in.
[0]: d97d486d53
[1]: 7c01da4a72
by having all features turned on by default, the binary is only
slightly bigger, but users of binary distros get the whole package
and don't need to compile tinyproxy by hand if they need a feature
that wasn't compiled in.
it also prevents the confusion from getting syntax errors when a
config file using those features is parsed.
another advantage is that by enabling them these features may
actually get some more testing.
just like the rest of the socks code, this was stolen from
proxychains-ng, of which i'm happen to be the maintainer of,
so it's not an issue (the licenses are identical, too).
tinyproxy uses a curious mechanism to log those early messages
that result from parsing the config file before the logging mechanism
has been properly set up yet by finishing parsing of the config file:
those early messages are written into a memory buffer and then
are printed later on. this slipped my attention when making it possible
to log to stdout in ccbbb81a.
using the "BasicAuth" keyword in tinyproxy.conf.
base64 code was written by myself and taken from my own library "libulz".
for this purpose it is relicensed under the usual terms of the tinyproxy
license.
original patch submitted in 2006 to debian mailing list:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=392848%29#12
this version was rebased to git and updated by Russ Dill <russ.dill@gmail.com>
in 2015 (the original patch used a different config file format).
as discussed in #40.
commit message by @rofl0r.
if using one of unsigned or signed char for the function prototype, one
gets nasty warnings when using it with the other type. the only proper
solution is to put void* into the prototype, and then specialize the pointer
inside the function using an automatic variable.
for exactly this reason, libc functions like read(), write(), etc use void*
too.
some users want to run tinyproxy on an as-needed basis in a terminal,
without setting it up permanently to run as a daemon/service.
in such use case, it is very annoying that tinyproxy didn't have
an option to log to stdout, so the user has to keep a second terminal
open to `tail -f` the log.
additionally, this precluded usage with runit service supervisor,
which runs all services in foreground and creates logfiles from the
service's stdout/stderr.
since logging to stdout doesn't make sense when daemonized, now if
no logfile is specified and daemon mode activated, a warning is
printed to stderr once, and nothing is logged.
the original idea was to fail with an error message, though some users
might actually want to run tinyproxy as daemon and no logging at all.
some people want to run tinyproxy with minimal configuration from
the command line (and as non-root), but tinyproxy insists on writing
a pid file, which only makes sense for usage as a service, hereby
forcing the user to either run it as root so it can write to the
default location, or start editing the default config file to work
around it.
and if no pidfile is specified in the config, it frankly doesn't
make sense to force creation of one anyway.
tinyproxy conservatively defaulted to allow CONNECT method only
on two ports used by SSL in the ancient past, but since HTTPS usage
got much more widespread (actually, it's now the default for the
majority of websites), it makes sense now to allow it without
restriction by default to accomodate for the new situation.
this causes a build failure on several platforms using older versions
of autotools or GNU make.
make[2]: Entering directory `src'
Makefile:670: *** missing separator (did you mean TAB instead of 8 spaces?). Stop.
make[2]: Leaving directory `src'
fixes#72
in the unlikely case that the user's C library has broken regex support,
she should probably update to a bugfree version.
in its full consequence, checking if individual functions works basically
require to test every single function in use, which is nonsensical.
since this check required to compile and run a code sample on the host,
it cannot be checked in cross-compile scenarios and as it defaulted to yes
(broken), causes build failure in any such scenario.
closes#1
Signed-off-by: John Spencer <maillist-tinyproxy@barfooze.de>
Reviewed-by: Michael Adam <obnox@samba.org>
This should make hash processing generally faster.
There is a treadeoff between memory footprint and
speed of processing. 10 KB instead of 1.2 KB of
hash table per process should not be a huge problem
even on very limited current systems.
Who really needs to stick to 32 buckets could
recompile. We could also think about making
this configurable at some point.
Signed-off-by: Michael Adam <obnox@samba.org>
This hash function distributes much better than the
original one. The effect is not as visible with
hashes taken modulo 32 than with a bigger modulus,
but it is there. And larger number of buckets migh
become possible in the future...
Reviewed-by: Michael Adam <obnox@samba.org>
I seem to have forgotten to compile with transparent support enabled...
This belongs to the fix for bug BB#63.
Signed-off-by: Michael Adam <obnox@samba.org>
This was accidentially used instead of the function parameter listen_addrs
This still belongs to the fix for bug BB#63.
Signed-off-by: Michael Adam <obnox@samba.org>
check the return code of fcntl via socket_nonblocking
on the listen sockets in child_main()
Found by coverity.
Signed-off-by: Michael Adam <obnox@samba.org>
Effectively, the return code of fcntl was not checked
by not checking the return code of socket_nonblocking()
for the server fd.
Found by coverity.
Signed-off-by: Michael Adam <obnox@samba.org>
Effectively, the return code of fcntl was not checked
by not checking the return code of socket_nonblocking()
for the client fd.
Found by coverity.
Signed-off-by: Michael Adam <obnox@samba.org>
Use extract_url instead of the old extract_ssl_url:
extract_url is generic and handles ipv6 literal addresses correctly.
Signed-off-by: Michael Adam <obnox@samba.org>
There is in fact nothing http-specific any more about this function, hence
the rename. The input has been stripped of the <proto>:// header anyways.
This in preparation of fixing bug BB#106: ssl fails with literal ipv6 addrs.
Signed-off-by: Michael Adam <obnox@samba.org>
When removing the '[' and ']' characers from the ipv6 literal address, make sure
the pointer that is later free'd stays a malloced pointer by memmoving the
string one place left.
Signed-off-by: Michael Adam <obnox@samba.org>
log entering opensock and successful return of getaddrinfo.
This allows to detect dns timeouts from looking at the logs.
Signed-off-by: Michael Adam <obnox@samba.org>
This is achieved by not stopping at the first result of getaddrinfo
that we managed to listen on: Without "Listen" in the config, we
call getraddrinfo with NULL address. With AI_PASSIVE, this gives results
for both IPv4 and IPv6 wildcard addresses (if both are supported).
This lets tinyproxy listen on both IPv4 and IPv6 wildcard if the system
supports them.
Signed-off-by: Michael Adam <obnox@samba.org>
This prepares listenting on multiple sockets, which will be ussed to
fix listening on the wildcard (listen on both ipv6 and ipv4) and
help add the support for multiple Listen statements in the config
Signed-off-by: Michael Adam <obnox@samba.org>
instead of using config.ipAddr internally.
This is in preparation to make it possible
to call it for multiple addresses.
Signed-off-by: Michael Adam <obnox@samba.org>
This changes listen_sock() to not return the
addrlen of the used address from getaddrinfo call
to the caller, stored in global addrlen in child.c.
This was only used to be able to allocate enough space for the
arguments to the later accept call depending on whether
IPv4 or IPv6 is used.
This removes the need to pass this info by always allocating
sizeof(struct sockaddr_storage) instead, which is enough
to carry both sockaddr_in and sockaddr_in6.
Signed-off-by: Michael Adam <obnox@samba.org>
Supplementary groups are inherited from the calling process. Drop all
supplementary groups if the "Group" configuration directive is set to
change to a different user. Otherwise the process may have more rights
than expected.
Reviewed-by: Michael Adam <obnox@samba.org>
Pass a pointer to a char pointer to do_transparent_proxy so the reassembled URL
will actually end up back in the caller where it is needed for filtering
decisions. This fixes the problem that a tinyproxy configured with the
transparent proxy functionality and "FilterURLs Yes" would filter on everything
but the domain.
Signed-off-by: daniel.egger@sphairon.com
Signed-off-by: Michael Adam <obnox@samba.org>
There are frequent questions "what does 'No proxy for ...' mean?"
on the mailing list and IRC. Be more specific. (No upstream proxy ...)
Correspondingly, log "Found upstream proxy ... for ..."
I.e., add a tinyproxy subdirectory.
This is meant to ease running tinyproxy as non-root user.
The subdirectory can be used to give the tinyproxy user
write permission.
Michael
i.e. add a tinyproxy subdirectory.
This is meant to ease running tinyproxy as non-root user
the subdirectory can be used to give the tinyproxy user
write permission.
Michael
This is the second part of fixing bug #74.
I lets tinyproxy create its log and pid files as the
user as which it is running, so that later on at SIGHUP,
the log file can successfully be reopened.
Michael
Muks: please verify - thes are current the fixed bugs
with milestone 1.8.0 (i.e. the renamed 1.7.2).
I hope this is correct!
I also hope this was the intended scheme - have bug
lists per version section. Please feel free to edit...
Michael
Now that there is always a log file set, we just check for
syslog being set to TRUE and in that case use syslog logging,
file logging otherwiese.
Michael
Now that we exit early when !logging_initialized, this
can actually not happen anymore anyways: When logging is
initialized, it was also properly configured.
Michael
https://www.banu.com/bugzilla/show_bug.cgi?id=55
This is achieved by streamlining handle_connection, adding
a common cleanup-and-exit poing ("done") and a common
failure exit point ("fail") that reads any pending data
from the client fd first before trying to send back
data (error page or stats page).
The new function get_request_entity that is used here,
does not honour any content-length header. It just calls
select on the client-fd and gets any data that is there
to read.
Michael
The configure would fail when cross compiling due to the regex check
automatically failing for cross compilation. Since you can't run the
regex binary check, assuming the regex library on the target platform is
working would be the only way to get the build working, or adding a
force for people to control based on their build environment.
Signed-off-by: Michael Adam <obnox@samba.org>
This way the logging from the various child processes does not
get clobbered up. Formerly, the different write portions
(time stamp, message, newline) would get mixed from the
various child processes' log messages.
Michael
Logging is re-initialized by reload_config() now.
And truncation is wrong anyways: A syslog mechanism will
move the current log file and the reopen-action will just
create a new empty log file upon SIGHUP.
Michael
This includes reopening the log file (in append mode).
Also switching from syslog to logfile and visa versa are included
when called from the SIGHUP handler.
Michael
When this code is hit, availability of syslog has already
been checked (when reading the config file). So config.syslog == TRUE
only when HAVE_SYSLOG_H is defined.
So I remove the preprocessor checks which only clobber the logic
and make the code harder to read (IMHO).
Michael
This seemed out of place. Now the information is
stored in the correct places (as log.c:logging_initialized).
This way, we will be able to cleanly re-initialize
logging during config reload (SIGHUP) in subsequent
commits.
Michael
This controls whether log_messages should write to the
log file / syslog or rather to the log_message_storage.
This will make the global processed_config_file variable
from main unneccessary in the next step.
Michael
This allows for later reloading the config at SIGHUP (e.g.).
First the old config data is freed, then the defaults that
are given as a parameter are copied over in a rather clumsy
manual fashion (maybe something more clever can be done here)
and finally, the actual config file is loaded.
Michael
asciidoc is necessary as the version number is added during
configure into the asciidoc manpage sources. So simply bundling
a pre-generated manpage won't do.
Maybe, it would be better to have a two stage process here:
1. Have AC_SUBST from configure substitute as many variables
as possible in a fist stage
tinyproxy.conf.tmpl.in --> tinyproxy.conf.tmp
2. Have make substitute those remaining paths that can not be
substituted reasonable by configure due to the internal
workings of automake.
Michael
This extends the pattern by an alternative where there are no double colons.
This is for instance the case for and IPv6 address of the form
1111:222:33:4:55:666:7777:888
Michael
The "address" member of struct acl_s is a union of a char *
and the numeric ip. So freeing the string after appending it to the
vector list is bad in two respects:
1. If the acl type was numeric, then this could (and would)
lead to a segfault due to the numeric IP data interpreted
as pointer to the string to be freed.
2. If the acl type was string, then the acl inserted into the
list contained a reference to this address string that
was freed. So in the worst case dereferencing this freed
string could segfault, or at least this could lead to
unexpectedly failing acl checks.
Michael
The patterns should not end with the end of line marker,
since they might be part of a continuing pattern,
say of the form ipv6address/mask (used for allow/deny)
Michael
This is a first cut at providing a tinyproxy.conf file with
more useful default or example directories. It uses datadir,
sysconfdir and localstatedir.
Because automake is a little special here, this template can
not simply be processed by configure (AC_CONFIG_FILES(...)),
as these variables can only be used like this in makefiles.
Instead, we need a little sed-processor in the Makfile in etc/.
Michael
This is what it actually is.
The string value was used in earlier versions to compare
against the uri->authority string. But not as a list of
sites to create an X-Tinyproxy header for, as the tinyproxy.conf
template states...
Michael
Automake 1.11 (and I think 1.10b already) offers the AM_SILENT_RULES macro.
This adds switches --quiet, --enable-silent-rules and --disable-silent-rules
to configure.
--quiet makes the configure run itself quite.
--enable-silent-rules makes the compile process less verbose:
for a file that is compiled without errors or warnings, a simple
"CC main.o" is printed (e.g.). Compiler warnings and errors
are printed of course.
This makes it much easier (IMHO) to spot build problems.
--disable-silent-rules turns the silent rules off
I have set it up such that the default for tinyproxy is to build
in verbose mode (i.e. with silent rules disabled). This prints
the whole compile call command line for each source file compiled,
precisely as before.
You can also control verbose/non-verbose mode at "make" time, i.e.
after configure has run, by calling "make V=0 ..." or "make V=1 ..."
for running in silent and verbose mode, respectively.
If the version automake used to create configure is too old,
the result is unaltered, compared to the result before this change.
Wow - this is a long commit message for a 1-liner.
But since I discussed this with Mukund earlier, and he did
not seem to be too fond if this, I felt the need to justify
this change... :-)
Michael
Somehow, I don't quite get the asciidoc formatting yet.
I can't get the extra paragraph on templates to look nice
in the manpage output.
But at least there is some content...
Michael
A ususal case here is that the headers were buggy, e.g. a line
without a ":" to separate the header field name from the value.
Previous behaviour was to silently return a blank page.
Michael
This reverts commit f3312c22a0.
The "hashofheaders" argument to process_request() is needed
for building with reverse support or with transparent support.
Michael
The compiler inlines static functions as necessary anyway.
No more inline keywords exist in Tinyproxy source code. We want to
avoid using this keyword anyway.
If we require information about the runtime environment, it can be
found using the uname program. And binutils can tell about what the
tinyproxy binary contains. Tinyproxy doesn't have to report this
information.
This feature will only confuse us during support, if users come to
us with a Tinyproxy build which has a differently named default config
file. This feature is not that useful anyway.
Enable spcifying HTTP protocol version on command line ( --http-version).
Enable specifying method (GET, CONNECT, ...) on the command line (--method).
Add POD documentation.
Use pod2usage() to print help message.
Michael
The modified files were indented with GNU indent using the
following command:
indent -npro -kr -i8 -ts8 -sob -l80 -ss -cs -cp1 -bs -nlps -nprs -pcs \
-saf -sai -saw -sc -cdw -ce -nut -il0
No other changes of any sort were made.
This runs valgrind with the -q switch - i.e. the log file
tests/env/var/log/valgrind.log will only contain anything when there were
valgrind errors. (Memory leaks...)
Michael
It provisions a test envirnonment, fires up the perl web server
and tinyproxy and currently makes one direct request to the
web server and one request through tinyproxy.
This will be modularized and extended in the sequel.
Michael
This should be one of the test tools for writing our testsuite.
This can be used to make direct connects to web servers like so:
webclient.pl server_ip:port /path/file.html
and to make requestis via a proxy like this:
webclient.pl proxy_ip:port http://webserver:port/path/file.html
Michael
This should be the web server to test against in the upcoming selftest suite.
This web server will evolve as the test suite grows.
Currently, it just returns a web site quoting the request and a fortune
(if fortune is installed...) for whatever request it gets.
The option to provide a document root is already present.
Michael
Tinyproxy is distributed under the GNU GPLv2 or above license.
If autotools are used to build Tinyproxy without this file, they
copy the system installed COPYING file which is nowadays the
GNU GPLv3 license.
The following errors occurred when running ./autogen.sh :
$ ./autogen.sh
+ aclocal
configure.ac:18: warning: AC_COMPILE_IFELSE was called before AC_USE_SYSTEM_EXTENSIONS
../../lib/autoconf/specific.m4:386: AC_USE_SYSTEM_EXTENSIONS is expanded from...
../../lib/autoconf/specific.m4:332: AC_GNU_SOURCE is expanded from...
configure.ac:18: the top level
configure.ac:18: warning: AC_RUN_IFELSE was called before AC_USE_SYSTEM_EXTENSIONS
configure.ac:19: warning: AC_COMPILE_IFELSE was called before AC_USE_SYSTEM_EXTENSIONS
../../lib/autoconf/specific.m4:459: AC_MINIX is expanded from...
configure.ac:19: the top level
configure.ac:19: warning: AC_RUN_IFELSE was called before AC_USE_SYSTEM_EXTENSIONS
and so on for autoheader and friends.
According to the autotools docs, the proper way to handle this
is to just call AC_USE_SYSTEM_EXTENSIONS.
Michael
This change allows numeric uid/gids to be specified in the User and
Group directives in tinyproxy.conf. Formerly, only username and group
names were accepted. This fixes bug #15, which was created after
looking at a case on the OpenWrt wiki.
X-Banu-Bugzilla-Ids: 15
This fixes a regression (bug #16) introduced in
95c1f39f60, where a NULL check was
removed. This caused NULL error variable values to be sent to
add_error_variable() in which strlen() segfaulted.
With this fix, custom stats pages should be displayed properly.
X-Banu-Bugzilla-Ids: 16
Moved the strtol() call into fill_netmask_array() and added additional
error checking to ensure that the strtol() call succeeded.
Error checking code taken from strtol() manpage.
Signed-off-by: Robert James Kaes <rjk@wormbytes.ca>
When building a numeric ACL with netmask, range check the supplied
value. In addition, the code to walk the array has been extracted and
"simplified".
Signed-off-by: Robert James Kaes <rjk@wormbytes.ca>
This change primarily avoids a gcc warning where timebuf
is never non-NULL. There is no need to check the value to be
inserted as it's checked inside hashmap_insert().
This changeset also lets error return values from hashmap_insert()
propogate instead of clamping them to -1 (not that these are
currently used anyway).
The notices have been changed to a more GNU look. Documentation
comments have been separated from the copyright header. I've tried to
keep all copyright notices intact. Some author contact details have
been updated.
Included the basic grammar and handler functions for the "upstream" and
"no upstream" directives. I still need to update the grammar to match
_all_ the possibilities documented in the tinyproxy.conf file, but at
least it now does as much as the old config parser.
Moved the reverse proxy code from reqs.c into it's own files
(reverse_proxy.c). The code in reqs.c is way too complicated, so I
want to move unrelated code into their own files to simplify the main
concepts in reqs.c.
I re-indented the source code using indent with the following options:
indent -kr -bad -bap -nut -i8 -l80 -psl -sob -ss -ncs
There are now _no_ tabs in the source files, and all indentation is
eight spaces. Lines are 80 characters long, and the procedure type is
on it's own line. Read the indent manual for more information about
what each option means.
Changed the variable type for the namelen variable to the correct
socklen_t type. The configure script already checked for it, but for
some reason I never got around to actually using it in this function.
tinyproxy does not prompt for any proxy information from the client, it
should not be eating the proxy headers. They are most likely needed by
an upstream proxy.
Changed the internal implementation of the hashmap to maintain the
insert order if the same key is repeated. The insertion is still
constant since we keep track of the head and tail of the bucket
chain.
TP_ARG_ENABLE macro. Except for the transparent proxy option, all the
other options remain identical. To enable transparent proxy support
use only --enable-transparent, rather than the old
--enable-transparent-proxy.
directory, so inform autoconf of this (the AC_CONFIG_AUX_DIR and
AC_CONFIG_MACRO_DIR macros.)
Also added a bunch of portability tests discovered by autoscan.
connptr->server_fd variable and moved it into an assert since we
should never be called with invalid data. Also made the function an
inline function since it's only called in one place.
to handle IPv6 style addresses along with the existing IPv4 and string
addresses. In addition, the hand-rolled "list" code has been replaced
with a vector (code reuse.) Also, the code should be a little easier
to understand (relatively speaking.)
I do need to add some kind of testing framework (in general) to check
that the new code does work with all the formats that will be thrown
at it.
This allows tinyproxy to respond to a request bound to the same
interface that the request came in on. As Oswald explains:
"attached is a patch that adds the BindSame option. it causes
binding an outgoing connection to the ip address of the respective
incoming connection. that way one can simulate an entire proxy farm
with a single instance of tinyproxy on a multi-homed machine."
Cool.
properly. (The sizeof "struct stat" was being used rather than the
proper "struct stat_s". On my system, "struct stat" is 88 bytes long,
while "struct stat_s" is 20 bytes long. Quite a difference!)
- get_ip_string() converts a binary network address into either a
dotted-decimal IPv4 address, or a IPv6 hex-string
- full_inet_pton() converts a numeric character string into an IPv6
network address (binary form). It's like the system inet_pton()
function, but it will work with bot IPv4 and IPv6 character
strings.
These functions are required for the conversion to Internet protocol
independence. (Or to put it more clearly: allow tinyproxy to work in
an IPv6 network.)
string and return the port. I cleaned up and added error handling to
the code, but it's basically "alex"'s fix.
(extract_http_url): Rewrote this function to remove all the sscanf()
calls. It's much easier to just split on the path slash (if it's
present) and then strip the user name/password and port from the host
string. Less code, handles more cases!
this addition follow:
The patch implements a simple reverse proxy (with one funky extra
feature). It has all the regular features: mapping remote servers to local
namespace (ReversePath), disabling forward proxying (ReverseOnly) and HTTP
redirect rewriting (ReverseBaseURL).
The funky feature is this: You map Google to /google/ and the Google front
page opens up fine. Type in stuff and click "Google Search" and you'll get
an error from tinyproxy. Reason for this is that Google's form submits to
"/search" which unfortunately bypasses our /google/ mapping (if they'd
submit to "search" without the slash it would have worked ok). Turn on
ReverseMagic and it starts working....
ReverseMagic "hijacks" one cookie which it sends to the client browser.
This cookie contains the current reverse proxy path mapping (in the above
case /google/) so that even if the site uses absolute links the reverse
proxy still knows where to map the request.
And yes, it works. No, I've never seen this done before - I couldn't find
_any_ working OSS reverse proxies, and the commercial ones I've seen try
to parse the page and fix all links (in the above case changing "/search"
to "/google/search"). The problem with modifying the html is that it might
not be parsable (very common) or it might be encoded so that the proxy
can't read it (mod_gzip or likes).
Hope you like that patch. One caveat - I haven't coded with C in like
three years so my code might be a bit messy.... There shouldn't be any
security problems thou, but you never know. I did all the stuff out of my
memory without reading any RFC's, but I tested everything with Moz, Konq,
IE6, Links and Lynx and they all worked fine.
manage the HTML error pages. It simplifies the source, and also make
the object file smaller. Nice. Also added any casting from (void*)
to ensure that the code compiles using a C++ compiler.
cleanly using a C++ compiler.
Changed the servers_waiting variable to an unsigned int, since the
number of servers waiting can never be negative, and added an assert()
to ensure this invariant.
realloc() can take a NULL pointer, as defined by the realloc() man
page.
Fixed the cast in both safefree() macros to compile cleaning using a
C++ compiler.
"ViaProxyName" directive. The "Via" HTTP header is _required_ by the
HTTP spec, so the code has been changed to always send the header.
However, including the proxy's host name could be considered a
security threat, so the "ViaProxyName" directive is used to set the
token sent in the "Via" header. If the directive is not enabled the
proxy's host name will be used.
standard HTTP port (80 or 443) append the port string to the host
header; otherwise, leave the host string with only the host's domain
name.
Replaced all occurrences of constant 80 and 443 with defines HTTP_PORT
and HTTP_PORT_SSL.
is used by the transparent proxy code. [Anatole Shaw]
(process_request): Fixed up the transparent proxy code so that
filtering can be done on the whole URL. [Anatole Shaw]
(pull_client_data): Added a bug fix for Internet Explorer (IE). IE
will leave an extra CR and LF after the data in an HTTP POST. The new
code will eat the extra bytes if they're present. Thanks to Yannick
Koehler for finding the bug and offering an explanation as to why it
was happening.
Changed all calls of connptr->remote_content_length to
connptr->content_length.server
replaced it with a smaller structure containing both the remote/server
and the local/client content-length fields if they're present in the
HTTP response headers.
username/password part from the host URI.
(extract_http_url), (extract_ssl_url): Use the new
strip_username_password function to remove any non-host information
from the URI.
since it's skipped by the caller before the URL is passed to this
function.
(process_request): Include code to handle proxy FTP requests as
well. This also lead to a bit of a cleanup in the calling conventions
of extract_http_url function. tinyproxy can handle both types of
resources by skipping the leading :// part.
error, a negative "error code" is returned to the program. The
various callers then need to decide what to do.
(pidfile_create): Returns an error status depending on whether the PID
file was created successfully.
tinyproxy. There is really no need for this code, since there are
perfectly good programs out there (like rinetd) which are designed for
TCP tunnelling. tinyproxy should be a good HTTP proxy, nothing more,
and nothing less; therefore, the tunnelling code is gone.
change was required to actually display all the logs in the correct
order. Also, all log lines are stored internally while tinyproxy is
starting. At the appropriate point all the logs are written to the
log file.
(pidfile_create): Changed all the error logging to write to standard error and then exit the program. This will prevent segmentation fault problems from occurring because the log file could not be created properly.
(main): Removed the log file creation code because it has been moved into the log.c file. Also, removed the explicit fclose() for the log file since it will be close when the program has exited.
(get_all_headers): Instead of dropping duplicate headers when the "double CGI" situation occurs, tinyproxy will now drop _all_ the headers from the "inner" HTTP response.
(filter_init): Added code to handle both host and URLs. Also include code to use extended regular expressions.
(filter_domain): The old filter_url function has been renamed filter_domain().
(filter_url): This function now actually filters complete URLs.
(main): Moved the signals around so that the appropriate signal is assigned to either the children or just the parrent process.
Updated the copyright on the file.
Therefore the thread.c file has been removed and this file replaces it.
These files are really just the thread.c and thread.h files with all the
threading stuff replaced with fork() code. Most of the code is identical.
allocated. Also, thanks to Justin Guyett for finding a problem the
hashmap_remove() function. There was a problem where an entry's "prev"
pointer could be pointing to freed memory.
Finally, renamed all "maps" to bucket to make the source more
understandable.
They return 0 if OK, and a positive error code.
Cleaned up the status setting code in thread_main().
Thanks to Hans-Georg Bork for fixing the problem in thread_pool_create()
where the status wasn't set early enough to allow all the threads to be
created.
Added additional logging information to let the admin know what is
happening with the thread creation.
This required a bunch of changes to the source (like the inclusion of the
end_iterator member variable.) All this was required by sites like Yahoo
which send out multiple "Set-Cookie" headers. tinyproxy needs to handle
this situation correctly.
function. The signal handler now simply sets a flag which is monitored
inside the thread_main_loop() function. The log rotation code has also
been tightened to handle any error conditions better. Credit to Petr
Lampa for suggesting that system functions inside of a signal handler is
bad magic.
from within the handle_connection() function.
Added tests to make sure pthread_create() succeeds.
Added defined tests for pthread_cancel() since it's not available on all
platforms.
and added the get_content_length() function.
The process_server_headers() function was rewritten to remove the
Connection header correctly, and also retrieve the Content-Length value.
This value is needed in the relay_connection() function since there are
some remote machines which do not properly close down the connection once
the body has been retrieved. Thanks to James Flemer for finding a test
case for this problem.
variable in conjunction with _DEPENDENCIES and _LDADD. The change here
makes filter a "required" module in the sense that it will always be
compiled (to make sure it doesn't get out of date), but it will
conditionally included in the object file.
test for the OpenBSD system which prevents the inclusion of the malloc.h
header (the functions are actually defined in stdlib.h)
I might even remove the malloc.h header altogether since the malloc/free
functions _should_ be in stdlib.h
itself to find out all the changes. Changed the process_client_header()
function to use the hashmap and vector modules. I've made this change to
better handle the Connection header. The Connection header, it it's
present, lists all the headers which should _not_ be transmitted any
further along. An HTTP/1.1 proxy must respect this.
Other changes are basically cosmetic.
16 KB.)
Added the TUNNEL_CONFIGURED() macro to help with testing for the tunnel
support code.
Create the write_message() function to encapsulate the code which sends
the information to the file descriptor.
Moved the tunnel code into it's own function.
Ignore any blank lines when tinyproxy is expecting a request line.
Instead of sending the request line to the remote server in pieces,
tinyproxy nows sends it in once go. This was done to fix a problem with
some sites like www.heise.de.
Changed all calls to connptr->ssl to connptr->connect_method.
Changed all calls to connptr->send_message to
connptr->send_response_message.
Moved the call to Via header code to inside to the tests to handle if
tinyproxy is sending an error message (don't need to send any headers.)
section and the anonymous header section. Once I had removed the DNS
caching, the ternary tree system was overkill for the anonymous header
code. Replaced in the anonymous header section with a simple linked list.
of the host names being resolved, which is not recommended by RFC2616.
Basically, if a HTTP client doesn't respect the TTL is should not be
caching the address since it leaves itself open to DNS spoofing attacks.
Also, having a DNS caching system is an administater decision, and so
should not be included in the tinyproxy source.
<p>Tinyproxy is a light-weight HTTP/HTTPS proxy daemon for POSIX operating systems. Designed from the ground up to be fast and yet small, it is an ideal solution for use cases such as embedded deployments where a full featured HTTP proxy is required, but the system resources for a larger proxy are unavailable.</p>
<p>Tinyproxy is distributed using the GNU GPL license (version 2 or above).</p>
<p>Tinyproxy has a <strong>small footprint</strong> and requires very little in the way of system resources. The memory footprint tends to be around 2 MB with glibc, and the CPU load increases linearly with the number of simultaneous connections (depending on the speed of the connection). Thus, Tinyproxy can be run on an older machine, or on a network appliance such as a Linux-based broadband router, without any noticeable impact on performance.</p>
<p>Tinyproxy requires only a <strong>minimal POSIX environment</strong> to build and operate. It can use additional libraries to add functionality though.</p>
<p>Tinyproxy allows forwarding of <strong>HTTPS connections</strong> without modifying traffic in any way through the <code>CONNECT</code> method (see the <code>ConnectPort</code> directive, which you should disable, unless you want to restrict the users).</p>
<p>Tinyproxy supports being configured as a <strong>transparent proxy</strong>, so that a proxy can be used without requiring any client-side configuration. You can also use it as a <strong>reverse proxy</strong> front-end to your websites.</p>
<p>Using the <code>AddHeader</code> directive, you can <strong>add/insert HTTP headers</strong> to outgoing traffic (HTTP only).</p>
<p>If you're looking to build a custom web proxy, Tinyproxy is <strong>easy to modify</strong> to your custom needs. The source is straightforward, adhering to the KISS principle. As such, it can be used as a foundation for anything you may need a web proxy to do.</p>
<p>Tinyproxy has <strong>privacy features</strong> which can let you configure which HTTP headers should be allowed through, and which should be blocked. This allows you to restrict both what data comes to your web browser from the HTTP server (e.g., cookies), and to restrict what data is allowed through from your web browser to the HTTP server (e.g., version information). Note that these features do not affect HTTPS connections.</p>
<p>Using the <strong>remote monitoring</strong> facility, you can access proxy statistics from afar, letting you know exactly how busy the proxy is.</p>
<p>You can configure Tinyproxy to <strong>control access</strong> by only allowing requests from a certain subnet, or from a certain interface, thus ensuring that random, unauthorized people will not be using your proxy.</p>
<p>With a bit of configuration (specifically, making Tinyproxy created files owned by a non-root user and running it on a port greater than 1024), Tinyproxy can be made to run without any special privileges, thus minimizing the chance of system compromise. In fact, it is <b>recommended</b> to run it as a regular/restricted user. Furthermore, it was designed with an eye towards preventing buffer overflows. The simplicity of the code ensures it remains easy to spot such bugs.</p>
<p>Note that many distributions ship horribly outdated versions of tinyproxy, therefore it is recommended to compile it from source.</p>
<ul>
<li>On Red Hat Enterprise Linux, or its derivatives such as CentOS, install Tinyproxy from the EPEL repository by running yum install tinyproxy.</li>
<li>On Fedora, install Tinyproxy by running yum install tinyproxy.</li>
<li>On Debian and derived distributions, run apt-get install tinyproxy to install Tinyproxy.</li>
<li>For openSUSE run: zypper in tinyproxy</li>
<li>Arch users can install the Tinyproxy package from the community repository. Run pacman -S tinyproxy to install it.</li>
<li>FreeBSD, OpenBSD or NetBSD users can use the pkg_add utility to install the tinyproxy package.</li>
<li>Mac OS X users can check MacPorts to see if the Tinyproxy port there is recent enough.</li>
</ul>
<p>If you feel that the Tinyproxy binary package in your operating system is not recent (likely), please contact the package maintainer for that particular operating system. If this fails, you can always compile the latest stable, or even better, the latest git master version, from source code.</p>
<p>We distribute Tinyproxy in source code form, and it has to be compiled in order to be used on your system. Please see the INSTALL file in the source code tree for build instructions. The current stable version of Tinyproxy is available on the <a href="https://github.com/tinyproxy/tinyproxy/releases">releases page</a>. The Tinyproxy NEWS file contains the release notes. You can verify the tarball using its PGP signature. You can also browse the older releases of Tinyproxy.</p>
<p>We use Git as the version control system for the Tinyproxy source code repository. To get a copy of the Tinyproxy repository, use the command:</p>
<p>The quickest way to get started is using a minimal config file like the below:</p>
<pre><code>
Port 8888
Listen 127.0.0.1
Timeout 600
Allow 127.0.0.1
</code></pre>
<p>And then simply run <code>tinyproxy -d -c tinyproxy.conf</code> as your current user. This starts tinyproxy in foreground mode with <code>tinyproxy.conf</code> as its config, while logging to stdout. Now, all programs supporting a HTTP proxy can use 127.0.0.1:8888 as a proxy. You can try it out using <code>http_proxy=127.0.0.1:8888 curl example.com</code>.</p>
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.