using an empty group () is not defined in the posix spec, and as such
"undefined behaviour", even though it happened to work with both GLIBC
and MUSL libc, as well as with oniguruma's POSIX compatibility API.
we used this idiom as a trick when refactoring the regex parsing,
in order not to change the match indices of all the handler functions,
ignorant that this is not explicitly allowed by the spec.
to make future refactoring easier, we introduce a MGROUP1 macro that's
added to each match group index, so we have only a single knob to turn
in case a similar change becomes necessary again.
the INT regex macro supported a 0x prefix (used e.g. for port numbers),
however following that, only digits were accepted, and not the full
range of hexdigits. it's unlikely this was used, so remove it.
note that the () expression is kept, so we don't have to adjust match
number indices all over the place.
git describe prefixes the sha1 commit hash with -g, which is exactly what
we're after. this change gets rid of the confusing "g" in the commit hash
and allows tag names that include "-".
it's been reported[0] that RHEL7 fails to properly set the length
parameter of the getsockname() call to the length of the required
struct sockaddr type, and always returns the length passed if it
is big enough.
the SOCKADDR_UNION_* macros originate from my microsocks[1] project,
and facilitate handling of the sockaddr mess without nasty casts.
[0]: https://github.com/tinyproxy/tinyproxy/issues/45#issuecomment-694594990
[1]: https://github.com/rofl0r/microsocks
it turned out that close()ing an fd behind the back of a thread
doesn't actually cause blocking operations to get a read/write event,
because the fd will stay valid to in-progress operations.
even though the existing IPV6 regex caught (almost?) all invalid
ipv6 addresses, it did so with a huge performance penalty.
parsing a file with 32K allow or deny statement took 30 secs in
a test setup, after this change less than 3.
the new regex is sufficient to recognize all valid ipv6 addresses,
and hands down the responsibility to detect corner cases to the
system's inet_pton() function, which is e.g. called from insert_acl(),
which now causes a warning to be printed in the log if a seemingly
valid address is in fact invalid.
the new regex has been tested with 486 testcases from
http://download.dartware.com/thirdparty/test-ipv6-regex.pl
and accepts all valid ones and rejects most of the invalid ones.
note that the IPV4 regex already did a similar thing and checked only
whether the ip looks like [0-9]+.[0-9]+.[0-9]+.[0-9]+ without pedantry.
move it to before disabling logging, so a message with the correct
timestamp is printed if logging was already enabled.
also add a message when loading finished, so one can see from the
timestamp how long it took.
note that this only works on a real config reload triggered by
SIGHUP/SIGUSR1, because on startup we don't know yet where to log to.
note that the old code inserted added headers at the beginning of the
list, reasoning unknown. this seems counter-intuitive as the headers
would end up in the request in the reverse order they were added,
but this was irrelevant, as the headers were originally first put
into the hashmap hashofheaders before sending it to the client.
since the hashmap didn't preserve ordering, the headers would appear
in random order anyway.
usage of select() is inefficient (because a huge fd_set array has to
be initialized on each call) and insecure (because an fd >= FD_SETSIZE
will cause out-of-bounds accesses using the FD_*SET macros, and a system
can be set up to allow more than that number of fds using ulimit).
for the moment we prepared a poll-like wrapper that still runs select()
to test for regressions, and so we have fallback code for systems without
poll().