* check return values of memory allocation and abort gracefully
in out-of-memory situations
* use sblist (linear dynamic array) instead of linked list
- this removes one pointer per filter rule
- removes need to manually allocate/free every single list item
(instead block allocation is used)
- simplifies code
* remove storage of (unused) input rule
- removes one char* pointer per filter rule
- removes storage of the raw bytes of each filter rule
* add line number to display on out-of-memory/invalid regex situation
* replace duplicate filter_domain()/filter_host() code with a single
function filter_run()
- reduces code size and management effort
with these improvements, >1 million regex rules can be loaded with
4 GB of RAM, whereas previously it crashed with about 950K.
the list for testing was assembled from
http://www.shallalist.de/Downloads/shallalist.tar.gzcloses#20
this is required so we can elegantly swap out an old config for a
new one in the future and remove lots of boilerplate from config
initialization code.
unfortunately this is a quite intrusive change as the config struct
was accessed in numerous places, but frankly it should have been
done via a pointer right from the start.
right now, we simply point to a static struct in main.c, so there
shouldn't be any noticeable changes in behaviour.
The new code skips leading whitespaces before removing trailing
whitespaces and comments.
Without doing this, lines with leading whitespace are treated like empty
lines (i.e. they are ignored).
The modified files were indented with GNU indent using the
following command:
indent -npro -kr -i8 -ts8 -sob -l80 -ss -cs -cp1 -bs -nlps -nprs -pcs \
-saf -sai -saw -sc -cdw -ce -nut -il0
No other changes of any sort were made.
The notices have been changed to a more GNU look. Documentation
comments have been separated from the copyright header. I've tried to
keep all copyright notices intact. Some author contact details have
been updated.
I re-indented the source code using indent with the following options:
indent -kr -bad -bap -nut -i8 -l80 -psl -sob -ss -ncs
There are now _no_ tabs in the source files, and all indentation is
eight spaces. Lines are 80 characters long, and the procedure type is
on it's own line. Read the indent manual for more information about
what each option means.
(filter_init): Added code to handle both host and URLs. Also include code to use extended regular expressions.
(filter_domain): The old filter_url function has been renamed filter_domain().
(filter_url): This function now actually filters complete URLs.