as suggested in #212, it seems the majority of people don't understand
that input was expected to be in regex format and people were using
filter lists containing plain hostnames, e.g. `www.google.com`.
apart from that, using fnmatch() for matching is actually a lot less
computationally expensive and allows to use big blacklists without
incurring a huge performance hit.
the config file now understands a new option `FilterType` which can
be one of `bre`, `ere` and `fnmatch`.
The `FilterExtended` option was deprecated in favor of it.
It still works, but will be removed in the release after the next.
* check return values of memory allocation and abort gracefully
in out-of-memory situations
* use sblist (linear dynamic array) instead of linked list
- this removes one pointer per filter rule
- removes need to manually allocate/free every single list item
(instead block allocation is used)
- simplifies code
* remove storage of (unused) input rule
- removes one char* pointer per filter rule
- removes storage of the raw bytes of each filter rule
* add line number to display on out-of-memory/invalid regex situation
* replace duplicate filter_domain()/filter_host() code with a single
function filter_run()
- reduces code size and management effort
with these improvements, >1 million regex rules can be loaded with
4 GB of RAM, whereas previously it crashed with about 950K.
the list for testing was assembled from
http://www.shallalist.de/Downloads/shallalist.tar.gzcloses#20
The modified files were indented with GNU indent using the
following command:
indent -npro -kr -i8 -ts8 -sob -l80 -ss -cs -cp1 -bs -nlps -nprs -pcs \
-saf -sai -saw -sc -cdw -ce -nut -il0
No other changes of any sort were made.
The notices have been changed to a more GNU look. Documentation
comments have been separated from the copyright header. I've tried to
keep all copyright notices intact. Some author contact details have
been updated.
I re-indented the source code using indent with the following options:
indent -kr -bad -bap -nut -i8 -l80 -psl -sob -ss -ncs
There are now _no_ tabs in the source files, and all indentation is
eight spaces. Lines are 80 characters long, and the procedure type is
on it's own line. Read the indent manual for more information about
what each option means.
(filter_init): Added code to handle both host and URLs. Also include code to use extended regular expressions.
(filter_domain): The old filter_url function has been renamed filter_domain().
(filter_url): This function now actually filters complete URLs.