filter++ lib
filter++ lib is a blazingly fast library for filtering strings for swear words. nothing less, nothing more. in the worst case it's 100X faster than other libraries, and sometimes it's 10000X faster
keep it simple stupid
filter++ lib provides a sane default blacklist
filter++ lib is just a library. check out filter++ proper or lush to see it implemented in chat
Bypass protection
filter++ lib will filter out messages that try to bypass the filter with lookalike letters. for example, assuming "wonky" is a swear word, all of the following variants are correctly filtered:
- wonky
- w o n k y
- woooonky
- w00nky
- WONKY
- w〇〇N𝕜𝕐
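the variants above can all be reduced to the same string before the blacklist lookup. here is a minimal sketch of that idea, not the library's actual code: the `LOOKALIKES` table and the `normalize` function are hypothetical names made up for this example, and the real lookalike table is much larger.

```lua
-- Hypothetical lookalike table (an assumption for this sketch): maps a few
-- visually similar characters to plain ASCII letters.
local LOOKALIKES = {
    ["0"] = "o",
    ["〇"] = "o",
    ["𝕜"] = "k",
    ["𝕐"] = "y",
}

-- Matches one UTF-8 encoded character; works in Lua 5.1 / LuaJIT,
-- which have no built-in utf8 library.
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Reduce text to a canonical form: map lookalikes to ASCII, lowercase,
-- drop spaces and collapse runs of the same character.
local function normalize(text)
    local out = {}
    local last
    for ch in text:gmatch(UTF8_CHAR) do
        ch = (LOOKALIKES[ch] or ch):lower()
        if ch ~= " " and ch ~= last then
            out[#out + 1] = ch
            last = ch
        end
    end
    return table.concat(out)
end

print(normalize("w〇〇N𝕜𝕐"))
```

in this sketch the blacklist entries would have to go through `normalize` too, since collapsing repeated letters also changes words that legitimately contain double letters. dropping every space is a simplification as well: a real filter has to keep track of word boundaries so it can report positions in the original message.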
API
the complete API is in the repository's API.md file. here is a quick rundown:
-- add the word "wonky" to the blacklist
filterpp_lib.add_to_blacklist("wonky")
-- the blacklist needs to be reinitialized
filterpp_lib.initialize_blacklist()
local is_violating, bad_words = filterpp_lib.filter_text("wonky WOOOONKY")
-- is_violating is true because the word "wonky" is blacklisted
-- bad_words equals
-- {
--     {1, 5},  -- first violating word, start and end position
--     {7, 14}  -- second violating word, start and end position
-- }
-- register a callback for when players swear
filterpp_lib.register_on_violation(function(name, message, bad_words)
    -- bad_words has the same format as the return value of filter_text()
    minetest.chat_send_player(name, string.format("you can't swear, you swore %d times", #bad_words))
end)
-- you should report a violation when you filter text and find any violations
if is_violating then
    filterpp_lib.report_violation("singleplayer", "wonky WOOOONKY", bad_words)
end
you can look at the filter++ source code for an example of usage
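as a small illustration of the `bad_words` format, here is a sketch of a helper that stars out the reported ranges. `censor` is a hypothetical function written for this example and is not part of the filter++ lib API. it assumes the positions are byte positions, sorted and non-overlapping, which matches the example above but is not confirmed for every input.

```lua
-- Replace every reported range with asterisks.
-- bad_words is a list of {start, end} pairs in the format shown above.
local function censor(text, bad_words)
    local pieces = {}
    local cursor = 1
    for _, range in ipairs(bad_words) do
        local first, last = range[1], range[2]
        -- keep the clean text before this range
        pieces[#pieces + 1] = text:sub(cursor, first - 1)
        -- one asterisk per censored byte
        pieces[#pieces + 1] = string.rep("*", last - first + 1)
        cursor = last + 1
    end
    -- keep whatever follows the last range
    pieces[#pieces + 1] = text:sub(cursor)
    return table.concat(pieces)
end

local censored = censor("wonky WOOOONKY", { {1, 5}, {7, 14} })
print(censored) -- prints "***** ********"
```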
Command
filter++ lib adds the manage_filter command
manage_filters add <word> | remove <word> | list | return_to_default
the filter++ privilege is needed to run the command
the return_to_default option resets the blacklist to the default blacklist
Comparison with other filtering mods
sofar's filter mod
- filter++ lib doesn't add unrelated commands for moderation (kicking/muting)
- filter++ lib doesn't add a default punishment
- filter++ lib doesn't register any callbacks for minetest.register_on_chat_message. filter does, which makes it unusable as a filtering library if you want to use that callback too
- filter++ lib provides a default blacklist
- filter++ lib is orders of magnitude faster
monk's filterplus mod
- filter++ lib doesn't add unrelated commands for moderation (kicking/muting)
- filter++ lib doesn't register any callbacks for minetest.register_on_chat_message. filterplus does, which makes it unusable as a filtering library if you want to use that callback too
- filter++ lib is orders of magnitude faster
- filter++ lib prevents bypassing with lookalike unicode characters
- filter++ lib's default blacklist only has English profanities, and is slightly bigger overall
- filter++ lib provides a complete and documented API
- filter++ lib will never give you snarky insults
Benchmarks
all benchmarks use filterplus's default blacklist
| | filter++ lib no UTF-8 | filter++ | filterplus | filter |
|---|---|---|---|---|
| check a 170 letter sentence once | 0.01ms | 0.005ms | 1ms | 20ms |
| check word "wanker" 10000 times | 0.06ms | 0.9ms | 500ms | 500ms |
| check 100 character long message 10000 times | 0.2ms | 13ms | 147889ms (about 2.5m) | 96431ms (about 1.6m) |
graphs are included in the screenshots
notes
during the longer benchmarks i was doing things in the background, but i think the overall message remains
the 170 letter sentence used is: "i just joined this server and i don't know why you knob heads are so mean to me. i just tried to relax for a day but you guys are screaming at me all the time, get a life."
the 100 character long message was just "A" repeated 100 times
all tests were made in development game to reduce the noise
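for anyone who wants to repeat this kind of measurement, here is a minimal timing sketch. it is not the harness used for the numbers above: `time_ms` is a hypothetical helper, the workload is a plain `string.find` stand-in rather than a real filter call, and `os.clock` measures CPU time, so results will not match wall-clock numbers exactly.

```lua
-- Run fn `repetitions` times and return the elapsed CPU time in milliseconds.
local function time_ms(fn, repetitions)
    local start = os.clock()
    for _ = 1, repetitions do
        fn()
    end
    return (os.clock() - start) * 1000
end

-- Stand-in workload: "A" repeated 100 times, as in the third benchmark.
-- Swap the function body for the filter call you want to measure.
local message = string.rep("A", 100)
local elapsed = time_ms(function()
    return message:find("wonky", 1, true)
end, 10000)

print(string.format("10000 checks took %.2fms", elapsed))
```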
since filter and filterplus don't support UTF-8, i also ran benchmarks without it for filter++ to show how much faster it is when it does comparable things. the UTF-8 version is slightly faster in the first benchmark because the no UTF-8 filter++ uses an older and slower version
filter++ lib can achieve these times by cleverly arranging its data so that checking a word takes O(word_length), unlike filter and filterplus, which keep a huge list of Lua patterns and check them in O(word_length * registered_patterns). the downside is that filter++ lib uses more memory than filter (though not more than filterplus, because i don't load a 6MB whitelist)
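one data structure with this O(word_length) property is a trie (prefix tree). whether filter++ lib uses exactly this layout is an assumption; the sketch below only illustrates why the lookup cost depends on the length of the word and not on the size of the blacklist. `build_trie` and `is_blacklisted` are hypothetical names for this example.

```lua
-- Build a trie: every node is a table keyed by a single character.
-- The "is_word" field marks the end of a blacklisted word; it cannot
-- collide with the one-character keys.
local function build_trie(words)
    local root = {}
    for _, word in ipairs(words) do
        local node = root
        for i = 1, #word do
            local ch = word:sub(i, i)
            node[ch] = node[ch] or {}
            node = node[ch]
        end
        node.is_word = true
    end
    return root
end

-- Walk the trie one character at a time: at most #word table lookups,
-- no matter how many words are stored.
local function is_blacklisted(trie, word)
    local node = trie
    for i = 1, #word do
        node = node[word:sub(i, i)]
        if not node then
            return false
        end
    end
    return node.is_word == true
end

local trie = build_trie({ "wonky", "wonk" })
print(is_blacklisted(trie, "wonky")) -- prints "true"
```

the memory cost mentioned above shows up here too: every character of every blacklisted word can need its own table.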
Credits
- desojevic - blacklist that the default blacklist is based on
- monk - "encouragement"
- Me - i wrote the thing