So, today I wrote the dirtiest function I have ever written in PHP. I typed more profanity at once than I think I ever have in the past. As part of a new AJAX (Asynchronous JavaScript and XML) live display of search queries that I wrote for our Google Mini, I had to do some filtering to make sure naughty phrases wouldn’t show up. This is a pretty straightforward script you could incorporate into other applications, such as a shoutbox or comment form.
Alternatively, you could use preg_replace() instead of preg_match() and censor phrases that way (eregi_replace() would also work on older PHP versions, but it was deprecated in PHP 5.3 and removed in PHP 7). In this example, I use preg_match() only to test the query, and if any filter term matched, I excluded the query from display entirely. The filter matches a term anywhere it occurs in a query, so if “butt” were a filter term, it would catch “butts,” “butthole,” and “buttmunch.” That saved a lot of extra typing and filtering. Yes, it increases the likelihood of false positives, but in this case we weren’t too concerned about an overly aggressive filter.
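The full function isn’t reproduced in this post, so here is only a minimal sketch of the kind of preg_match() test described above, not the original code. It assumes the filter terms are regex fragments (like the list posted in the comments), that matching should be case-insensitive, and it uses a hypothetical function name, isFiltered().

```php
<?php
// Hypothetical sketch: returns true if any filter term matches anywhere in $query.
// Terms are treated as regex fragments, so they are NOT escaped with preg_quote().
function isFiltered(string $query, array $filter): bool
{
    foreach ($filter as $term) {
        // '~' is used as the delimiter; the 'i' flag (an assumption) ignores case.
        if (preg_match('~' . $term . '~i', $query) === 1) {
            return true;
        }
    }
    return false;
}

$filter = ['butt'];

var_dump(isFiltered('Buttmunch', $filter));     // bool(true): partial matches count
var_dump(isFiltered('library hours', $filter)); // bool(false)
```

Because the term is not anchored, “butt” matches inside longer words, which is exactly the aggressive behavior described above.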
If any filter term matches, the function returns a boolean true. From there, do as you will. You could also build in your own handler code (especially if you only wanted to censor individual words rather than hide the whole query).
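If you would rather censor individual words than hide the whole query, a preg_replace() handler might look something like this. It is a sketch under assumptions: the function name censor() and the fixed “****” mask are hypothetical, and terms are again treated as case-insensitive regex fragments.

```php
<?php
// Hypothetical sketch: replace every filter match with a mask instead of
// rejecting the whole string. Terms are regex fragments, as in the filter list.
function censor(string $text, array $filter, string $mask = '****'): string
{
    // Build one case-insensitive pattern per term ('~' delimiter is an assumption).
    $patterns = array_map(fn(string $term): string => '~' . $term . '~i', $filter);

    // preg_replace() accepts an array of patterns and a single replacement string.
    $result = preg_replace($patterns, $mask, $text);

    // preg_replace() returns null on a regex error; fail loudly rather than
    // silently displaying uncensored text.
    if ($result === null) {
        throw new RuntimeException('Invalid filter pattern');
    }
    return $result;
}

echo censor('What a butthead', ['butt']), "\n"; // What a ****head
```

Since the match is still partial, only the matching fragment is masked, so “butthead” becomes “****head”.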
If you have a better idea or refinement, comment below and I can tweak this appropriately.
4 comments
I don’t care much for greedy matching. It often has unintended consequences. For instance, I can’t create a user under my preferred username (bbendick) on the Community Server blogging platform if it has obscenity filters enabled. Or, for an international flavor, try creating the username ‘larsen’.
Would it be better to proactively monitor the Mini’s content in other ways and cleanse the source, rather than catching it on the display side? You could have a nightly “dirty words” job (George Carlin 2.0) that notifies you of pages that say things you don’t like, and then track down that content.
In the case of this particular script, an overly greedy match wasn’t a concern, because our Mini handles enough searches per minute that the display refreshes regularly anyway. Kids can be awfully meddlesome too, and will try many tricks to get around the filter for a giggle, so the more aggressive, the better.
But you can always tweak the match in the array, for instance changing “word2” to “\bword2\b” so that it matches only at word boundaries, rather than anywhere word2 appears, whole or as part of another word.
You could also add a flag to the script, something like:
$matchGreedy = TRUE;
And based on that, leave each term bare in the preg_match() pattern if it’s true (partial matches count), or wrap it in \b if it’s false (whole words only).
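A minimal sketch of that flag, assuming $matchGreedy = TRUE means “partial matches count” and FALSE means “whole words only.” The function name isFiltered() and passing the flag as a parameter are illustrative choices, not the original script.

```php
<?php
// Hypothetical sketch of the $matchGreedy flag: when true, terms match anywhere
// (partial words count); when false, each term is wrapped in \b word boundaries.
function isFiltered(string $query, array $filter, bool $matchGreedy = true): bool
{
    foreach ($filter as $term) {
        // The non-capturing group keeps alternations like "a|b" inside the boundaries.
        $body = $matchGreedy ? $term : '\b(?:' . $term . ')\b';
        if (preg_match('~' . $body . '~i', $query) === 1) {
            return true;
        }
    }
    return false;
}

$filter = ['butt'];
var_dump(isFiltered('buttmunch', $filter, true));  // bool(true)
var_dump(isFiltered('buttmunch', $filter, false)); // bool(false)
var_dump(isFiltered('nice butt', $filter, false)); // bool(true)
```

Wrapping the term in a group matters for entries such as “an(a|u)(l|s)”: without it, a term containing a top-level “|” would only have \b applied to its first and last alternatives.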
bah!! I was hoping I would get to see all the naughty words you came up with in your code. “word1”, “word2”, “word3”??
feh, what a gyp!
Here you go. Keep in mind, it’s greedy, so partial matches count (“ass” will match “asshole” or “asses”). You will notice some words or patterns that one wouldn’t consider profanity, but they were included to help ensure a clean display.
$filter = array('fuck', 'shit', 'damn', 'cunt', 'ass', 'porn', 'gay', 'fag', 'dick', 'cock',
    'puss', 'penis', 'vagina', 'butt', 'boob', '\btit((t(y|ies))|s)?\b', 'breast', 'lesbian',
    'dyke', 'tranny', 'transvestite', 'queer', 'sex', 'poop', 'turd', 'hermaphrodite',
    'an(a|u)(l|s)', 'std', 'stupid', 'dumb', 'crabs', 'gonorrhea', 'homo', 'pubic', 'herpes',
    'aids', 'beer', 'liquor', 'booze', 'hell', 'horn(y|ier)', 'fart', 'bestiality', 'bitch', 'piss',
    'hardcore', 'erection', 'orgasm', 'blow(\s)?job', 'prick', 'cum', 'ejaculat', 'nigg',
    'facial', 'dildo', 'vibrator', 'goddamn', '\d{6,9}', '\d{3}-\d{2}-\d{4}', 'death',
    'kill', 'murder', 'rap(e|ing)', 'bukkake', 'hentai', 'fellatio', 'cunnilingus',
    'intercourse', 'erotic', 'pervert');