SuperSatellite

Creating a profanity filter

Published on December 6th, 2007 in Tools

So, today I wrote the dirtiest function I have ever written in PHP, and typed more profanity in one sitting than I think I ever have before. As part of a new AJAX (Asynchronous JavaScript and XML) live search query display I wrote for our Google Mini, I had to do some filtering to make sure naughty phrases wouldn’t show up. This is a pretty straightforward script you could incorporate into other applications, like a shoutbox or comment form.

Alternatively, you could swap in preg_replace() in place of preg_match() and censor phrases that way (eregi_replace() would also do the job on older installs, but the ereg functions were deprecated in PHP 5.3 and removed in PHP 7). In this example, I use preg_match() just to test the query, and if the filter matches, I exclude the query from display entirely. The filter matches a term anywhere in a query, so if the word “butt” were a filter term, it would catch “butts,” “butthole,” and “buttmunch.” That saves a lot of extra typing and filtering. Yes, it increases the likelihood of a false positive, but in this case we weren’t too concerned about an overly aggressive filter.
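Here is a minimal sketch of the censoring variant mentioned above. The censorQuery() helper and the fixed "****" replacement are my own illustration, not part of the live search script, and it assumes none of the filter terms contain a forward slash (the regex delimiter).

```php
<?php
/* Hypothetical helper (not from the original script): replace every
   match of every filter term with a fixed mask instead of rejecting
   the whole query. */
function censorQuery($query, array $filter) {
  foreach ($filter as $term) {
    /* Same pattern construction as the preg_match() version:
       case insensitive, matches anywhere in the string */
    $query = preg_replace("/" . $term . "/i", "****", $query);
  }
  return $query;
}

/* "butt" is the example term from the paragraph above */
echo censorQuery("big Butthole search", array("butt")), "\n";
```

Because the match is partial, only the matching part of a longer word is masked, so the leftover letters may still give the word away.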

If the filter makes a match, it returns a boolean value of true. From there, do as you will. You could build in your own handler code as well (especially if you just wanted to censor individual words).

If you have a better idea or refinement, comment below and I can tweak this appropriately.

  function profanityFilter($query) {
    /* Set filter terms to exclude from display, including
    word roots or partials. Can be regular expressions. */
    $filter = array("word1", "word2", "word3");

    foreach ($filter as $term) {
      /* Look for a regex match, case insensitive */
      if (preg_match("/" . $term . "/i", $query)) {
        /* Return a match, or put in your own handler */
        return true;
      }
    }

    /* No filter term matched */
    return false;
  }
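
For what it's worth, here is a rough sketch of how the function might be used to keep flagged queries out of a display list. The $recentQueries array and its contents are made up for illustration, and the filter function is repeated (with an explicit false return) only so the snippet runs on its own.

```php
<?php
/* Copy of the filter function so this sketch is self-contained;
   the terms are placeholders */
function profanityFilter($query) {
  $filter = array("word1", "word2", "word3");
  foreach ($filter as $term) {
    if (preg_match("/" . $term . "/i", $query)) {
      return true;
    }
  }
  return false;
}

/* Hypothetical list of recent searches */
$recentQueries = array("campus map", "WORD2 pictures", "library hours");

/* Keep only the queries the filter did not flag */
$cleanQueries = array();
foreach ($recentQueries as $query) {
  if (!profanityFilter($query)) {
    $cleanQueries[] = $query;
  }
}

print_r($cleanQueries);
```

The second query is dropped even though its case differs from the filter term, because of the /i modifier.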

4 comments ↓

#1 Brett on 12.18.07 at 11:03 am

I don’t care much for greedy matching. It often has unintended consequences. For instance, I can’t create a user under my preferred username (bbendick) on the community server blogging platform if they have obscenity filters enabled. Or, for an international flavor, try creating the username ‘larsen’.

Would it be better to proactively monitor the mini content in other ways and try to cleanse the source, rather than catching it on the display side? You could have a nightly “dirty words” job (George Carlin 2.0) that notifies you of pages that say things you don’t like, and then you track down that content.

#2 Michael Fienen on 12.18.07 at 11:13 am

In the case of this particular script, there were no concerns about an overly greedy match being made, because enough searches are done a minute on our Mini that the display is regularly refreshed anyway. Kids can be awfully meddlesome too, and will try many tricks to get around it for a giggle, so the more aggressive, the better.

But you can always tweak the match in the array, for instance changing “word2” to “\bword2\b” so that it matches only at word boundaries rather than anywhere word2 appears in whole or in part.

You could also add a flag into the script, something like:
$matchGreedy = TRUE;
Based on that, leave \b out of the pattern passed to preg_match() if it’s true, or wrap the term in \b if it’s false.
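
A quick sketch of that flag idea, assuming TRUE means "match anywhere" and FALSE means "whole words only." Passing the flag as a second parameter is my own choice for illustration, and the single filter term is picked just to reproduce Brett's username example.

```php
<?php
/* Hypothetical variant of the filter with a $matchGreedy switch */
function profanityFilter($query, $matchGreedy = TRUE) {
  $filter = array("dick");

  foreach ($filter as $term) {
    if ($matchGreedy) {
      /* Match the term anywhere, including inside longer words */
      $pattern = "/" . $term . "/i";
    } else {
      /* Wrap the term in a non-capturing group so that any
         alternation inside it stays between the \b anchors */
      $pattern = "/\\b(?:" . $term . ")\\b/i";
    }
    if (preg_match($pattern, $query)) {
      return true;
    }
  }
  return false;
}

var_dump(profanityFilter("bbendick"));        /* greedy */
var_dump(profanityFilter("bbendick", FALSE)); /* whole words only */
```

With the flag off, the username passes because the term never appears as a standalone word; with it on, the embedded match is still caught.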

#3 Reuben on 01.03.08 at 12:10 am

bah!! i was hoping i would get to see all the naughty words you came up with in your code. “word1”, “word2”, “word3”??

feh, what a gyp!

#4 Michael Fienen on 01.03.08 at 8:14 am

Here you go. Keep in mind, it’s greedy, so partial matches count (“ass” will match “asshole” or “asses”). You will notice some words or patterns that one wouldn’t consider profanity, but they were included to help ensure a clean display.

$filter = array("fuck","shit","damn","cunt","ass","porn","gay","fag","dick","cock",
"puss","penis","vagina","butt","boob","\btit((t(y|ies))|s)?\b","breast","lesbian",
"dyke","tranny","transvestite","queer","sex","poop","turd","hermaphrodite",
"an(a|u)(l|s)","std","stupid","dumb","crabs","gonorrhea","homo","pubic","herpes",
"aids","beer","liquor","booze","hell","horn(y|ier)","fart","bestiality","bitch","piss",
"hardcore","erection","orgasm","blow(\s)?job","prick","cum","ejaculat","nigg",
"facial","dildo","vibrator","goddamn","\d{6,9}","\d{3}-\d{2}-\d{4}","death",
"kill","murder","rap(e|ing)","bukkake","hentai","fellatio","cunnilingus",
"intercourse","erotic","pervert");
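
To see how a few of those entries behave, here is a small sketch. The matchesTerm() helper is hypothetical and simply applies one term the same way the filter function does; the test strings are made up.

```php
<?php
/* Hypothetical helper: test one filter term against one string,
   case insensitive, matching anywhere */
function matchesTerm($term, $text) {
  return preg_match("/" . $term . "/i", $text) === 1;
}

/* Plain term: matches inside an unrelated word */
var_dump(matchesTerm("hell", "shell script"));

/* Term wrapped in \b: does not fire on "title" */
var_dump(matchesTerm("\\btit((t(y|ies))|s)?\\b", "job title"));

/* Digit patterns: a 3-2-4 digit group matches, a short number does not */
var_dump(matchesTerm("\\d{3}-\\d{2}-\\d{4}", "123-45-6789"));
var_dump(matchesTerm("\\d{6,9}", "room 101"));
```

The four calls come out true, false, true, false. The first one is the kind of false positive Brett described, and the second shows how the \b anchors avoid it for that one entry.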
