<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Creating a profanity filter</title>
	<atom:link href="http://www.supersatellite.com/2007/12/06/creating-a-profanity-filter/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.supersatellite.com/2007/12/06/creating-a-profanity-filter/</link>
	<description>Blogging in a new web</description>
	<pubDate>Thu, 28 Aug 2008 00:25:28 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.1</generator>
		<item>
		<title>By: Michael Fienen</title>
		<link>http://www.supersatellite.com/2007/12/06/creating-a-profanity-filter/#comment-227</link>
		<dc:creator>Michael Fienen</dc:creator>
		<pubDate>Thu, 03 Jan 2008 14:14:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.supersatellite.com/2007/12/06/creating-a-profanity-filter/#comment-227</guid>
		<description>Here you go.  Keep in mind, it's greedy: entries match anywhere inside a word, so partial matches count ("ass" will match "asshole" or "asses").  You will notice some words or patterns that one wouldn't consider profanity, but they were included to help ensure a clean display.

$filter = array("fuck","shit","damn","cunt","ass","porn","gay","fag","dick","cock",
"puss","penis","vagina","butt","boob","\btit((t(y&#124;ies))&#124;s)?\b","breast","lesbian",
"dyke","tranny","transvestite","queer","sex","poop","turd","hermaphrodite",
"an(a&#124;u)(l&#124;s)","std","stupid","dumb","crabs","gonorrhea","homo","pubic","herpes",
"aids","beer","liquor","booze","hell","horn(y&#124;ier)","fart","beastility","bitch","piss",
"hardcore","erection","orgasm","blow(\s)?job","prick","cum","ejaculat","nigg",
"facial","dildo","vibrator","goddamn","\d{6,9}","\d{3}-\d{2}-\d{4}","death",
"kill","murder","rap(e&#124;ing)","bukkake","hentai","fellatio","cunnilingus",
"intercourse","erotic","pervert");</description>
		<content:encoded><![CDATA[<p>Here you go.  Keep in mind, it&#8217;s greedy: entries match anywhere inside a word, so partial matches count (&#8220;ass&#8221; will match &#8220;asshole&#8221; or &#8220;asses&#8221;).  You will notice some words or patterns that one wouldn&#8217;t consider profanity, but they were included to help ensure a clean display.</p>
<p>$filter = array(&#8221;fuck&#8221;,&#8221;shit&#8221;,&#8221;damn&#8221;,&#8221;cunt&#8221;,&#8221;ass&#8221;,&#8221;porn&#8221;,&#8221;gay&#8221;,&#8221;fag&#8221;,&#8221;dick&#8221;,&#8221;cock&#8221;,<br />
&#8220;puss&#8221;,&#8221;penis&#8221;,&#8221;vagina&#8221;,&#8221;butt&#8221;,&#8221;boob&#8221;,&#8221;\btit((t(y|ies))|s)?\b&#8221;,&#8221;breast&#8221;,&#8221;lesbian&#8221;,<br />
&#8220;dyke&#8221;,&#8221;tranny&#8221;,&#8221;transvestite&#8221;,&#8221;queer&#8221;,&#8221;sex&#8221;,&#8221;poop&#8221;,&#8221;turd&#8221;,&#8221;hermaphrodite&#8221;,<br />
&#8220;an(a|u)(l|s)&#8221;,&#8221;std&#8221;,&#8221;stupid&#8221;,&#8221;dumb&#8221;,&#8221;crabs&#8221;,&#8221;gonorrhea&#8221;,&#8221;homo&#8221;,&#8221;pubic&#8221;,&#8221;herpes&#8221;,<br />
&#8220;aids&#8221;,&#8221;beer&#8221;,&#8221;liquor&#8221;,&#8221;booze&#8221;,&#8221;hell&#8221;,&#8221;horn(y|ier)&#8221;,&#8221;fart&#8221;,&#8221;beastility&#8221;,&#8221;bitch&#8221;,&#8221;piss&#8221;,<br />
&#8220;hardcore&#8221;,&#8221;erection&#8221;,&#8221;orgasm&#8221;,&#8221;blow(\s)?job&#8221;,&#8221;prick&#8221;,&#8221;cum&#8221;,&#8221;ejaculat&#8221;,&#8221;nigg&#8221;,<br />
&#8220;facial&#8221;,&#8221;dildo&#8221;,&#8221;vibrator&#8221;,&#8221;goddamn&#8221;,&#8221;\d{6,9}&#8221;,&#8221;\d{3}-\d{2}-\d{4}&#8221;,&#8221;death&#8221;,<br />
&#8220;kill&#8221;,&#8221;murder&#8221;,&#8221;rap(e|ing)&#8221;,&#8221;bukkake&#8221;,&#8221;hentai&#8221;,&#8221;fellatio&#8221;,&#8221;cunnilingus&#8221;,<br />
&#8220;intercourse&#8221;,&#8221;erotic&#8221;,&#8221;pervert&#8221;);</p>
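<p>As a rough illustration of how a list like this behaves, here is a minimal sketch that joins a few of the entries into one case-insensitive pattern. The helper name isBlocked() and the choice of ~ as the delimiter are assumptions for this example, not part of the original script.</p>

```php
<?php
// Hypothetical helper: returns true if any entry in $filter matches $text.
function isBlocked(string $text, array $filter): bool
{
    // Join the entries into one case-insensitive alternation.
    // '~' is used as the delimiter because none of the sample entries contain it.
    $pattern = '~(?:' . implode('|', $filter) . ')~i';
    return preg_match($pattern, $text) === 1;
}

// A small sample of the list above (single-quoted so backslashes stay literal).
$filter = array('damn', 'hell', 'horn(y|ier)', '\d{3}-\d{2}-\d{4}');

var_dump(isBlocked('Damn it', $filter));         // bool(true): case-insensitive
var_dump(isBlocked('Say hello', $filter));       // bool(true): "hell" inside "hello"
var_dump(isBlocked('French horn', $filter));     // bool(false): needs "y" or "ier"
var_dump(isBlocked('SSN 123-45-6789', $filter)); // bool(true): number pattern
```

<p>The &#8220;hello&#8221; case is the partial-match behavior described above: without word boundaries, an entry matches anywhere inside a longer word.</p>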
]]></content:encoded>
	</item>
	<item>
		<title>By: Reuben</title>
		<link>http://www.supersatellite.com/2007/12/06/creating-a-profanity-filter/#comment-226</link>
		<dc:creator>Reuben</dc:creator>
		<pubDate>Thu, 03 Jan 2008 06:10:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.supersatellite.com/2007/12/06/creating-a-profanity-filter/#comment-226</guid>
		<description>bah!!  i was hoping i would get to see all the naughty words you came up with in your code.  "word1", "word2", "word3"??  

feh, what a gyp!</description>
		<content:encoded><![CDATA[<p>bah!!  i was hoping i would get to see all the naughty words you came up with in your code.  &#8220;word1&#8221;, &#8220;word2&#8221;, &#8220;word3&#8221;??  </p>
<p>feh, what a gyp!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Fienen</title>
		<link>http://www.supersatellite.com/2007/12/06/creating-a-profanity-filter/#comment-204</link>
		<dc:creator>Michael Fienen</dc:creator>
		<pubDate>Tue, 18 Dec 2007 17:13:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.supersatellite.com/2007/12/06/creating-a-profanity-filter/#comment-204</guid>
		<description>In the case of this particular script, there were no concerns about an overly greedy match being made, because enough searches are done each minute on our Mini that the display is regularly refreshed anyway.  Kids can be awfully meddlesome too, and will try many tricks to get around the filter for a giggle, so the more aggressive, the better.

But you can always tweak the match in the array, for instance making "word2" be "\bword2\b" to match only on word boundaries, rather than any time word2 appears as a whole word or as part of one.  

You could also add a flag into the script, something like: 
  $matchGreedy = TRUE;
And based on that, leave \b out of the pattern passed to preg_match() if it's true, or include it if false.</description>
		<content:encoded><![CDATA[<p>In the case of this particular script, there were no concerns about an overly greedy match being made, because enough searches are done each minute on our Mini that the display is regularly refreshed anyway.  Kids can be awfully meddlesome too, and will try many tricks to get around the filter for a giggle, so the more aggressive, the better.</p>
<p>But you can always tweak the match in the array, for instance making &#8220;word2&#8221; be &#8220;\bword2\b&#8221; to match only on word boundaries, rather than any time word2 appears as a whole word or as part of one.  </p>
<p>You could also add a flag into the script, something like:<br />
  $matchGreedy = TRUE;<br />
And based on that, leave \b out of the pattern passed to preg_match() if it&#8217;s true, or include it if false.</p>
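<p>A minimal sketch of that flag, assuming $matchGreedy = TRUE means &#8220;match anywhere&#8221; (no \b) and FALSE means &#8220;whole words only&#8221; (wrapped in \b). The helper name buildPattern() is made up for this example and is not from the original script.</p>

```php
<?php
$matchGreedy = FALSE; // TRUE: match anywhere; FALSE: whole words only

// Hypothetical helper: wraps one list entry in delimiters,
// adding \b on both sides only when greedy matching is off.
function buildPattern(string $word, bool $matchGreedy): string
{
    $core = '(?:' . $word . ')';
    return $matchGreedy ? '~' . $core . '~i' : '~\b' . $core . '\b~i';
}

var_dump(preg_match(buildPattern('word2', TRUE), 'password2 reset'));  // int(1)
var_dump(preg_match(buildPattern('word2', FALSE), 'password2 reset')); // int(0)
var_dump(preg_match(buildPattern('word2', FALSE), 'say word2 now'));   // int(1)
```

<p>Wrapping the entry in a non-capturing group keeps the \b anchors applied to the whole entry even when it contains its own alternation.</p>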
]]></content:encoded>
	</item>
	<item>
		<title>By: Brett</title>
		<link>http://www.supersatellite.com/2007/12/06/creating-a-profanity-filter/#comment-203</link>
		<dc:creator>Brett</dc:creator>
		<pubDate>Tue, 18 Dec 2007 17:03:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.supersatellite.com/2007/12/06/creating-a-profanity-filter/#comment-203</guid>
		<description>I don't care much for greedy matching. It often has unintended consequences. For instance, I can't create a user under my preferred username (bbendick) on the community server blogging platform if they have obscenity filters enabled. Or for an international flavor, try creating the username 'larsen'.

Would it be better to proactively monitor the mini content in other ways and try to cleanse the source, rather than catching it on the display side? You could run a nightly "dirty words" job (George Carlin 2.0) that notifies you of pages that say things you don't like, and then track down that content.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t care much for greedy matching. It often has unintended consequences. For instance, I can&#8217;t create a user under my preferred username (bbendick) on the community server blogging platform if they have obscenity filters enabled. Or for an international flavor, try creating the username &#8216;larsen&#8217;.</p>
<p>Would it be better to proactively monitor the mini content in other ways and try to cleanse the source, rather than catching it on the display side? You could run a nightly &#8220;dirty words&#8221; job (George Carlin 2.0) that notifies you of pages that say things you don&#8217;t like, and then track down that content.</p>
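<p>Something like the following could serve as the core of such a job. It is only a sketch: findFlaggedPages(), the sample URLs, and the assumption that page text has already been fetched into an array are all hypothetical, and the notification step is left out.</p>

```php
<?php
// Hypothetical: $pages maps a URL to its already-extracted text.
// Returns URL => list of distinct matched terms, for flagged pages only.
function findFlaggedPages(array $pages, array $filter): array
{
    // \b on both sides keeps names like "larsen" from being flagged.
    $pattern = '~\b(?:' . implode('|', $filter) . ')\b~i';
    $report = array();
    foreach ($pages as $url => $text) {
        if (preg_match_all($pattern, $text, $matches) > 0) {
            // Lowercase and de-duplicate the hits for a compact report line.
            $hits = array_map('strtolower', $matches[0]);
            $report[$url] = array_values(array_unique($hits));
        }
    }
    return $report;
}

$pages = array(
    'http://example.com/a' => 'Welcome to the larsen lab.',
    'http://example.com/b' => 'Well, damn. Damn again.',
);
// Only the second page is reported, with the single term "damn".
print_r(findFlaggedPages($pages, array('damn', 'arse')));
```

<p>Because the report lists source pages rather than hiding text at display time, word-boundary matching is affordable here: a missed variant just means one more entry to add to the list.</p>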
]]></content:encoded>
	</item>
</channel>
</rss>
