Scott on Writing

Musings on technical writing...

Stopping Comment Spam in .Text Using Triggers

With Google's recent rel=”nofollow” initiative designed to reduce blog comment spam, I've been reading up more on comment spam, techniques to fight it, and so on.  One great resource is the Comment Spam blog, which contains a blacklist of comment spammer URLs and a plugin for blocking comment spam for those with a Movable Type blog.

As discussed earlier, there are a number of techniques for fighting comment spam - moderation, filters, captchas, comment URL munging, and so on.  Being a typical developer, one of my main motivations in any endeavor is choosing a tactic that will require the least amount of work from yours truly.  Clearly this rules out moderation.  I have doubts that URL munging, which Google's initiative is fostering, will stop comment spam.  It will reduce its effectiveness, granted, but spammers are a stubborn and evil bunch, and I don't see them just throwing in the towel.

My preferred technique for stopping comment spam before it starts is to use filters.  If you have access to your .Text blog's SQL Server, you can setup filtering fairly simple.  My first attempt at this was to modify the blog_InsertEntry stored procedure and add hard-coded IF statements that would short circuit the INSERT into the database table if the comment text contained any offending URLs.  Basically this came down to something like:

IF CHARINDEX(bannedURL1, @Text) = 0 OR CHARINDEX(bannedURL2, @Text) = 0  ... OR CHARINDEX(bannedURLN, @Text) = 0
BEGIN
   ... run stored procedure ...
END

Basically the logic here was if the text did not contain an instance of any of the bannedURLs, then run the stored procedure statements.  Otherwise, none of the sproc's instructions were executed, resulting in nothing being commited to the database.

The problem with this approach was that the bannedURLs were hard-coded in the sproc, so if a new spammer came along, I'd have to add another OR clause to that ever-growing IF statement.  This violated my “do as little work as possible” creed.  In reading up on comment spam, I came across the blog entry Preventing .Text Blog Spam Using Triggers.  This technique did three things that I liked:

  1. It used triggers rather than mucking with the stored procedure.  The benefit of this, as I see, is that it makes the comment spam stopping piece orthogonal to the .Text application.  That is, it's a component that could be added or removed without requiring any change to the core .Text sprocs, tables, etc.
  2. Rather than using hard-coded checks, the trigger queried a table that contained offending substrings.  This way, updating the list of banned URLs could be done through a simple Web-based interface.
  3. In addition to checking for banned substrings, the trigger approach also counted up the number of links in the text body and blocked comments that had 15 or more links.  (Many times comment spammers will leave dozens of links in one post.

I made a few changes to the script provided.  The script provided had a table that would allow one to specify banned substrings.  That is, you could remove comments with the word Viagra, for example.  I decided this was more flexibility than needed, so I changed the table and logic to allow just URLs being specified.  Additionally, I had to change the INSERT statements in the trigger since I use a more antiquated version of .Text.

The one downside of this approach (or any filter approach of this manner) is that each blogger has his own list of banned URLs.  If a new comment spammer Web site comes about and this comment spammer starts spreading around his or her links, each blogger will have to waste their own time, adding this URL to the blacklist.  What we need is a distributed model, where everyone can contribute to the blacklist globally, and .Text blogs will automatically update their table of banned URLs periodically, as well as update the global table.  I don't know of a manner to automate this process, but it wouldn't be difficult to create a WinForms desktop client that would allow a user to add new entries to the blacklist and, optionally, update/download the data from the global blacklist.  Do you think there would be a demand for this?  Would enough people use this to make it worthwhile?

posted on Monday, January 24, 2005 11:22 AM

Feedback

# re: Stopping Comment Spam in .Text Using Triggers 1/24/2005 12:26 PM James Avery

I thought about creating a windows form client that would just display comments and allow you to delete them quickly, personally that would be enough for me... but when I thought about it I don't know that many people that have access to the SQL Server that their blog runs on. The majority of the people using .Text run on group sites, so the group admin would have to do the work on that. You could do something like a windows service that pulls down an update every day and updates the SQL, that wouldnt be that hard and would be less work for the admin.

Good Idea though.

-James

# re: Stopping Comment Spam in .Text Using Triggers 1/24/2005 1:02 PM Dan Hounshell

I, too, am a .Text user that has been inundated with comment space recently - about 5 per day. It's not all that difficult to remove them, but it is a pain. It would be better if they were just never allowed in! I'm going to give the trigger suggestion a try as it is much easier to do that than to add capcha or some other method.

I am with you, though, it would be nice to have some access to an updated list - a global blacklist. Even if it had to be downloaded manually - maybe as a sql script file to insert them. Or how about an RSS feed of newly added URLs - that way they would be aggregated by newsgator into Outlook for me and then I could collect them all and run them (insert them) from time to time.

# Comment Spamming Bastard Wiener Bastards !! 1/24/2005 2:39 PM Digging My Blog - Dan Hounshell

# Comment Spamming Bastard Wiener Bastards !! 1/24/2005 2:41 PM Digging My Blog - Dan Hounshell

# Battling Comment Spam 1/24/2005 9:22 PM Bryant Likes's Blog

# New Article: Preventing .Text Comment and Referral Spam using SQL Triggers, MT-blacklist and Regular Expressions 1/24/2005 11:01 PM a tech-centric blog from the left

# re: Stopping Comment Spam in .Text Using Triggers 1/25/2005 1:49 AM David Brabant

Hi Scott, I played a bit with triggers myself (starting with the code Chrissy wrote).

The result is available here:
http://www.xhovemont.be/archive/2005/01/24/489.aspx

One thing I (attempt to :-) do is to extract information from IIS logs regarding the spammers' real IPs.
I'm doing that in my free time (i.e., between 2 and 3 AM :-), so the solution hasn't been thoroughly tested. But it (seems to) work(s) for me.

# re: Stopping Comment Spam in .Text Using Triggers 1/25/2005 8:21 AM Erik Lane

I'm going to look into this trigger suggestion along with others I've been reading about. I also think the automated blacklist would be great.

# More on Feedback Spam 1/25/2005 8:27 AM Giddy Up!

# re: Stopping Comment Spam in .Text Using Triggers 1/26/2005 4:50 AM Johnny Ribacka

A global blacklist would probably be a good idea from a blog owners perspective. But it's not without problems as I see it.

Let's say we have a company called "FooBar". And then we have a person working for a rivalling company "BazZot".

Wouldn't it be too easy for that person to blacklist the address to FooBar's web pages? And what should FooBar do to get off the black list?

It seems to me that a global blacklist of this type would give too much power to a user with a malign agenda. Because I doubt many blog owners would go through the trouble of verifying what's on a list like that...

# re: Stopping Comment Spam in .Text Using Triggers 1/26/2005 7:12 AM Scott Mitchell

Johnny, you're right, and I had thought about this earlier. I figured the person(s) maintaining the blacklist would need to do their homework before adding a URL. Essentially, when one blog owner submitted a URL it would not immediately appear in the blacklist for others to update until a moderator approved it.

Another, less human-oritented perspective, would require that a spammer domain be found on, say, ten blogs before it would be added to the list. (I.e., 10+ folks would have to submit the link as a comment spam link before it could be propagated to others...)

# re: Stopping Comment Spam in .Text Using Triggers 1/26/2005 1:48 PM Devin Ganger

The easiest way to expose the blacklist data would be as another XML/RSS feed. The one unfortunate side effect of this is that savvy spammers could be checking a given server to see what it uses as a blacklist.

Hmm. A web of trust mechanism could help mitigate that somewhat. Subscribe to all the blacklist feeds you want, but don't add the entries to your active blacklist until a given entry exceeds a certain threshhold (user-configurable, somehow combining the number of feeds it was found in with the trust you have assigned that feed). If an entry exceeds a second (higher) treshhold, then it gets added to your outgoing blacklist feed -- your explicit approval of this entry.

# The Worth(lessness) of CAPTCHAs 2/2/2005 9:27 AM Scott on Writing

# Comment Spam Script Gone Awry 3/6/2005 9:59 PM Scott on Writing

# ¿Todavía Usas .Text? Mitiga el Spam de Comentarios 3/30/2005 2:23 PM Nazul's Weblog

Si aún usas .Text y recibes mucho spam en tus comentarios, lee este artículo para que implementes triggers...

# And the Point of this Comment Spam Would Be? 9/3/2005 4:01 PM Scott on Writing

# Quickly Deleting Comments in .Text 3/28/2006 12:27 PM Scott on Writing

Title:  
Name:  
Url:
Protected by Clearscreen.SharpHIPEnter the code you see:
Comments   

Add To Your Reader

My Links

Archives

Post Categories

 

I am a Microsoft MVP for ASP.NET.
I am an ASPInsider.
<May 2008>
SMTWTFS
27282930123
45678910
11121314151617
18192021222324
25262728293031
1234567

Comment Stats

DayTotal% of Total
Sunday 1866.8%
Monday 37914.0%
Tuesday 45316.7%
Wednesday 50518.6%
Thursday 53319.6%
Friday 49418.2%
Saturday 1666.1%
Total 2716100.0%

Hour1Total% of Total
12:00 AM 652.4%
1:00 AM 682.5%
2:00 AM 622.3%
3:00 AM 742.7%
4:00 AM 572.1%
5:00 AM 1043.8%
6:00 AM 1084.0%
7:00 AM 1585.8%
8:00 AM 1716.3%
9:00 AM 1475.4%
10:00 AM 1716.3%
11:00 AM 1816.7%
12:00 PM 1886.9%
1:00 PM 1696.2%
2:00 PM 1585.8%
3:00 PM 1324.9%
4:00 PM 1073.9%
5:00 PM 923.4%
6:00 PM 913.4%
7:00 PM 963.5%
8:00 PM 833.1%
9:00 PM 782.9%
10:00 PM 792.9%
11:00 PM 772.8%
Total 2716100.0%

Comments by Blog Entry Date/Time

Day Entry MadeAvg.Total
Sunday 5.54144
Monday 5.22339
Tuesday 4.32419
Wednesday 7.69638
Thursday 6.90607
Friday 5.48411
Saturday 5.27158
Total 5.852716

Hour1 Entry MadeAvg.Total
12:00 AM 5.0035
1:00 AM 1.002
5:00 AM 0.000
7:00 AM 7.0035
8:00 AM 5.35107
9:00 AM 6.32278
10:00 AM 6.47246
11:00 AM 4.41181
12:00 PM 6.83328
1:00 PM 3.00111
2:00 PM 5.41222
3:00 PM 8.67286
4:00 PM 4.0589
5:00 PM 5.92154
6:00 PM 4.52113
7:00 PM 9.67174
8:00 PM 10.50147
9:00 PM 5.05111
10:00 PM 5.4265
11:00 PM 4.5732
Total 5.852716

Learn More About Comment Stats
1 - All times GMT -8...


Blog Stats

Favorite Web Sites

My Books

My MSDN Articles