The End of Comment Spam?
What is Comment Spam?
Comment spam is an evil and real problem for blogs. The premise goes as follows: evil, vile spammers use old blog entries to post comments littered with links to their porn/gambling/diploma/pharmaceutical sites, so that when Google/MSN/Yahoo! spider the blog they find these links, add them to their indexes, spider them, improve the spammers' page rank, and so on. Technologies designed to make posting easier, such as CommentAPI, just help these ne'er-do-wells automate their comment spam.
Past Techniques for Stopping Comment Spam
Until recently, the main approaches for stopping comment spam have been:
- Moderation - a post doesn't appear on a blog until the blog owner reviews and approves it. The advantage of this is that only on-topic, non-spam/non-inflammatory posts are displayed; the disadvantage is that the blog owner must now take the time to micro-manage approval of messages.
- Use of a Captcha - a captcha is a test that most humans can pass, but current computer programs cannot. We've all seen these: typically a sequence of wavy letters that you must type into a textbox before proceeding. The downside to captchas is that, to my knowledge, the CommentAPI specification does not support them, so you can only use captchas for comments entered through the Web interface. (There's a Captcha control for .Text blogs, as discussed here.)
- Banning Certain Substrings from Comments - another approach, which is the one I use here on ScottOnWriting.NET, is to simply restrict certain substrings from appearing in the comment. There are varying degrees of complexity that can be applied here. I simply have a set of static strings I search for and add to them when a particularly nasty comment spammer starts causing trouble. Other solutions actually utilize a global blacklist of URLs used by comment spammers, such as http://www.jayallen.org/comment_spam/blacklist.txt.
- Munging the URLs in Comments - since comment spammers post their URLs to improve their rank in the search engines, one can remove the impetus for a spammer by removing their desired benefit. One way to accomplish this is to munge the URLs in a comment from something like http://www.somesite.com/BuyViagra.htm to redirect.aspx?http://www.somesite.com/BuyViagra.htm, or to utilize Google's redirect link (which doesn't impact PageRank): http://www.google.com/url?sa=D&q=URL, as discussed here.
- Require Authentication to Post Comments - many online forums use this technique, requiring that a user have an account before being able to post. The theory here is that if someone starts posting spam or off-topic, inflammatory posts, they can be banned and their obnoxious posts deleted. Sure, a motivated spammer can create a new account, but they have to go through the process of using a new email address, filling out an account creation form, and verifying their account by clicking on some link received in an email. The major downsides to this are (1) that CommentAPI (to my knowledge) doesn't support any sort of authentication piece, and (2) that those who want to post to your blog need to create an account. Similarly, if another blogger takes the same approach, they'll need to create another account over there. And so on and so on for every blogger that requires authentication.
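To make the substring-banning and URL-munging ideas a bit more concrete, here is a minimal Python sketch. The banned strings, the function names, and the regular expression are all assumptions made up for illustration; only the redirect.aspx? prefix comes from the example above. It is not the code that runs on any particular blog engine.

```python
import re

# Hypothetical static list of banned substrings; a real list would grow
# whenever a new comment spammer starts causing trouble.
BANNED_SUBSTRINGS = ["buyviagra", "online-casino", "cheap-diploma"]

# Redirect page prefix, following the redirect.aspx? example in the text.
REDIRECT_PREFIX = "redirect.aspx?"

# Rough pattern for absolute http/https URLs (an assumption; it will also
# swallow trailing punctuation such as a period right after the URL).
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def is_banned(comment: str) -> bool:
    """Return True if the comment contains any banned substring."""
    lowered = comment.lower()
    return any(banned in lowered for banned in BANNED_SUBSTRINGS)


def munge_urls(comment: str) -> str:
    """Prefix every URL in the comment with the redirect page."""
    return URL_PATTERN.sub(lambda m: REDIRECT_PREFIX + m.group(0), comment)
```

A blog engine might call is_banned() first and reject the comment outright, then run munge_urls() on whatever gets through, so that any links that do get posted point at the redirect page rather than directly at the spammer's site.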
None of these solutions are really panaceas; the true fix for comment spam is to have some centralized user store and to have blogs require folks to authenticate against this store in order to post. I blabbed on more about this idea in a past blog entry, Improving the Blog Commenting Experience.
A New Alternative to Fighting Comment Spam
Yesterday Google announced a new rel attribute value for anchor (<a>) tags that, if present, tells its spiders not to follow the link or give it any credit, thereby negating the benefits of comment spamming (much like URL munging removes the benefits, except this approach, IMO, is simpler). Basically, if you add rel="nofollow" to a link, Google won't spider it (e.g., <a href="Blah.aspx" rel="nofollow">This won't be spidered!</a>).
Will this measure stop comment spam? It depends, primarily, on how many search engines support this and, more importantly, how many blog engines support this. The good news is that not only will Google respect the rel="nofollow" attribute, but so will MSN Search and Yahoo! Also, a large number of blog engines have promised to utilize this technique, including:
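For a rough idea of what a blog engine would have to do, here is a minimal Python sketch that forces rel="nofollow" onto every link in a comment before it is rendered. The function name and the regular expressions are my own assumptions, and a regex is only a crude stand-in for a real HTML parser, so treat this as an illustration rather than how any particular blog engine does it.

```python
import re

# Matches an opening anchor tag and captures its attributes. The \b keeps
# tags such as <abbr> from matching. (Assumed pattern, not a full parser.)
ANCHOR_OPEN = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)

# Matches an existing rel attribute so a commenter cannot supply their own.
EXISTING_REL = re.compile(
    r"\s+rel\s*=\s*(?:\"[^\"]*\"|'[^']*'|[^\s>]+)", re.IGNORECASE
)


def add_nofollow(comment_html: str) -> str:
    """Return comment_html with rel="nofollow" set on every <a> tag."""

    def fix(match: re.Match) -> str:
        # Drop any rel the commenter provided, then append our own.
        attrs = EXISTING_REL.sub("", match.group(1))
        return f'<a{attrs} rel="nofollow">'

    return ANCHOR_OPEN.sub(fix, comment_html)
```

Any rel value the commenter supplied is replaced rather than kept, on the theory that a spammer shouldn't get to choose how their own links are treated.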
- MSN Spaces
- Community Server (the evolution of .Text)
Even if the vast majority of blog engines start using the rel="nofollow" attribute, comment spam may still run rampant in the hope that some blogs won't support it. Think of it this way - how much stuff have you purchased from a spammer, yet how many spams a day do you get? In the end, I think Google/MSN Search/Yahoo!'s addition of the rel="nofollow" attribute is a very positive step in the right direction, but one would have to be a bit naive to think that this will spell the end of comment spam. We'll still need to use one or more of the techniques I discussed previously until we finally have some global authentication/user store available that everyone agrees to use...