By: Scott Mitchell
Written: October 12th, 2003
Last Updated: February 12th, 2004
I am no longer hesitant to give out my email address on the Internet - it's mitchell@4GuysFromRolla.com. Of course, I've never been too hesitant to publish my email address. It appears on literally thousands of Web pages across the three Web sites I run (4GuysFromRolla.com, ASPFAQs.com, and ASPMessageboard.com), which helps explain why, prior to October 6th, 2003, I was receiving over 100 spam emails per day at that one address.
Spam has been a major problem for me for the past several years, and with each passing year the number of spams I receive has more than doubled. Assuming this exponential growth continued, I estimated that by 2010 I would be receiving over 61,000 pieces of spam in my Inbox per day. That's over 42 pieces of spam per minute. Of course, these estimates are more for a grin than to be taken seriously, but the fact remains: prior to October 6th I was inundated by a daily torrent of spam.
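As a sanity check on that projection, here is a short Python calculation. It is a sketch that assumes the 100-per-day figure as the 2003 baseline and seven years of steady yearly compounding. It shows two things. First, 61,000 spams per day works out to roughly 42 per minute, which is well under one per second. Second, reaching that figure implies yearly growth of about 2.5x, which is consistent with "more than doubled."

```python
# Back-of-the-envelope check of the spam projection.
# Assumptions: a baseline of 100 spams/day in 2003, compounding yearly until 2010.
BASELINE_PER_DAY = 100        # spams/day in late 2003
PROJECTED_PER_DAY = 61_000    # projected spams/day in 2010
YEARS = 2010 - 2003

# Solve BASELINE * r**YEARS = PROJECTED for the yearly growth factor r.
growth = (PROJECTED_PER_DAY / BASELINE_PER_DAY) ** (1 / YEARS)

per_minute = PROJECTED_PER_DAY / (24 * 60)
per_second = PROJECTED_PER_DAY / (24 * 60 * 60)

print(f"implied growth: {growth:.2f}x per year")             # implied growth: 2.50x per year
print(f"{per_minute:.1f}/minute, {per_second:.2f}/second")  # 42.4/minute, 0.71/second
```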
"What happened October 6th?" you ask. Did I shut down Outlook for good? Nope. I employed what seems to me the only plausible way to end spam while still receiving important email: I built a challenge/response (C/R) spam blocking system. A C/R spam blocking system accepts emails from a list of trusted addresses (a white list) and refuses emails from a list of untrusted addresses (a black list). When a new email arrives, its From (and possibly To) address is checked against the white list and the black list. Messages from white-listed addresses are downloaded by my email reader, while black-listed messages are automatically deleted. When a message arrives from a sender who is on neither list, that person is sent a challenge email with directions on how to respond. Responding is simple: they visit a Web page and enter a password. Once this step is completed, the sender is added to the white list. Until it is performed, their email is in limbo.
The whole idea behind a C/R spam blocking system is that spammers will not take the time to respond to the challenge email, while people who genuinely want to contact me will respond so that they can be added to the white list. This response is a one-time affair and takes only a moment, so (in theory) anyone who is interested in contacting me won't mind the brief step they need to perform before emailing me. There are currently a couple of commercial companies that offer spam control via C/R. The one I have heard talked about most is SpamArrest, which charges a reasonable monthly fee for its service.
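The C/R flow described above can be sketched as a tiny state machine. The Python below is a minimal illustration and not the author's actual code. The in-memory sets, the `Verdict` names, and the `on_message`/`on_response` methods are hypothetical. It leaves out both sending the challenge email and checking the Web page password.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    DELIVER = "deliver"      # white-listed sender: pass the message through
    DELETE = "delete"        # black-listed sender: discard the message
    CHALLENGE = "challenge"  # first message from an unknown sender: send a challenge
    HOLD = "hold"            # unknown sender already challenged: keep waiting


@dataclass
class ChallengeResponseFilter:
    # Hypothetical in-memory lists; a real system would persist them.
    white: set[str] = field(default_factory=set)
    black: set[str] = field(default_factory=set)
    pending: dict[str, list[str]] = field(default_factory=dict)  # sender -> held message IDs

    def on_message(self, sender: str, message_id: str) -> Verdict:
        sender = sender.lower()
        if sender in self.black:
            return Verdict.DELETE
        if sender in self.white:
            return Verdict.DELIVER
        if sender in self.pending:           # a challenge was already sent
            self.pending[sender].append(message_id)
            return Verdict.HOLD
        self.pending[sender] = [message_id]  # the message is now in limbo
        return Verdict.CHALLENGE

    def on_response(self, sender: str) -> list[str]:
        """Sender answered the challenge: white-list them and release held mail."""
        sender = sender.lower()
        self.white.add(sender)
        return self.pending.pop(sender, [])


f = ChallengeResponseFilter(white={"friend@example.com"}, black={"spammer@example.net"})
print(f.on_message("Friend@example.com", "m1"))  # Verdict.DELIVER
print(f.on_message("new@example.org", "m2"))     # Verdict.CHALLENGE
print(f.on_message("new@example.org", "m3"))     # Verdict.HOLD
print(f.on_response("new@example.org"))          # ['m2', 'm3']
print(f.on_message("new@example.org", "m4"))     # Verdict.DELIVER
```

Only the first message from an unknown sender triggers a challenge. Later messages simply join the pending queue, so a chatty stranger is not flooded with challenges.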
This article, I think you will agree, is a bit lengthy. It is divided into three sections. In the first part, I examine the C/R spam blocking system I built, offering advice and lessons learned to others who may be interested in implementing such a system. In the second section I evaluate the success of my C/R spam blocking system. Finally, in the third part I discuss both the negatives and positives of C/R spam blocking systems.
All C/R spam blocking systems work in virtually the same manner, by introducing an intermediary between the user's actual, spam-infested Inbox, and a quarantined Inbox. For the duration of this article I'll refer to the potentially spam-infested Inbox as the "infested account" and the quarantined Inbox as the "quarantined account." The following illustration depicts the job of the intermediary:
The intermediary, as the above illustration hopefully depicts, acts like a filter through which spam is strained out. Ideally, all non-spam emails will pass through the filter while all spam emails will not.
The job of the intermediary is five-fold:
I named my intermediary Spam Blocker. The first order of business was determining a data model for storing information on the black, white, and pending lists. I chose a rather simple data model with a single table named spam_EmailList. This table's definition is shown below:

spam_EmailList contains all of the email addresses Spam Blocker is familiar with. Those whose IsBanned attribute is 1 are in the black list. Those whose IsPending attribute is 1 are in the pending list. The remaining addresses, those with both IsBanned and IsPending set to 0, are in the white list. The AllowedInTo attributes determine whether a particular email address counts as white-listed when it appears in the To header, the From header, or both. This check is useful for mailing lists. For example, I participate in a number of lists at ASPAdvice.com. When I receive an email from one of these lists, the To address is the list's email address. Therefore, by adding just the To address of each list to the white list, I receive all list traffic without sending challenges to the individuals who post to the list.
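The actual table definition is not reproduced here, so the following Python/SQLite sketch uses a guessed schema. Only IsBanned, IsPending, and AllowedInTo come from the description above. The EmailAddress column and the AllowedInFrom flag are hypothetical, and AllowedInTo is treated as a simple 0/1 flag. The sketch shows how one table can serve as all three lists, and how a white-listed To address lets mailing-list traffic through.

```python
import sqlite3

# Hypothetical schema for spam_EmailList: EmailAddress and AllowedInFrom are assumptions.
SCHEMA = """
CREATE TABLE spam_EmailList (
    EmailAddress  TEXT PRIMARY KEY COLLATE NOCASE,
    IsBanned      INTEGER NOT NULL DEFAULT 0,  -- 1 = black list
    IsPending     INTEGER NOT NULL DEFAULT 0,  -- 1 = pending list
    AllowedInTo   INTEGER NOT NULL DEFAULT 0,  -- 1 = trusted when found in To
    AllowedInFrom INTEGER NOT NULL DEFAULT 1   -- 1 = trusted when found in From
)
"""


def classify(conn: sqlite3.Connection, from_addr: str, to_addrs: list[str]) -> str:
    """Return 'black', 'pending', 'white', or 'unknown' for an incoming message."""
    row = conn.execute(
        "SELECT IsBanned, IsPending, AllowedInFrom FROM spam_EmailList"
        " WHERE EmailAddress = ?",
        (from_addr,),
    ).fetchone()
    if row is not None:
        banned, pending, allowed_in_from = row
        if banned:
            return "black"
        if pending:
            return "pending"
        if allowed_in_from:
            return "white"
    # Mailing lists: a white-listed To address (the list's own address)
    # admits the message even though the individual sender is unknown.
    for addr in to_addrs:
        hit = conn.execute(
            "SELECT 1 FROM spam_EmailList WHERE EmailAddress = ?"
            " AND IsBanned = 0 AND IsPending = 0 AND AllowedInTo = 1",
            (addr,),
        ).fetchone()
        if hit is not None:
            return "white"
    return "unknown"


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO spam_EmailList VALUES (?, ?, ?, ?, ?)",
    [
        ("friend@example.com", 0, 0, 0, 1),
        ("spammer@example.net", 1, 0, 0, 0),
        ("newcomer@example.org", 0, 1, 0, 1),
        ("somelist@lists.example.com", 0, 0, 1, 0),  # hypothetical mailing list
    ],
)
print(classify(conn, "stranger@example.org", ["somelist@lists.example.com"]))  # white
```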
Spam Blocker works by performing the following steps every few minutes:
That's all there is to it! All that remains is to set up Outlook to download emails from the quarantined account as opposed to the infested account.
Spam Blocker uses Advanced Intellect's aspNetPOP3 component to access the infested account. aspNetPOP3 is a .NET component for programmatically accessing a remote mail server via POP3, a standard email retrieval protocol. I was quite pleased with this product: it was very easy to use and the support was excellent. I'd highly recommend it if you have a project where you need to programmatically access a POP3 server.
Initially, the infested account was my public email address, and the quarantined account was a private email address. Spam Blocker forwarded white-listed messages from the infested account to the quarantined account in three steps. It downloaded the entire contents of each email and changed its To header to the address of the quarantined account. It then dumped the result as a file into the Windows 2000 SMTP Service pickup directory, which automatically caused the email to be sent to the quarantined account.
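This forwarding step can be sketched in Python with the standard `email` package. The sketch is an illustration built on assumptions and is not Spam Blocker's code. The function name and the file-naming scheme are hypothetical, and it does not cover the exact file format a particular SMTP service's pickup directory expects.

```python
import email
import os
import uuid


def forward_via_pickup(raw_message: str, quarantined_addr: str, pickup_dir: str) -> str:
    """Point a downloaded message at the quarantined account and hand it to the
    local SMTP service by writing it into the (assumed) pickup directory."""
    msg = email.message_from_string(raw_message)
    if "To" in msg:
        msg.replace_header("To", quarantined_addr)  # keeps the header's original position
    else:
        msg["To"] = quarantined_addr
    # Hypothetical naming: a unique file name so two drops never overwrite each other.
    path = os.path.join(pickup_dir, f"{uuid.uuid4().hex}.eml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(msg.as_string())
    return path
```

Rewriting only the To header leaves the original From and Subject intact. The forwarded message therefore still shows who really sent it.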
This approach worked, but it had a number of disadvantages. First, it was slow, since Spam Blocker connected directly to the infested account, which was on a computer on the other side of the country. When there were only a handful of messages in the infested account, the speed was acceptable. However, as the days passed, more and more unfiltered spam built up on the infested account. After a week there were nearly 800 spams, and downloading all 800 headers took several minutes on my DSL connection.
|Deciding When to Delete Pending Emails...|
This brings up an interesting point: deleting pending emails. Recall that when a spammer first sends an email to me, he will receive a challenge. Rarely, if ever, will a spammer respond to this challenge. Therefore, the spam sits in the infested account marked as pending. When is it safe to remove this email from the pending list and delete it? One day? Three? A week? Two weeks? A month?
Choosing a long enough time span is important for the following reason. Assume someone you do want to hear from sends you an email, but this person is not in the database. He will be added to the pending list and a challenge will be sent. Now imagine that this person sent the email right before leaving on a two-week vacation. When he returns, he'll see (and hopefully respond to) the challenge you sent immediately after receiving his email two weeks earlier. However, if pending emails are automatically deleted after a week, this person's legit email will have been lost, because it was deleted before he returned from his vacation.
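The trade-off can be made concrete with a small Python sketch. The function name and the dates are hypothetical. A pending message counts as stale once it has waited longer than a chosen freshness period, and the sketch replays the vacation scenario with a one-week period and a three-week period.

```python
from datetime import datetime, timedelta


def stale_pending(pending: dict[str, datetime], now: datetime,
                  freshness: timedelta) -> list[str]:
    """Hypothetical helper: IDs of pending messages older than the freshness period."""
    return sorted(msg_id for msg_id, received in pending.items()
                  if now - received > freshness)


# The vacation scenario: a legit sender emails, then is away for two weeks.
pending = {"vacationer": datetime(2003, 10, 1, 9, 0)}
back_from_vacation = datetime(2003, 10, 15, 9, 0)   # exactly 14 days later

print(stale_pending(pending, back_from_vacation, timedelta(weeks=1)))  # ['vacationer']
print(stale_pending(pending, back_from_vacation, timedelta(weeks=3)))  # []
```

With a one-week period the vacationer's message would already be deleted by the time he answers the challenge. A three-week period keeps it, at the cost of storing more pending spam.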
Another disadvantage of this approach was that emails were being forwarded to another account, which caused a couple of minor problems. First, the quarantined address could only be used as the repository for filtered emails from the infested account. That is, I could not use the quarantined account for regular emailing. If spammers got hold of that address, all of my efforts in filtering the infested address would be in vain. Second, this level of indirection slowed down my email slightly, because it introduced another hop that each message had to take before I could read and respond to it. Usually this wouldn't be an issue, but any downtime at the ISP providing the quarantined account could cause further delays.
To improve the performance of downloading remote headers and forwarding email to a quarantined account, I decided I needed a local POP3 server installed on my desktop. This POP3 server would have two accounts: the infested account and the quarantined account. With this local POP3 server, Spam Blocker would simply download all emails from the remote infested address into the local infested account and remove them from the remote server. Spam Blocker could then cycle through the emails on the local POP3 server, no longer needing to access a remote POP3 account.
In addition to a local POP3 infested account, I would also need a local POP3 quarantined account. Since both accounts would be local, I could "forward" white-listed emails from the infested account to the quarantined account by simply copying a file from the folder that holds the infested account's emails to the folder that holds the quarantined account's emails. This concept is illustrated in the image below. (Note that the dashed lines depict the flow of data among the entities in the diagram.)
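When both accounts are stored as folders of message files, forwarding becomes a plain file operation. Here is a minimal Python sketch that assumes one message per file. The function and directory names are hypothetical. The sketch supports both copying, which leaves the original in the infested account, and moving.

```python
import os
import shutil


def forward_locally(message_file: str, infested_dir: str, quarantined_dir: str,
                    keep_original: bool = False) -> str:
    """Hypothetical helper: 'forward' a white-listed message between two local
    mailbox folders, assuming each message is stored as its own file."""
    src = os.path.join(infested_dir, message_file)
    dst = os.path.join(quarantined_dir, message_file)
    if keep_original:
        shutil.copy2(src, dst)  # copy: the message also stays in the infested account
    else:
        shutil.move(src, dst)   # move: a cheap rename when both folders share a volume
    return dst
```

No SMTP round trip is involved. "Delivery" to the quarantined account is immediate, and it cannot be delayed by another ISP's downtime.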
This approach clearly seemed the optimal choice. The only problem was getting my hands on a POP3 server! There are a number of commercial POP3 servers, but I didn't feel compelled to spend money on this project, so I turned to freeware POP3 servers. I found a couple, such as Baby POP3 Server, ArGoSoft MailServer, and others, but none of them held up very well with hundreds of messages in the inbox. Some frequently had socket connections time out, while others simply stopped working every so often. I decided I'd have to find source code for a POP3 server or write my own.
After a bit of Googling I found LumiSoft Mail Server, an open-source POP3 server written in C#. I separated the POP3 protocol code from the GUI portion, made some tweaks, and was off and running. I configured it so that the email messages for a particular account were saved as text files in a specified directory. A pleasant consequence was that forwarding an email from one account to another became simply a matter of moving a file from one directory to another. Also, when pulling down all of the emails from the remote infested address to the local infested account, I was able to use aspNetPOP3 to retrieve each message as a string and just squirt the returned string out to a text file in the appropriate directory.
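Here is a sketch of the download-and-save step. It uses Python's standard `poplib` module in place of the commercial aspNetPOP3 component, so nothing here reflects aspNetPOP3's API. The function name and the file naming are hypothetical. The function expects a connection that is already logged in. It writes each message to its own file and marks it for deletion on the remote server.

```python
import os
import poplib
import uuid


def drain_mailbox(pop: poplib.POP3, dest_dir: str) -> int:
    """Hypothetical helper: save every message on a logged-in POP3 session as a
    file in dest_dir, then mark it for deletion on the remote server.

    Deletions are only committed when the caller ends the session with pop.quit().
    """
    count, _mailbox_size = pop.stat()
    for i in range(1, count + 1):                # POP3 message numbers start at 1
        _response, lines, _octets = pop.retr(i)  # lines arrive without line endings
        raw = b"\r\n".join(lines) + b"\r\n"
        path = os.path.join(dest_dir, f"{uuid.uuid4().hex}.eml")
        with open(path, "wb") as fh:
            fh.write(raw)
        pop.dele(i)
    return count


# Typical use (host and credentials are placeholders):
#   pop = poplib.POP3_SSL("pop.example.com")
#   pop.user("user"); pop.pass_("secret")
#   drain_mailbox(pop, "infested")
#   pop.quit()
```

Deleting only after each file has been written means that a crash mid-run leaves the remaining messages on the server. They are then picked up on the next pass.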
I started working on Spam Blocker on October 4th and wrote the last line of code on October 8th. Aside from prettying up the code and the thin GUI attached to the POP3 server and the intermediary, all I have left to do is decide when to mark pending messages as stale... any suggestions? I am thinking of a one-week freshness period.
So, has Spam Blocker been a success? It has been a wonderful tool in greatly reducing the sheer volume of spam I receive. As mentioned earlier, it has stopped over 100 pieces of spam email per day. My Inbox is so quiet now, and I receive so little email, that I never realized I was this unpopular! Virtually every piece of mail that comes into my Inbox now is quality. Very occasionally a spammer will respond to the challenge, but this has happened only twice out of thousands of spams. The solution? I simply remove them from the white list and shove 'em over to the black list.
While Spam Blocker has unquestionably reduced the quantity of spam, my only concern is that it has also reduced the quantity of non-spam. Anyone who is not yet in my email list database has to take the time and effort to respond to a challenge before their email reaches me. When SpamArrest first came out, I was deluged with many people's challenges and didn't fill out any. I figured I didn't need to talk to them anyhow, and it wasn't worth my time to respond to all these challenges.
Periodically I take a random sampling of the emails in the infested account, just to make sure some vital email isn't sitting there. Of course, the chances of locating such an important email are slim to none, as there are over 1,000 emails in the infested account. My random peckings through the infested account have almost always turned up spam. Even so, I have stumbled across a few legit emails whose senders apparently never took the time to respond to the challenge. In fact, one individual sent a second email shortly after his first, decrying the challenge, telling me he wasn't going to respond, and saying that if I didn't want spam I shouldn't put my email address on public Web pages. So... who knows how many messages like this are floating out there in limbo.
I guess I can feel assured that if the email is really important, then the person will take the time to respond to the challenge... at least, that's what I have to believe!
Working on this project has given me a lot to think about, both specifically regarding spam and C/R spam blocking systems and regarding communication in general. Personally, I believe that productivity and successful business are most achievable when there is as little friction as possible in the communication process. Adding barriers to any communication channel only slows down the flow of information. Email's widespread use is due largely to the ease and speed of the communication it makes possible.
Given these views, it is understandable that I find C/R spam blocking systems far from ideal. Employing such a system makes it harder for new people to establish contact with me. I can rationalize that those whose voice I'd really like to hear would take the time to respond to Spam Blocker's challenge. Still, I have the harrowing feeling that I'm missing at least one vital email, if not several. Perhaps Ed McMahon has emailed me informing me that I am a finalist in the Publishers' Sweepstakes, but if he has not responded to the challenge I may be out tens of millions of dollars.
Despite my uneasiness with C/R spam blocking systems, I see no other viable alternative at present. Spam is just too large a problem for me, being both majorly annoying and a major deterrent to my work efficiency. I have tried various spam filters (such as CloudMark) and even wrote my own Outlook spam filter plug-in to try to deal with the problem. However, filtering worries me even more when it comes to keeping out legit emails. With a C/R spam blocking system, I have faith that if the email is very very very important, the person will register. With an automatic spam filter, no matter how important the message is, if it gets blocked I am not going to see it.
After building Spam Blocker, I was referred to this article by Brad Templeton: Proper Principles for Challenge/Response Anti-Spam Systems. Brad has some good ideas on principles and techniques for C/R spam blocking systems that warrant a read. Reading Brad's views makes me think that a truly useful, "used by the masses" C/R spam blocking system is not possible without at least some built-in email client support. Of course, ask yourself this: Do you want the masses to use C/R spam blocking? Do you want to be hit with hundreds of challenges? Didn't think so...
There are some technical pitfalls with C/R spam blocking systems, and I don't think they can be overcome until either email clients get smarter or C/R spam blocking software becomes more advanced. First, consider the following scenario: Alice and Bob both use C/R spam blocking software, but neither has ever emailed the other before. Alice decides it would be nice to email Bob, so she shoots off an email to him. Bob's C/R spam blocking software receives the email and issues a challenge email to Alice. Alice's C/R spam blocking software receives the challenge, sees a message from an unknown sender (Bob), and so issues a challenge to Bob! Uh-oh! We have a deadlock. Neither user will ever see the other's challenge, so neither will respond, and Bob will never get Alice's email.
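The deadlock can be reproduced with a toy simulation. In this Python sketch the class and function names are hypothetical. Each mailbox holds mail from unknown senders and sends at most one challenge per unknown sender, even when what arrived was itself a challenge. The `whitelist_outgoing` flag models one commonly used mitigation, which automatically white-lists the people you write to. That flag is an assumption added for illustration and is not necessarily one of the methods the author has in mind.

```python
from dataclasses import dataclass, field


@dataclass
class CRMailbox:
    """Toy challenge/response mailbox (hypothetical; for illustration only)."""
    owner: str
    whitelist_outgoing: bool = False  # assumed mitigation; False reproduces the deadlock
    white: set[str] = field(default_factory=set)
    challenged: set[str] = field(default_factory=set)
    inbox: list[str] = field(default_factory=list)
    held: list[str] = field(default_factory=list)

    def send(self, to: "CRMailbox", body: str, by_user: bool = True) -> None:
        if by_user and self.whitelist_outgoing:
            self.white.add(to.owner)  # trust the people I choose to write to
        to.receive(self, body)

    def receive(self, sender: "CRMailbox", body: str) -> None:
        if sender.owner in self.white:
            self.inbox.append(body)
            return
        self.held.append(body)  # unknown sender: the message waits in limbo
        # Challenge each unknown sender once, even if what arrived was itself a challenge.
        if sender.owner not in self.challenged:
            self.challenged.add(sender.owner)
            self.send(sender, f"challenge from {self.owner}", by_user=False)


def scenario(alice_mitigated: bool) -> tuple[list[str], list[str]]:
    """Alice emails Bob for the first time; return (Alice's inbox, Bob's inbox)."""
    alice = CRMailbox("alice", whitelist_outgoing=alice_mitigated)
    bob = CRMailbox("bob")
    alice.send(bob, "Hi Bob!")
    return alice.inbox, bob.inbox


print(scenario(alice_mitigated=False))  # ([], [])
print(scenario(alice_mitigated=True))   # (['challenge from bob'], [])
```

In the mitigated run Bob's inbox is still empty. However, Alice now sees Bob's challenge and can answer it, which breaks the deadlock.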
This scenario could be avoided in a couple of ways, and would probably be best served by a combination of the following two methods:
For C/R spam blocking systems to become really useful, I think there needs to be C/R support built into an email client. This may start with online email clients, like Hotmail or Yahoo! Mail, since the turn-around time from idea to implementation is much shorter than with products like Outlook.
The ideal situation is for everyone to have a certificate (or PGP key or whatever) and then for everyone to sign all emails, and then only signed emails will be accepted by the mail relays around the world. Of course, this is a pipe dream, but it's a nice pipe dream, if I do say so myself.
|Added October 15th, 2003...|
Societal Response to C/R Spam Blocking Systems
I have now been using my C/R spam-blocking system for a couple of weeks. One thing I have noticed is that many people who encounter it for the first time don't seem to trust my programming, or the system in general. That is, when they send me an email and get a challenge, they respond to it and also resend their initial message, perhaps thinking that the first message might be lost. This extra step is not needed, of course, and likely only adds to the sender's frustration at having to pass through a C/R system to email me.
Some metrics that I would be interested in collecting, given some free time, would be the following:
Of course I cannot hope to calculate statistics on the second metric, but as you can probably ascertain, my interest is in determining if the frequency with which users respond to the challenge is correlated with the number of C/R spam-blocking systems in use. That is, the more often I get challenges, the more likely I might be to say, "Screw it, getting in touch with this bozo isn't that important, I'm fed up." At this point the C/R spam-blocking system has become a C/R non-important-emails-blocking system. And who's to say that's a bad thing?
|Added February 12th, 2004...|
I've Given Up on Challenge/Response
The main reason I decided to stop the C/R system was because there were people who had said they were trying to contact me via email, but couldn't. The problem was that my challenge was either getting challenged back by their C/R anti-spam system, or their filters were blocking my challenge emails. So, not seeing the challenge, they were never able to respond, and not responding meant I didn't see their email.
At the time of this writing I have been using Spambayes for a couple of weeks and am pretty impressed by its accuracy. I have had no false positives (legitimate emails marked as spam), and only about 5 to 10 genuine spams end up in my Inbox. I hope (and expect) this number to decrease as the number of spams and hams in my Spambayes database grows over time.
Implementing a challenge/response spam-blocking system was interesting, very educational, and a whole lot of fun. But in the end it was cutting out not only the torrent of spam but also a handful of legitimate emails. I like my Inbox to be like the U.S. legal system, which would rather let several guilty men go free than see one innocent man wrongfully convicted. Since my challenge/response system cut out some legitimate emails, I decided it had to go...
Copyright 2004, Scott Mitchell. All Rights Reserved.