Some Plan(s) for Spam

After struggling with spam e-mail for years the old-fashioned way (highlight, DEL), I finally succumbed and installed POPFile on my server. POPFile uses a Bayesian filtering technique, and it is amazingly effective. Within a day I had 95% accuracy; within a week, 97%. Two months later, I'm up to nearly 99% accuracy.
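
For anyone curious how a Bayesian filter learns from your corrections, here is a minimal naive Bayes sketch. It is not POPFile's actual code. The tokenizer, the add-one (Laplace) smoothing, and the names `NaiveBayesFilter`, `train`, and `spam_probability` are all illustrative assumptions. Each message you reclassify becomes another training example, which is one reason accuracy tends to climb the longer you use a filter like this.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: a naive tokenizer that lowercases and keeps runs of letters,
    # digits, apostrophes and dollar signs. Real filters tokenize far more carefully.
    return re.findall(r"[a-z0-9'$]+", text.lower())


class NaiveBayesFilter:
    """Hypothetical two-class (spam/ham) multinomial naive Bayes filter."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        # Each user correction ("this is spam" / "this is not") is one example.
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Prior: share of training messages with this label, add-one smoothed.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in tokenize(text):
                # Add-one smoothing so a word never seen in this class
                # does not drive the whole product to zero.
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Normalize the two log scores into P(spam) without overflowing exp().
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("cheap viagra buy now limited offer", "spam")
    f.train("win money now click here", "spam")
    f.train("meeting notes for the project review", "ham")
    f.train("lunch tomorrow with the team", "ham")
    print(f.spam_probability("buy cheap viagra now"))      # about 0.96
    print(f.spam_probability("project meeting tomorrow"))  # about 0.11
```

With only four training messages this toy model is easy to fool; a real filter trains on far more mail and uses much more careful tokenization.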


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2004/09/some-plans-for-spam.html

Here’s a cool bit of code that generates a CAPTCHA in ASP.NET:

http://www.codeproject.com/aspnet/CaptchaImage.asp

The biggest concern with any filter is false positives. I personally never want to lose email; I just want it out of my Inbox.
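
One way to get "out of my Inbox, but never lost" is to have the filter only decide which folder a message lands in, and never delete anything. The sketch below assumes a hypothetical `Mailbox` class and a spam score from some filter; the 0.9 threshold is an illustrative value, not a recommendation from any particular product.

```python
class Mailbox:
    """Hypothetical mailbox with an Inbox and a Spam folder."""

    def __init__(self) -> None:
        self.folders: dict[str, list[str]] = {"Inbox": [], "Spam": []}

    def deliver(self, message: str, spam_probability: float, threshold: float = 0.9) -> str:
        # Suspected spam is filed, never discarded, so a false positive
        # is merely misfiled rather than lost.
        folder = "Spam" if spam_probability >= threshold else "Inbox"
        self.folders[folder].append(message)
        return folder

    def not_spam(self, message: str) -> None:
        # Recovering a false positive is just a move back to the Inbox.
        self.folders["Spam"].remove(message)
        self.folders["Inbox"].append(message)


if __name__ == "__main__":
    box = Mailbox()
    print(box.deliver("cheap pills", 0.97))  # Spam
    print(box.deliver("lunch?", 0.05))       # Inbox
    box.not_spam("cheap pills")
    print(box.folders)
```

Raising the threshold trades more spam in the Inbox for fewer good messages filed as spam.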

With Sender ID (or the “Email Caller ID” crap) seemingly about to be canned, something needs to be done about spam.

Honestly, I think the only course is to redo SMTP as a slightly better protocol. POP is okay, but it too is a very basic, primitive protocol that seems written for 1986, not 2006. Both need a revamp into something genuinely new and different, not XSMTP or some other renamed variant that adds two lines of code. Sure, you’d break a lot of mail servers, but if the idea is structurally sound, why wouldn’t everyone jump to it?

I guess it’ll take a couple more years of spam and viruses before the problem gets “really” serious. (As if it wasn’t serious in 2000, or on any day since.) Hell, I remember spam in my AOL account back in '94, and it was a big deal then too. Blah, anyway, I suppose something will be done someday. Until then, Bayesian all the way.

You should definitely try SpamArrest (or another similar human-only whitelist) if your main concern is false positives. Mail from senders not yet verified as human isn’t technically a false positive; it’s just in limbo. With that caveat, it is 100% effective: you will ONLY get email from human beings from that point on, guaranteed!
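
To make the “limbo” idea concrete, here is a rough sketch of how a challenge-response whitelist can work. It is not SpamArrest’s implementation; the class and method names are hypothetical, and the challenge is reduced to a random token that would, in a real service, be sent back as a link or puzzle only a human would bother to answer.

```python
import secrets
from collections import defaultdict


class ChallengeResponseFilter:
    """Hypothetical human-only whitelist; not any real service's implementation."""

    def __init__(self, whitelist=None) -> None:
        self.whitelist = set(whitelist or [])
        self.pending = defaultdict(list)  # sender -> messages held in limbo
        self.tokens = {}                  # challenge token -> sender
        self.inbox = []

    def receive(self, sender: str, message: str):
        """Deliver mail from known senders; otherwise hold it and return a challenge token."""
        if sender in self.whitelist:
            self.inbox.append(message)
            return None
        self.pending[sender].append(message)
        # Assumption: one outstanding challenge per sender, reused for later mail.
        for token, who in self.tokens.items():
            if who == sender:
                return token
        token = secrets.token_urlsafe(16)
        self.tokens[token] = sender
        return token

    def verify(self, token: str) -> bool:
        """Called when a human answers the challenge: whitelist them and release held mail."""
        sender = self.tokens.pop(token, None)
        if sender is None:
            return False
        self.whitelist.add(sender)
        self.inbox.extend(self.pending.pop(sender, []))
        return True


if __name__ == "__main__":
    f = ChallengeResponseFilter(["friend@example.com"])
    f.receive("friend@example.com", "hi")
    token = f.receive("stranger@example.com", "hello")
    print(f.inbox)   # ['hi'], the stranger's mail is in limbo
    f.verify(token)
    print(f.inbox)   # ['hi', 'hello']
```

Nothing is ever rejected outright: unverified mail simply waits in `pending` until its sender proves they are human.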

It’s too early to tell if anything will come of SenderID and the like. We can always hope…

I know this comment comes a couple of years after the post, but the Anti Spam SMTP Proxy SourceForge project (http://assp.sourceforge.net) provides the be-all, end-all server-side spam filter that actually works. Or at least the very good beginnings of one.
I set it up on my server about a month ago, and in that time it’s reduced the spam I personally get from about 100 per day to about 2 per day, and even those are marked as spam; I only receive them because it’s still in “test” mode, which lets me train it more effectively. I’ve had only 2 false positives myself.
On that server, I also host several other domains and about a dozen users. I don’t know the spam load for the other users, but I’ve been keeping track of the messages being marked as spam, and I’ve only seen a couple of false positives (which I quickly remedy using ASSP’s mail interface).
The most powerful feature of ASSP is not its Bayesian filtering (although that’s an integral part of the setup); it’s the delaying feature, also known as greylisting. I won’t go into a technical description here (Google is your friend), but ASSP exploits the retry behavior of legitimate MTAs so that spam sent from zombies or other poorly implemented spam sources never even makes it to the server, saving me bandwidth and processing power.
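
For the curious, here is a minimal sketch of the greylisting idea, not ASSP’s actual code. A delivery attempt is identified by the triplet (client IP, envelope sender, envelope recipient). The first attempt gets a temporary 4xx failure; a legitimate MTA queues the message and retries later, while many zombie senders never come back. The 300-second minimum delay and four-hour window are illustrative assumptions, not ASSP’s defaults, and the `clock` parameter exists only so the timing can be simulated.

```python
import time


class Greylist:
    """Minimal greylisting sketch; parameter values are illustrative only."""

    def __init__(self, min_delay=300.0, max_wait=4 * 3600.0, clock=time.time) -> None:
        self.min_delay = min_delay  # seconds a new triplet must wait before a retry is accepted
        self.max_wait = max_wait    # first-seen entries older than this are treated as new
        self.clock = clock
        self.first_seen = {}        # (client_ip, sender, recipient) -> first attempt time
        self.passed = set()         # triplets that already retried successfully

    def check(self, client_ip: str, sender: str, recipient: str) -> int:
        """Return an SMTP reply code: 250 to accept, 451 for 'try again later'."""
        triplet = (client_ip, sender, recipient)
        if triplet in self.passed:
            return 250
        now = self.clock()
        seen = self.first_seen.get(triplet)
        if seen is None or now - seen > self.max_wait:
            # Unknown (or stale) triplet: temporarily refuse and remember it.
            self.first_seen[triplet] = now
            return 451
        if now - seen < self.min_delay:
            return 451  # retried too soon
        # A patient retry looks like a real MTA: accept, and skip the delay from now on.
        del self.first_seen[triplet]
        self.passed.add(triplet)
        return 250


if __name__ == "__main__":
    now = [0.0]
    g = Greylist(clock=lambda: now[0])
    print(g.check("192.0.2.1", "a@example.com", "me@example.org"))  # 451
    now[0] = 400.0
    print(g.check("192.0.2.1", "a@example.com", "me@example.org"))  # 250
```

The cost is that a first message from a new sender arrives a few minutes late; after that, the triplet is remembered and mail flows normally.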
Anyway, if you’re managing a small- to medium-sized mail server and want to filter spam at the server level, I heartily recommend ASSP.

-Peter