Bayesian Kryptonite - spoofed email

I use POPFile Bayesian filtering to keep email spam at bay. With a little training, this works amazingly well-- I'm at 99.8% accuracy, and that's with a little over a month of "training" precipitated by a recent server migration. But Bayesian filtering has one big weakness that I'm seeing more and more: spoofed emails.
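As a rough illustration of why spoofs slip through, here is a minimal two-bucket naive Bayes classifier. It is a sketch of the general technique behind Bayesian filtering, not POPFile's actual code; the class name and the training messages are made up, and it assumes both buckets have at least one training message. A phishing mail that copies the wording of a legitimate account notice scores as ham, because word statistics are all the filter sees.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Two-bucket (spam/ham) multinomial naive Bayes over word counts."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Lowercase and keep runs of letters, digits, and apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, label: str, text: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def log_score(self, label: str, text: str) -> float:
        # log P(label) + sum of log P(word | label), with add-one (Laplace) smoothing.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / total_docs)
        for word in self.tokenize(text):
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def classify(self, text: str) -> str:
        return max(("spam", "ham"), key=lambda label: self.log_score(label, text))


f = NaiveBayesFilter()
f.train("spam", "cheap prozac buy now limited offer")
f.train("spam", "win money now click here")
f.train("ham", "your paypal account statement is ready please log in to review")
f.train("ham", "meeting notes attached see you tomorrow")

print(f.classify("buy cheap prozac now"))                         # spam
print(f.classify("please log in to review your paypal account"))  # ham, even if it is a spoof
```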

This is a companion discussion topic for the original blog entry at:

I wonder if they have filters that say more than 3 links means it’s junk, because I never get emails with more than 2 unless they are junk.
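A minimal sketch of the link-count rule the comment describes, assuming a threshold of 3 and a deliberately rough URL regex (both hypothetical, not settings from any particular filter):

```python
import re

# Threshold taken from the comment above (hypothetical rule, not a known product setting).
MAX_LINKS = 3

# Rough URL matcher: http(s):// followed by non-space, non-quote, non-bracket characters.
LINK_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def count_links(body: str) -> int:
    return len(LINK_RE.findall(body))


def too_many_links(body: str, max_links: int = MAX_LINKS) -> bool:
    """Flag a message whose body contains more than max_links URLs."""
    return count_links(body) > max_links


legit = "See http://example.com/a and http://example.com/b for details."
junk = " ".join(f"http://spam.example/{i}" for i in range(5))
print(too_many_links(legit))  # False
print(too_many_links(junk))   # True
```

A hard cutoff like this will misfire on legitimate newsletters, so it probably works better as one weighted signal feeding a score than as a rule on its own.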

How many people send from mail servers other than the name they’re actually claiming nowadays? I know for a long while I was using but sending from If most people don’t do that, maybe blacklist emails where the sender doesn’t match the server? I don’t have much spam-blocking experience (I just stick with whatever Thunderbird does for me), but I did have to run a spam filter on a Win2k3 server for a few months. I used GFI MailEssentials, which has several different forms of filtering that work independently:

Blacklists, whitelists, Bayesian, keyword, and some other stuff I don’t remember.

The upside was that I could pretty easily add and remove keywords (e.g. “prozac”) in the word filter and let the Bayesian filter take care of the rest. One of my favorite features was the auto-whitelist, which whitelisted anyone I ever sent an outgoing mail to.
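The independent layers described above (blacklist, whitelist, keyword list, auto-whitelist, with the Bayesian filter handling whatever is left) can be sketched roughly as below. This is not how GFI MailEssentials is implemented; the class, the check order (whitelist first), and the "bayes" hook are assumptions for illustration, and keywords are assumed to be stored lowercase.

```python
import re
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LayeredFilter:
    blacklist: set[str] = field(default_factory=set)
    whitelist: set[str] = field(default_factory=set)
    keywords: set[str] = field(default_factory=set)  # assumed lowercase
    # Hypothetical hook for a statistical filter: returns True if it thinks the body is spam.
    bayes: Optional[Callable[[str], bool]] = None

    def record_outgoing(self, recipient: str) -> None:
        """Auto-whitelist: anyone we send mail to is trusted from then on."""
        self.whitelist.add(recipient.lower())

    def is_spam(self, sender: str, body: str) -> bool:
        sender = sender.lower()
        if sender in self.whitelist:  # whitelist wins over every other check
            return False
        if sender in self.blacklist:
            return True
        words = set(re.findall(r"[a-z0-9']+", body.lower()))
        if words & self.keywords:  # any hand-maintained keyword is an instant hit
            return True
        # Everything else is left to the statistical filter, if one is plugged in.
        return self.bayes(body) if self.bayes is not None else False


f = LayeredFilter(blacklist={"spammer@example.com"}, keywords={"prozac"})
print(f.is_spam("friend@example.org", "Cheap Prozac here!"))  # True (keyword)
f.record_outgoing("Friend@example.org")
print(f.is_spam("friend@example.org", "Cheap Prozac here!"))  # False (auto-whitelisted)
print(f.is_spam("spammer@example.com", "hello"))              # True (blacklisted)
```

One caveat relevant to this thread: the whitelist keys on the From address, which is exactly the field a spoofed email forges.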

As for web services (eBay, etc.) not sending email, I don’t know if the world is ready for that yet. It’s still the only ubiquitous form of net communication that a lot of people are willing to give out their contact details for. What’s next, instant messenger?

Sorry for not bothering to read your POPFile entry until after I commented. It looks like you’re on the right track, looking at multiple filtering tools to get you from 98% to 100%. I certainly can’t think of anything better at the moment.

I’m glad you have an RSS feed.

How many people send from mail servers other than the name they’re actually claiming nowadays?

That is the other way to attack the problem: actually validate the identity of the sender (or at least the server sending the email). There have been some baby steps in this direction from Yahoo and Hotmail, but I’m not sure anything substantive has come of it yet.

It’s definitely a good idea, but the architecture of POP3/SMTP isn’t built around identity or even security-- so it’s hard to retrofit.
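The naive version of the "sender doesn’t match the server" idea might look like the sketch below. It assumes the sending server’s hostname is already known (for example, from a reverse DNS lookup of the connecting IP), and the function names and example hosts are hypothetical. It is a heuristic, not identity validation.

```python
from email.utils import parseaddr


def sender_domain(from_header: str) -> str:
    """Extract the lowercase domain from a From header such as 'Name <user@host>'."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].lower()


def base_domain(hostname: str) -> str:
    # Crude: keep the last two labels. This is wrong for suffixes like .co.uk;
    # a real check would consult the Public Suffix List.
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def sender_matches_server(from_header: str, server_hostname: str) -> bool:
    """True if the From domain and the sending server's hostname share a base domain."""
    domain = sender_domain(from_header)
    return bool(domain) and base_domain(domain) == base_domain(server_hostname)


print(sender_matches_server("Billing <billing@example.com>", "mx1.example.com"))     # True
print(sender_matches_server("Billing <billing@example.com>", "host42.isp.example"))  # False
print(sender_matches_server("me@example.org", "smtp.mailhost.example"))              # False
```

The last case shows the false positive raised earlier in the thread: someone legitimately sending through a different provider’s server gets flagged.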

I mean, if you’re going to do something, at least try a little harder.

Right. Some of the spoofs I’ve seen recently were quite good. Very professional looking, no misspellings, etcetera.

My rule of thumb is “could my Mom tell this was fake?” For a lot of recent spoofs, the answer is no.

In PayPal’s case:

3600 IN TXT "v=spf1 mx ~all"
3600 IN TXT "spf2.0/pra mx ~all"

That’s it… mail claiming to come from PayPal should only come from the hosts those records authorize (the domain’s MX servers), so you can just block everything that fails the SPF test.
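A minimal sketch of how a receiver might evaluate a record like "v=spf1 mx ~all", supporting only the mx, ip4:, and all mechanisms. The function name is hypothetical, the mx_ips set stands in for the DNS lookups a real checker performs, and the IPs are documentation addresses. Two caveats: SPF checks the envelope sender (MAIL FROM) and HELO identities, not the From header a reader sees, and the "spf2.0/pra" record belongs to Sender ID, which this sketch does not handle.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def evaluate_spf(record: str, client_ip: str, mx_ips: set[str]) -> str:
    """Evaluate a tiny subset of SPF (RFC 7208): the mx, ip4: and all mechanisms.

    mx_ips stands in for the DNS lookups (MX, then A) a real checker performs.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF v1 record")
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = term[0] if term[0] in QUALIFIERS else "+"
        mechanism = (term[1:] if term[0] in QUALIFIERS else term).lower()
        if mechanism == "all":
            matched = True
        elif mechanism == "mx":
            matched = ip in {ipaddress.ip_address(a) for a in mx_ips}
        elif mechanism.startswith("ip4:"):
            matched = ip in ipaddress.ip_network(mechanism[4:], strict=False)
        else:
            # include:, a, redirect=, etc. are skipped here; a real checker must handle them.
            continue
        if matched:
            return QUALIFIERS[qualifier]
    return "neutral"  # RFC 7208: no mechanism matched and no redirect


mx = {"192.0.2.10"}
print(evaluate_spf("v=spf1 mx ~all", "192.0.2.10", mx))                       # pass
print(evaluate_spf("v=spf1 mx ~all", "203.0.113.5", mx))                      # softfail
print(evaluate_spf("v=spf1 ip4:198.51.100.0/24 -all", "203.0.113.5", set()))  # fail
```

Note that "~all" yields softfail ("probably not authorized") rather than fail, so rejecting on it is a local policy choice rather than something the record demands.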

Bayesian filtering by itself can’t handle problems like this one.

Actually, the logical choice would be for them to offer personalized RSS / Atom / RDF feeds for their users. That way I can get one feed from them that has all their general news (customized by me) and all news specific to me. Throw in some HTTPS if they want, and I know I am getting the straight dope from the horse’s mouth.

It’s still in dev mode, but we’re working on a Messaging over RSS application.

It sets up reciprocal RSS feeds between you and your contacts so that you can message back and forth (like email), but entirely over RSS.

Hopefully, switching from a SendTo to a PullFrom architecture could drastically reduce the amount of spam and phishing people have to deal with.
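The comment doesn’t describe how that application works, so what follows is only a hypothetical sketch of the pull-from idea: each sender publishes a private RSS 2.0 feed per contact, and the recipient polls only the feeds of contacts they chose to add. All names and URLs are made up, and a dict stands in for HTTP fetching.

```python
import xml.etree.ElementTree as ET
from collections.abc import Callable


def build_outbox_feed(author: str, recipient: str, link: str, messages: list[str]) -> str:
    """Render an RSS 2.0 feed holding the messages `author` has written for `recipient`."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Messages from {author} to {recipient}"
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Private outbox feed for {recipient}"
    for i, body in enumerate(messages):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "guid", isPermaLink="false").text = f"{author}-{recipient}-{i}"
        ET.SubElement(item, "description").text = body
    return ET.tostring(rss, encoding="unicode")


def pull_inbox(subscriptions: dict[str, str], fetch: Callable[[str], str]) -> list[tuple[str, str]]:
    """Poll only the feeds of contacts we chose to subscribe to.

    Anyone not in `subscriptions` has no way to put a message in front of us.
    """
    inbox = []
    for contact, url in subscriptions.items():
        root = ET.fromstring(fetch(url))
        for item in root.iter("item"):
            inbox.append((contact, item.findtext("description", default="")))
    return inbox


# Stand-in for HTTP: a dict of published feeds (hypothetical URLs).
published = {
    "https://alice.example/feeds/bob": build_outbox_feed(
        "alice", "bob", "https://alice.example/feeds/bob", ["hi bob", "lunch?"]),
    "https://spammer.example/feeds/bob": build_outbox_feed(
        "spammer", "bob", "https://spammer.example/feeds/bob", ["buy now"]),
}
bob_subscriptions = {"alice": "https://alice.example/feeds/bob"}
print(pull_inbox(bob_subscriptions, published.__getitem__))
# [('alice', 'hi bob'), ('alice', 'lunch?')]
```

The trade-off is that spam control moves to the subscription step: a stranger can publish a feed, but nothing reaches you until you subscribe to it.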

Both and work fine. I always report PayPal and eBay spoof messages. Note, though, that won’t accept an attached message - you have to forward the message to them.

GMail actually has a decent spam filter, and it also has good spoof e-mail detection. I got a phishing e-mail the other day that was spoofing an e-mail address, and I got a bright red message at the top that said the following:

“Warning: This message may not be from whom it claims to be. Beware of following any links in it or of providing the sender with any personal information.”

Now admittedly, anybody in our line of business should know immediately whether an e-mail is legitimate or not, but it’s still a good thing for less technical people using e-mail.

On an amusing note, the e-mail I’m referring to was the worst attempt at phishing I’ve ever seen. Check out this snippet from the body of the e-mail:

“U need to update ur account once again, u forgot fill in ATM PIN at from update, come to link below and do it.”

I mean, if you’re going to do something, at least try a little harder.