How to build an effective spam tool in under 2 years!

No spam!

Spamming has become a huge international business, and many very smart spammers are using a massive array of sophisticated tools to get spam delivered to inboxes.  Our SpamAssassin install, which we’d been using for about 10 years to block spam from our users’ inboxes, configured the way we had it configured, was just no longer keeping up.  It was time for something more effective.

(Don’t get me wrong, SpamAssassin is likely fantastic if you can configure and use its per-user Bayesian learning, which we couldn’t do because of the way our clustered mail servers are configured.)

The Choices

My goal was to find Open Source anti-spam software that would allow us to provide per-user learning and that I could squeeze into our existing mail cluster.  The keys were that the solution had to be 1) really effective at identifying spam and 2) configurable so that our users with IMAP accounts could drag spammy messages into mail folders to train it.  In other words, sending spam email to a special address or logging into a web site to mark spam wasn’t going to cut it.  It had to be really, really easy to use and it had to work via IMAP folders.

Nothing looked good enough, or fit all of our needs, except for DSPAM.  While DSPAM development has stopped, it still seemed finished enough, and good enough, that it might be excellent at identifying spam and could be configured to work with our existing Open Source mail server setup (custom Qmail + Dovecot servers behind Linux load balancers).  And with DSPAM’s machine-learning algorithms, I felt it should be able to adapt to ever-changing spammer techniques over time, making it a good long-term solution.
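This post doesn’t spell out how the folder-based training is wired up, so what follows is only a rough sketch of one way it could work, not a description of our actual setup.  It assumes a helper that is handed a raw message from one of the training folders and pipes it to the dspam command line with the --user, --class and --source=error options.  The folder names isSpam and isNotSpam come from the stats email further down; the folder-to-class mapping and the function names are made up for illustration.

```python
import subprocess

# Hypothetical mapping from an IMAP training folder to a DSPAM class.
# The folder names come from the stats email in this post; the mapping
# itself is an assumption.
FOLDER_CLASS = {"isSpam": "spam", "isNotSpam": "innocent"}


def build_retrain_command(user, folder):
    """Build the dspam argv that corrects a misclassified message."""
    dspam_class = FOLDER_CLASS[folder]
    return [
        "dspam",
        "--user", user,
        f"--class={dspam_class}",
        "--source=error",  # tells DSPAM its earlier verdict was wrong
    ]


def retrain(user, folder, raw_message):
    """Pipe one raw message (bytes) to dspam on stdin.

    Assumes the dspam binary is installed and on PATH.
    """
    subprocess.run(build_retrain_command(user, folder),
                   input=raw_message, check=True)
```

Something would still need to walk each user’s training folders and call retrain() once per new message; that part depends heavily on how the mail store is laid out, so it is left out here.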

The Install

As is my usual style, I researched the long history of user-contributed posts and tutorials on how to configure DSPAM and came up with a plan that I thought might work (i.e., I Googled it).  This stuff is really what I love most about Open Source software: searching the intertubes to find out how to get something done and then hacking away to see if it works.

I’ll go into the technical details of how I did this if enough people ask, but for now, I can say this:  While it took me just a few weeks, working on it a couple of hours per day, to get DSPAM up and running, it took a full year before it was polished enough to allow our users to sign up for it.  There were lots of small bugs to work through on a live and very busy email system where email delivery is extremely important.  So I had to be super careful, open it up slowly, watch, and do a lot of listening.

The Results

Our users have been using DSPAM for about a year now, and while it’s not for everyone (my wife hates it!), the users who take a little time to train DSPAM are totally addicted to it.  And the spam identification rates are better than I could have hoped for.  Here is a real-life, automated email that was sent to one of our users from our servers.  The statistics are based on results from the dspam_stats script.  And these stats are fairly typical of a real user on our servers using DSPAM correctly:

Dear mxxxs@dxxxxxxxxs.com:

You’ve been subscribed to Brownrice’s Advance Spam Protection (DSPAM) for 240 days. Here is an analysis of your usage:

Your DSPAM Stats over the last 30 days

  • 364 email messages have been processed in the last 30 days.
  • 82.14% of your incoming messages have been spam.
  • DSPAM has been 99.18% accurate at correctly Quarantining incoming spam (Spam that got through to your inbox: 2. Good email that was incorrectly moved to your Quarantine folder: 1.)

Your DSPAM Stats over the life of your account

  • 19745 email messages have been processed since you signed up for DSPAM.
  • 78.62% of your incoming messages have been spam.
  • DSPAM has been 93.39% accurate at correctly Quarantining incoming spam (Spam that got through to your inbox: 1248. Good email that was incorrectly moved to your Quarantine folder: 57.)
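For the curious, the accuracy percentages in these emails are consistent with a simple calculation: count every mistake (spam that reached the inbox plus good email that was quarantined), subtract that from the total processed, and divide by the total.  Here is a minimal sketch of that arithmetic.  The function name is made up for illustration, and this is not the code that generates the emails.

```python
def overall_accuracy(total, missed_spam, false_positives):
    """Percentage of processed messages that were classified correctly.

    total           -- messages processed in the period
    missed_spam     -- spam that got through to the inbox
    false_positives -- good email that was moved to Quarantine
    """
    errors = missed_spam + false_positives
    return round(100.0 * (total - errors) / total, 2)


# Numbers taken from the email above.
print(overall_accuracy(364, 2, 1))         # last 30 days
print(overall_accuracy(19745, 1248, 57))   # life of the account
```

With the numbers above this gives 99.18 and 93.39, matching the email.  Note that it is a single overall figure, so it doesn’t distinguish between missed spam and good email that was wrongly quarantined.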

The Conclusion!

Our DSPAM install doesn’t support POP accounts, which tends to piss off those old-school Outlook users (how the hell do they sync their phone email anyway?), and it does require that our users spend a little bit of time training DSPAM (garbage in, garbage out!).  Even so, it’s really been fantastic for most of our users, including me.  My DSPAM stats:

Dear oban@brownrice.com:

You’ve been subscribed to Brownrice’s Advance Spam Protection (DSPAM) for 600 days. Here is an analysis of your usage:

Your DSPAM Stats over the last 30 days

  • 5092 email messages have been processed in the last 30 days.
  • 41.34% of your incoming messages have been spam.
  • DSPAM has been 99.04% accurate at correctly Quarantining incoming spam (Spam that got through to your inbox: 37. Good email that was incorrectly moved to your Quarantine folder: 12.)

Your DSPAM Stats over the life of your account

  • 100459 email messages have been processed since you signed up for DSPAM.
  • 36.28% of your incoming messages have been spam.
  • DSPAM has been 97.19% accurate at correctly Quarantining incoming spam (Spam that got through to your inbox: 2602. Good email that was incorrectly moved to your Quarantine folder: 216.)
  • All statistics are based on the assumption that you are moving all spam to your isSpam folder and checking your Quarantine folder from time to time and moving good email from there to your isNotSpam folder.

If you have additional questions about Advance Spam Protection you can view the Brownrice FAQ here: http://support.brownrice.com/index.php?action=artikel&cat=1&id=229

 

Is this a money maker for Brownrice?  Heck no, especially considering that I spent many hundreds of hours working on it.  However, in this era of hosting companies throwing in the towel and either doing a really poor job of handling spam or just moving all of their mail services to, say, Gmail, it’s been a serious winner.  Our customers are often staying with us, or moving to us, because of our spam protection.  And the best part is that we can say “Yes, we certainly can help you with that spam problem!”

Questions?

~ Oban

About Oban

Oban manages the Brownrice Internet staff, keeps the network humming, and chases his wife and twin boys around during his time off.
