Handling spam

I had subscribed to the freebsd-stable mailing list, and a question was posted that I felt I could help with. Despite my earlier experience of posting to a newsgroup, which led to my private email address receiving spam (oh, what an innocent I was), I plunged in and sent a mail from my domain mail account. Within a few minutes, strange stuff started arriving in my inbox. To add insult to injury, the help I had offered turned out to be useless.

On closer inspection, I wasn't receiving spam; these were virus-generated messages purporting to be from Microsoft support. They came, in fact, from the Win32/Swen.A or Gibe.F worm (or a look-alike). I figured that many of the people subscribed to the freebsd-stable list were infected with the worm without knowing it. So it didn't seem so bad, except that I did start getting the occasional real spam message (Nigerian scam and make-money-quick stuff). I decided that this was an opportunity to investigate anti-spam techniques.

SpamAssassin seemed a likely candidate as an anti-spam tool. It's a perl application, widely used in the field, but it didn't exist as a port for FreeBSD. The ports did include SpamProbe, a Bayesian spam filter. However, this would need significant training before it started to detect spam, whereas SpamAssassin would give me protection immediately. Maybe I'll move to SpamProbe when I have amassed a significant amount of spam, since it looks like it will be less resource hungry than SpamAssassin (i.e. it doesn't need perl).

I grabbed the SpamAssassin (version 2.60) tarball from their website. The build process is relatively simple:

cd Mail-SpamAssassin-*
perl Makefile.PL
make
make install                            [as root]

However, I received several warnings about missing perl modules, which I didn't like the look of. This was because the perl that is included with FreeBSD is old (5.005), whereas SpamAssassin wanted at least 5.6. Version 5.6 (and 5.8) is available in the FreeBSD ports, so I installed it. To switch between the system version and the ports version, a handy command is provided: "use.perl". Run this as root with an argument of "system" for the system version or "port" for the ports version, and the default will be set for all users. I must remember to switch back to the system version when I next perform a buildworld.

With the perl 5.6 version installed and activated, I tried to rebuild SpamAssassin. Hmm, now it complained that the module HTML::Parser was needed. I located this on the CPAN site, and downloaded and installed it. Now SpamAssassin wanted HTML::Tagset, so I grabbed a copy of that from CPAN. OK, now the SpamAssassin build was happy.

SpamAssassin reads the mail text via stdin, and writes it out through stdout. It sets various headers (e.g. X-Spam-Status:) to indicate that the mail has been processed. If it detects spam, via application of its various rules, the X-Spam-Status: header is set to Yes, and additional text is added to the mail. I tested SpamAssassin with a piece of Nigerian spam:

spamassassin <mail_in >mail_out

Yep, it labelled it as spam, with a total of 18.2 points, way above the default threshold of 5 points. Next I tried the worm-generated mail. Hmm, this was not detected; I guess because it is not really spam.
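
To check the result of a test like this without reading mail_out by eye, something along these lines could pull the verdict out of the X-Spam-Status: header. This is only a sketch: the function name spam_verdict and the sample message are made up for illustration, and the "hits=" / "score=" layout of the header value is an assumption about what SpamAssassin writes, not something taken from its documentation.

```python
import re
from email import message_from_string


def spam_verdict(raw_message):
    """Return (is_spam, score) read from the X-Spam-Status header.

    Assumes the header value starts with "Yes" or "No" and may carry a
    "hits=N" or "score=N" field; returns (False, None) if it is absent.
    """
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    is_spam = status.strip().lower().startswith("yes")
    match = re.search(r"(?:hits|score)=(-?\d+(?:\.\d+)?)", status)
    score = float(match.group(1)) if match else None
    return is_spam, score


# Made-up sample standing in for the contents of mail_out.
sample = (
    "From: someone@example.com\n"
    "Subject: URGENT BUSINESS PROPOSAL\n"
    "X-Spam-Status: Yes, hits=18.2 required=5.0\n"
    "\n"
    "Body text.\n"
)

print(spam_verdict(sample))
```

A message that has not been through the filter has no such header, so it simply comes back as not spam with no score.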

I decided to solve the worm mailings by setting a maximum acceptable message size for the local mailer of 128KB (131072 bytes). This was a little less than the typical size of the worm mail (with its attachment), but a lot larger than the typical email size I received on hydrus. The maximum size is set in the crimson.mc file:

define(`LOCAL_MAILER_MAX',`131072')
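
The arithmetic behind that figure, and the effect of the limit, can be sketched as follows. This is an illustration of the size test only, not of how sendmail implements it; the function name and the assumption that a message is refused when it is strictly larger than the limit are mine.

```python
# 128KB expressed in bytes, matching the value given to LOCAL_MAILER_MAX.
LOCAL_MAILER_MAX = 128 * 1024  # 131072


def too_big_for_local_delivery(raw_message, limit=LOCAL_MAILER_MAX):
    """Return True if the raw message (as bytes) is larger than the limit."""
    return len(raw_message) > limit


print(LOCAL_MAILER_MAX)
print(too_big_for_local_delivery(b"x" * 2000))
```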

Now, how was I to apply SpamAssassin to my incoming mail? There seemed to be two options. The first involved calling SpamAssassin via the sendmail milter (mail filter) capability. The second used procmail, a replacement for the local mail delivery agent. I discovered that, whichever solution I chose, I would need procmail to put spam into a separate mailbox, since the default local mail agent is really dumb and will only place email in your default system mailbox. Since using milter seemed invasive, I decided the procmail route was the one to try.

Procmail was in the ports, so it was a simple matter to build and install it. You can completely replace the local mailer on a system-wide basis, or just use procmail on an individual basis. The latter approach seemed best suited to testing, so I used the .forward file to cause sendmail to direct incoming mail for me to procmail. The content of the .forward file is:

| /usr/local/bin/procmail

Procmail needs a control file, .procmailrc in your home directory. Mine was set up as follows:

# .procmailrc file for mpw

# filter through spamassassin
:0fw
| spamassassin

# put in spam mailbox if spamassassin says spam
:0H
* ^X-Spam-Status: Yes
Mail/spam

# anything from freebsd-stable goes into the appropriate mailbox
:0H
* freebsd-stable
Mail/freebsd-stable
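
The order of the two delivery recipes matters: the spam test comes first, so a spam message that also mentions freebsd-stable still ends up in Mail/spam. The decision can be sketched in Python as below. This is a rough model of the logic, not of procmail itself: choose_mailbox is a made-up name, the message is assumed to have already been through spamassassin, and the case-insensitive matching reflects my understanding of procmail's default behaviour.

```python
import re
from email import message_from_string


def choose_mailbox(raw_message):
    """Mimic the delivery recipes: spam first, then the mailing list."""
    msg = message_from_string(raw_message)
    # Rebuild the header block as text, one "Name: value" per line,
    # since the H flag applies the condition to the headers only.
    headers = "\n".join(f"{name}: {value}" for name, value in msg.items())
    if re.search(r"^X-Spam-Status: Yes", headers, re.IGNORECASE | re.MULTILINE):
        return "Mail/spam"
    if re.search(r"freebsd-stable", headers, re.IGNORECASE):
        return "Mail/freebsd-stable"
    return "(default system mailbox)"


# Made-up message for illustration.
list_mail = (
    "To: freebsd-stable@freebsd.org\n"
    "Subject: buildworld question\n"
    "X-Spam-Status: No, hits=0.1 required=5.0\n"
    "\n"
    "Body text.\n"
)

print(choose_mailbox(list_mail))
```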

OK, having done all that, all I had to do was wait...

Addendum - 26th January, 2005

Since I had installed SpamBayes on crimson, it seemed redundant to keep using SpamAssassin for spam defence. It also meant I could get rid of the version of perl from the ports, since SpamAssassin was the only reason I needed a later version of perl than that provided by the base FreeBSD system.

The replacement was simple (the README.txt file from SpamBayes describes everything you need to do): a small number of changes to the .procmailrc file, and the addition of a .spambayesrc to direct the filter to use the correct database. These files are shown below.

.procmailrc

# .procmailrc file for mpw

# anything from freebsd goes into the appropriate mailbox
:0H
* freebsd-stable
Mail/freebsd-stable

:0H
* freebsd-security
Mail/freebsd-security

# probably from a virus or worm (Swen.A)
:0B
* Microsoft Customer
Mail/spam

# Debian security mail
:0H
* debian-security
Mail/debian-security

# filter through spambayes
:0fw:hamlock
| /usr/local/bin/sb_filter.py

# put in spam mailbox if spambayes says spam
:0H
* ^X-SpamBayes-Classification: spam
Mail/spam

.spambayesrc

[Storage]
persistent_use_database = True
persistent_storage_file = ~/etc/default_bayes_database.db
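
The .spambayesrc file is laid out like an INI file, so its two settings can be read back with Python's standard configparser as a quick sanity check. This is a sketch only: SpamBayes has its own option-loading code, which this does not use, and expanding the leading ~ with os.path.expanduser is my assumption about how the path is meant to be interpreted.

```python
import configparser
import os

# The same content as the .spambayesrc listing, held in a string so the
# example does not depend on a file in the home directory.
SPAMBAYESRC = """\
[Storage]
persistent_use_database = True
persistent_storage_file = ~/etc/default_bayes_database.db
"""

parser = configparser.ConfigParser()
parser.read_string(SPAMBAYESRC)

use_db = parser.getboolean("Storage", "persistent_use_database")
db_path = os.path.expanduser(parser.get("Storage", "persistent_storage_file"))

print(use_db)
print(db_path)
```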