Debian Stable + Exim3 + Amavisd-new with Spamassassin, Razor, F-Prot filtering for Exchange.

Note: Since the release of Debian 3.1 in the first week of June, the installation specified below and subsequent apt sources may not work. I will do my best to update everything to work with sarge in the near future. Chances are that if you replace "stable" with "woody" in all the apt listings, you could get a working system. Otherwise, check back later.

This document was last updated on March 4, 2005.

Revision information:

scenario

I have been doing work locally for a corporation with fewer than 50 employees. The main server is running NT 4.0 with Microsoft Exchange version 5.5 SP4. They are not at a point where they need or want to upgrade their server to something newer.

They also have a spam problem. I did some research and I could not find a cost effective long term mail filtering solution for the company. There is no software that I know of that can be installed on anything earlier than Exchange 2000 for any sort of mail filtering.

My idea was to set up a GNU/Linux based server that would run all the incoming mail through SpamAssassin (and anything else my heart desired) before it reached the Exchange server. I successfully accomplished this using Exim3 as the mail transport agent, SpamAssassin for spam control, F-Prot for virus removal, and Amavisd-new to control it all.

For the ASCII enthusiasts, here is how the delivery of a mail message might look:

__________        _________        ___________               __________
|        |  -->   |       |  -->   |         | trash or tag? |        |
| public |________|company|________|GNU/Linux|_______________|Exchange|
|internet|        | router|        |  filter |  -->  ||-->   |        |
|________|        |_______|        |_________|       |v      |________|
                                                (quarantine)

debian install

Debian Woody is my favorite, net-installed using the BF2.4 disks or the Mini-Install CD on a basic workstation with a single network card. Not all the packages we need are in the stable pool, so we turn to backports.org for some things. Add these lines to your sources.list:

deb http://www.backports.org/debian stable arj file razor unzoo
deb http://people.debian.org/~aurel32/BACKPORTS/ woody-amavisd-new main
deb http://people.debian.org/~aurel32/BACKPORTS/ woody-spamassassin main

Here's a link to my sources.list file if you'd like to see the whole thing.

packages to install

After your base install and any personal favorite packages, you should apt-get install the following:

Here is a copy of my dpkg -l output on this server, just in case I missed anything important. I'm sure some of the packages I have installed are unnecessary as well.

installing f-prot

Of course, almost any Linux-based virus scanner will work with amavis, but I've had great success testing F-Prot. F-Prot for GNU/Linux is free for non-commercial use, and cheap for anything otherwise. You might do a trial download, see how well it works for you, and then purchase the licenses. Download and dpkg -i the .deb file.

Create an entry in root's crontab to run the definition updates command, appending the output to a log file if you'd like. root's crontab for me looks like this:

0 0 * * *     /usr/local/f-prot/tools/check-updates.pl >> ~/fprotupdatelog.txt

nameserver config

The way that email programs know how to deliver a message is by looking up the mail exchange (MX) record of a host. For example, to send mail to ray@raygibson.net, my DNS server is contacted and asked what the MX record for raygibson.net is. You can do this yourself by running "host -t mx raygibson.net". You'll see that it's some long dreamhost.com hostname that resolves to some god-awful IP address that my mail eventually goes to. Great.

How does this affect us? Well, if your box receives a message for your domain, it's going to look up the MX record for your domain and see that your company's public IP address is where it's supposed to go. Delivery to that address just brings the message back to the same box, and so on, in a continuous mail loop. We don't want that. We need to tell the GNU/Linux machine to deliver any mail for your domain to the Exchange server. This is done through a private local DNS record, so that when the server resolves the MX for its own domain, it points directly to the Exchange computer on the network, which gladly accepts the message and delivers it to the proper mailbox.

At first I considered modifying the DNS zone on the NT Server and specifying that machine for the GNU/Linux box to use, but I decided against it because I didn't want to screw anything else up on the network!

Install BIND with "apt-get install bind9". At this point, the machine should have a static IP address on the network and you should remove your dhcp-client package just in case (apt-get remove dhcp-client --purge). Edit your /etc/resolv.conf file to be the following, obviously replacing yourdomain.com with whatever your email addresses say after the @ sign.

search yourdomain.com
nameserver 127.0.0.1

Add these lines to the end of /etc/bind/named.conf.local (note: if this file doesn't exist, then just add the lines to the end of named.conf):

zone "yourdomain.com" IN {
     type master;
     file "/etc/bind/yourdomain.zone";
     notify no;
};

To correspond with the above lines we added, we need to create that /etc/bind/yourdomain.zone file, the contents of which should look like this:

$ORIGIN yourdomain.com.
;
; Zone file for yourdomain.
;
@     IN     SOA     yourdomain.com. root.yourdomain.com. (
                     2001021201      ; serial, bump it on every edit (only slaves care, and we have none)
                     3H              ; refresh
                     15M             ; retry
                     1W              ; expire
                     1D )            ; minimum
;
             NS      smackdown       ; this is the name of your linux box
             MX      10 ntserver     ; this is the name of your exchange box
;
localhost    A       127.0.0.1
smackdown    A       192.168.1.25    ; this machines name and IP
ntserver     A       192.168.1.2     ; the exchange server and IP

You can test this setup from your GNU/Linux box by running "host -t mx yourdomain.com". If it works, it should return the line "yourdomain.com MX 10 ntserver.yourdomain.com" or something similar. Also, a "host ntserver" should return "ntserver.yourdomain.com A 192.168.1.2" with the proper values, of course. If nothing like this happens, check your /etc/resolv.conf file to make sure it's as stated above, and maybe try restarting bind.
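The manual check above can also be scripted. This is only a sketch: mx_target and check_mx are hypothetical helper names, and it assumes the mail exchanger is the last word on each line that host prints, which holds for both the older "MX 10 name" style and the newer "mail is handled by 10 name." style of output.

```shell
#!/bin/sh
# Hypothetical helpers, not part of BIND or Debian.

# Read `host -t mx` output on stdin and print the exchanger hostname from
# each non-empty line (last field, with any trailing dot removed).
mx_target() {
    awk 'NF { sub(/\.$/, "", $NF); print $NF }'
}

# Compare the MX that the local resolver returns for a domain ($1)
# against the hostname we expect ($2).
check_mx() {
    actual=$(host -t mx "$1" | mx_target | head -n 1)
    if [ "$actual" = "$2" ]; then
        echo "OK: mail for $1 goes to $actual"
    else
        echo "MISMATCH: expected $2, got '$actual'" >&2
        return 1
    fi
}
```

Run it on the GNU/Linux box as "check_mx yourdomain.com ntserver.yourdomain.com". If it reports a mismatch, named-checkconf and "named-checkzone yourdomain.com /etc/bind/yourdomain.zone" (if your bind9 package ships them) should point at any syntax errors in the files above.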

exim v3 configuration

First, run the eximconfig command and choose option number 1: Internet Site. The visible name of your mail system can be just the hostname (like smackdown!!). When it asks you the question about what domains you would like to relay mail for, be sure to put yourdomain.com, or else all your incoming mail could be rejected!

To directly configure amavis with exim 3, you need to edit the /etc/exim/exim.conf file. I actually got this information directly from Adrian's Debian+Exim+Amavis Psuedo HowTo.

Put this somewhere (at the end maybe) of the TRANSPORTS section:

amavis_smtp:
        driver = smtp
        hosts = localhost
        port = 10024
        allow_localhost
        hosts_override

Next, add a director for amavis. This has to be the first entry in the DIRECTORS section!!

amavis_director:
        condition = "${if eq {$received_protocol}{scanned-ok} {0}{1}}"
        driver = smartuser
        transport = amavis_smtp
        verify = false

Lastly, add a router for amavis. This has to be the first entry in the ROUTERS section!!

amavis_router:
        condition = "${if eq {$received_protocol}{scanned-ok} {0}{1}}"
        driver = domainlist
        transport = amavis_smtp
        verify = false
        route_list = * localhost byname
        self = send

Make sure you spell everything exactly as above, or you could end up with a mail loop.

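Also, you need to add the user "amavis" to the trusted users setting. The variable is trusted_users and its entries are separated by colons, so if yours currently reads trusted_users = mail, for example, it would become trusted_users = mail:amavis.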

Occasionally your exim is going to want to send out delivery failure notices, which will probably fail themselves. Put these lines toward the beginning of your exim config file, in the 'main configuration section'. They will clean your queue of any stuck delivery-failure reports.

auto_thaw = 3h
ignore_errmsg_errors_after = 2d
timeout_frozen_after = 7d

configuring spamassassin

SpamAssassin will be running as the user amavis, so we will be using the bayes database found in the home directory of the amavis user. I use the following text in my /etc/spamassassin/local.cf file:

dns_available yes

use_bayes 1
bayes_path /var/lib/amavis/.spamassassin/bayes
auto_learn 1

use_razor2 1

score RAZOR2_CHECK 2.500
score BAYES_99 4.300
score BAYES_90 3.500
score BAYES_80 3.000

This config file instructs SpamAssassin to use Vipul's Razor spam database, and gives the message a higher hit score if it comes back positive for spam.

If you're interested in Bayesian learning of your spam, look below where I talk about reconnecting to the Exchange server. You'll see my learning script there.

configuring razor

In order for Razor to work correctly we need to register with the main Razor servers. In order to do this we need to be the user amavis, so as root, type "su amavis" and then you'll be logged in as amavis. At that point you should run these two commands:

razor-admin -d -create -home=/var/lib/amavis/.razor
razor-admin -d -register

configuring amavisd-new

So we've configured Debian, Exim, SpamAssassin, Razor, and F-Prot... Now we bring it all together by configuring Amavis!

The huge-assed amavisd-new configuration file can be found in /etc/amavis/amavisd.conf. You should page through this thing line by line and configure all necessary variables. $MYHOME should be /var/lib/amavis, $mydomain should be obvious, etc...

amavisd-new is very extensible and works with multiple MTAs and virus scanners. Go through the config file and make sure that all the Exim and F-Prot stuff is uncommented, and everything else that you're not using (sendmail, mcafee, etc) is commented out!

Update: by default, amavisd.conf lists only the f-prot daemon in the @av_scanners variable and puts the f-prot command line client in the @av_scanners_backup variable. I still like using the command line version, so I would copy and paste the three lines about the f-prot client from the backup section into the normal scanners section. I think things will run a bit quicker this way, but you need to make sure that everything else virus related is commented out.

In Section III for logging, at least for testing purposes, you should set it to log to a separate amavisd.log with a high log level, so you can watch it and make any necessary changes as you go along.

Read carefully through Section IV as it decides the fate of all your email messages! I discard all viruses and pass all spam. It might be a good idea to set yourself up as a $virus_admin for some time to make sure it's working well.

It could be useful to turn SpamAssassin debugging on in the file ($sa_debug = 1;) in the SA section to see that things are working well at first. Take a good look at the $sa_tag2_level_deflt variable in the SpamAssassin settings. The default configuration file specified a hit value of 6.3 as the point at which the word ***SPAM*** is inserted into the headers, but at first I found that half of the spam was slipping through with a 5.5 or a 5.7 or a score just beneath the threshold. I personally set my tag2 score at 5.0.

Here is my amavisd.conf file if anyone is not sure how to format something.

testing and debugging

At this point, after everything has been configured, you're ready to test the box! Set it up on your network and turn the box on, make sure everything is running fine. We don't want to start handling all the mail just yet, so from a workstation in the office set up an email client using the IP address of your box as the SMTP server. Then, send a message to someone within the company. It will leave the workstation and should run through exim and amavis on the GNU/Linux box. Examine all the log files closely, did exim correctly pass it off to amavisd? Did amavis run it through f-prot? Did amavis run it through SpamAssassin? Did SA run it by Razor? DID THE PERSON YOU SENT IT TO RECEIVE THE MESSAGE??? Make sure everything is working before you put this thing to work!

Go to eicar's web page and send the 68-byte virus test string in the same manner as the test message you just sent. This message should never reach the recipient because it was tagged as a virus!!

When you're confident that your system isn't going to eat any messages for lunch, finally point port 25 of your company router to the GNU/Linux box. Watch the log files like a hawk for the first few messages and make sure everything is working. Send some test emails from outside accounts and make sure they get in. Go get a cup of coffee, wait for the spam to pour in, and see how accurate the box is to start with. Chances are, it's not going to work very well. Read on and I will tell you how to significantly increase your accuracy.

spam and ham learning

One thing that won't change is that everyone out there gets different kinds of spam. Some get more porn/viagra/etc than others do, and some things one person might consider spam someone else might consider ham, or mail that they want to read. Because of this, SpamAssassin isn't going to be anywhere near perfect in the beginning. It is important to train SA about what kind of emails your company receives. SA has a tool called sa-learn that you use for this purpose.

From the manpage of sa-learn:

Learning filters require training to be effective. If you don't train them, they won't work. In addition, you need to train them with new messages regularly to keep them up-to-date, or their data will become stale and impact accuracy. You need to train with both spam and ham mails. One type of mail alone will not have any effect.

SpamAssassin will not use the Bayesian heuristics until there is a significant number of emails in the database, and it will not be really effective until there are upwards of 1000. The system itself is pretty good at auto-learning and evolving as time goes on, especially for the strongest spam and ham messages, but it needs to be trained initially, which I will get into.

The easiest way to pull emails from the Exchange server is to create an account that people can copy their emails directly into. On the Exchange server (or the domain controller), create a new user called "spam" (or whatever you fancy) and create an Exchange account. Log into a Windows 2000 or XP Professional workstation as the spam user and set up Outlook with Exchange server. Once you're in Outlook, create two folders beneath your inbox, appropriately called SPAM and HAM. Edit the permissions on your Inbox and these folders and make sure that "Everyone" in the company has read and write access to your folders. Log out of the workstation and delete the newly created profile to save hard disk space on the workstation.

Back on the GNU/Linux box, we need a way to download and learn any mail that is in the now "public" folders on the spam user's Exchange account. We accomplish this using Fetchmail and Procmail. Afterward, the mail is fed into sa-learn for fun and profit.

All the actions of learning should be done as the user "amavis". It might be a good idea to give amavis a password. As root, type "passwd amavis" and assign something not-so-boring.

client setup

On each user's computer, create a folder beneath their inbox called SPAM. Set up a message rule with the rules wizard so that any message that arrives with the word "***SPAM***" in the subject line automatically gets moved to the SPAM folder you just created.

Also, on each user's computer (or at least the ones who get the most spam), open the tools->services->exchange server menu, go to the advanced tab, and add the user "spam" to the list of other users' email folders to open. This will make "Mailbox - spam" appear at the bottom of their folder list, which you can expand into Inbox and SPAM/HAM. Make sure you can copy messages to and delete out of both of those folders.

server setup

Any mail that is put into the public spam user box on the exchange server needs to be downloaded onto the GNU/Linux server and processed. This is done with a simple cron script that uses fetchmail, procmail, and sa-learn.

As the user amavis, create the following .procmailrc file:

PATH=/bin:/usr/bin:/usr/local/bin
MAILDIR=$HOME/Mail
DEFAULT=$MAILDIR/Inbox
LOGFILE=$MAILDIR/logfile.txt
SHELL=/bin/sh
:0:
Inbox

As the user amavis, create the following .fetchmailrc file:

poll ntserver protocol imap username spam password spam

In the .fetchmailrc file, obviously change the server name and password to something that works for you. Don't use the password "spam" as I have shown above, especially if you have any means for your users to log in remotely. Since the file holds a password, run "chmod 600 ~/.fetchmailrc" on it; fetchmail refuses to use a run control file that other users can read. If you `man fetchmailrc` you'll get a little more information on how to phrase your file.

Put the following line in the user amavis's crontab:

5 * * * *       /var/lib/amavis/learnspam >> ~/loglearning.txt 2>/dev/null

This of course runs the script (below) every hour at the 5 minute mark. You could also make this a daily script, it just depends on how you would like it set up.

This is my learnspam script, with some commenting to make more sense:

#!/bin/bash

date                                              # when?
                                                  #
rm ~/Mail/Inbox                                   # start clean
touch ~/Mail/Inbox                                # 
fetchmail --folder inbox/spam --all               # download all the unlearned spam

INBOXSIZE="blah..."                               # something besides zero.
NEWSIZE="0"

while [ "$INBOXSIZE" != "$NEWSIZE" ]              # continually compare the sizes and wait
do                                                # a few seconds, just to make sure that
INBOXSIZE="$NEWSIZE"                              # all the mail has been downloaded,
sleep 5                                           # this is important because fetchmail
NEWSIZE=`ls -l ~/Mail/Inbox | awk '{ print $5 }'` # exits before all the messages show
done                                              # up in the file Inbox due to amavisd

sa-learn --mbox --no-rebuild --spam ~/Mail/Inbox  # sa is hungrrry for spam
rm ~/Mail/Inbox                                   # start over
touch ~/Mail/Inbox

fetchmail --folder inbox/ham --all                # unlearned ham this time

INBOXSIZE="something..."
NEWSIZE="0"

while [ "$INBOXSIZE" != "$NEWSIZE" ]
do
INBOXSIZE="$NEWSIZE"
sleep 5
NEWSIZE=`ls -l ~/Mail/Inbox | awk '{ print $5 }'` # this ls..awk command gives the file size
done                                              # cool huh?

sa-learn --mbox --no-rebuild --ham ~/Mail/Inbox
rm ~/Mail/Inbox
touch ~/Mail/Inbox

sa-learn --rebuild                                # at the end, rebuild the index

This script is pretty good at getting all the mail, and only processing it after it's all there and dandy. You can test it by putting some spam in the public spam user folder and running this script straight from the command line.
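The two identical wait loops in the script could be pulled into one function. The following is a sketch of that idea, not a replacement I have run in production: wait_until_stable is a hypothetical name, and it uses wc -c instead of parsing ls output to get the file size.

```shell
#!/bin/sh
# Hypothetical helper: wait until FILE ($1) stops growing, polling every
# INTERVAL seconds ($2, default 5), then print its final size in bytes.
wait_until_stable() {
    file=$1
    interval=${2:-5}
    old=-1                              # never equals a real size, so we poll at least once
    new=$(( $(wc -c < "$file") ))       # arithmetic strips any padding wc adds
    while [ "$old" -ne "$new" ]; do
        old=$new
        sleep "$interval"
        new=$(( $(wc -c < "$file") ))
    done
    echo "$new"
}
```

In the script it would be called as "wait_until_stable ~/Mail/Inbox" after each fetchmail run. Like the original loop, it assumes the first messages land in the file within one polling interval; if amavisd is slower than that, raise the interval.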

wrapping it up, training spamassassin

Watch your log files! Send emails! Check emails! Make sure this thing works before you drop it in a production environment. The last thing you want to do is lose email.

Like I said earlier, SpamAssassin is useless until you train it. Find the users at your company that get the most spam. Drag at least two weeks worth of spam into the public spam folder (this is a good time to test and see if your learning script works). Find a bunch of good emails too and drag those into the ham folder. Remember, SA doesn't use the Bayes results until 200 messages are in the database (by default it wants at least 200 spam and 200 ham), and really doesn't affect anything until there are at least 1000 messages! Go around to several users and copy/move the right messages into the right folders.
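To see how far along the training is, the message counts can be read out of the Bayes database. This sketch assumes your sa-learn supports "--dump magic" and prints the nspam and nham counts in the third column, as the versions I have seen do; bayes_counts and bayes_ready are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers that read `sa-learn --dump magic` output on stdin.

# Print "<spam count> <ham count>" taken from the nspam and nham lines.
bayes_counts() {
    awk '$NF == "nspam" { spam = $3 }
         $NF == "nham"  { ham  = $3 }
         END { printf "%d %d\n", spam, ham }'
}

# Succeed only if both counts have reached the minimum ($1, default 200).
bayes_ready() {
    min=${1:-200}
    bayes_counts | {
        read -r spam ham
        [ "$spam" -ge "$min" ] && [ "$ham" -ge "$min" ]
    }
}
```

As the user amavis, something like "sa-learn --dump magic | bayes_ready && echo trained" tells you whether the 200 message mark has been passed, and "bayes_ready 1000" checks for the larger number.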

Finally, tail -f your amavisd.log and watch it work beautifully. Once it has been running for a couple of days, it wouldn't hurt to lower your log level to 0 and change to syslog logging in your config file. I do this now because syslog goes through log rotation and I won't end up with a 30 meg amavis log file after a few weeks.

Don't forget to restart amavis after making changes to the config file by running the command "/etc/init.d/amavis restart". Also note that if an email arrives while amavis is restarting, exim cannot hand it off, and it will then hold all the messages arriving in the next 15 minutes for 15 minutes before trying again. Don't panic if it doesn't start processing mail right away after a restart. Look at your /var/log/exim/mainlog if this happens and you will see what's going on.

reducing your spam count

As time goes on and your filters get smarter and smarter, there is something you can do to reduce the total number of spam messages that arrive in your inboxes. You'll notice after a good bit of SpamAssassin training that it's usually "right" about the messages that come through. Often the most legit messages will have a negative score, let's say -4. Spam comes in all shapes and sizes, but because of our learning process the "same old spam" you get every day will start to get a higher hit count, until many of your messages come back with a score higher than 20!

If a lot of these messages are so obviously spam, why deliver them at all? I know it's bad practice to "delete mail" but why would anyone want to look at a message with a hit count of 19.328? We can now start to drop the highest of our spam messages for sure.

There are a couple of lines you need to change in your amavisd.conf file for this to work. First find the value $final_spam_destiny and change it like so:

$final_spam_destiny = D_DISCARD;

The next thing you need to change is your discard value. Find the line $sa_kill_level_deflt and set it to a score you're willing to never see. I set mine at 10. My lines look as follows:

$sa_tag_level_deflt  = 3.0; # add spam info headers if at, or above that level
$sa_tag2_level_deflt = 5.0; # add 'spam detected' headers at that level
$sa_kill_level_deflt = 10.0; # triggers spam evasive actions

The above lines show that a message gets ***SPAM*** at 5 points and doesn't pass the server at all if it gets 10 points. Make sure you don't do this until you are comfortable that your SpamAssassin database is large enough to be accurate all the time!!! The last thing you want to do is discard an important message. It might be safer to at first set the kill level much higher, like 16 or 17, and move down slowly until you are most comfortable with the amount of mail coming through.
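To get a feel for where a given score lands, the three levels can be modeled in a few lines. This is a simplified sketch of the decision, not what amavisd-new actually runs: classify_score and its output labels are made up for illustration, the defaults are the values from my config above, and it assumes $final_spam_destiny is already set to D_DISCARD.

```shell
#!/bin/sh
# Hypothetical helper: report what the thresholds would do to a message
# with spam score $1. Optional $2 $3 $4 override the tag, tag2 and kill
# levels. awk does the comparison because sh has no floating point.
classify_score() {
    awk -v score="$1" -v tag_lvl="${2:-3.0}" -v tag2_lvl="${3:-5.0}" -v kill_lvl="${4:-10.0}" '
    BEGIN {
        if      (score + 0 >= kill_lvl + 0) print "discard"
        else if (score + 0 >= tag2_lvl + 0) print "subject-tagged"
        else if (score + 0 >= tag_lvl + 0)  print "headers-only"
        else                                print "untouched"
    }'
}
```

For example, "classify_score 19.328" falls in the discard range with the defaults, while "classify_score 19.328 3 5 20" shows that the same message would only be tagged if the kill level were 20. Trying a few real scores from your log against a candidate kill level is a cheap way to see what you would have lost.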

Watch the messages pour in on the logfile, you will see many "not-delivered," at least for the spammiest of all spam messages.

that's it!

Feel free to email me for any specific questions, I will answer them as best I can.


ray@raygibson.net