Configure DCC For Maildrop

Spam is getting worse all the time. I have used the Maildrop Spam Filter since August 2001. Recently, I came across DCC at Freshmeat. A DCC client generates a fuzzy checksum on a mail message and sends that checksum to a DCC server; the server returns counts associated with that checksum. If enough others have sent in messages with that checksum, the DCC client can reject it based on a specified threshold.

DCC documentation provides instructions to configure the DCC client with Sendmail and Procmail, but not with Maildrop, which I use as part of Courier. The maildrop xfilter command could be used to call the DCC client, dccproc, but when the checksum threshold count is exceeded, dccproc returns a non-zero exit code, which causes maildrop to exit. I wrote a bash script that calls dccproc, adds a header line that maildrop can detect to indicate whether the spam threshold was exceeded, and then returns a zero exit code to keep maildrop happy.


DCC and my email system

DCC stands for Distributed Checksum Clearinghouse, on the web at http://www.rhyolite.com/anti-spam/dcc/. Check there for a detailed description of the DCC package and its use. For email services on my home-office LAN, I use Courier's SMTP, IMAP, and Maildrop programs. I retrieve my email from my domain account using Getmail and send the messages through Maildrop, which sorts out my lists and files them away for the IMAP server to provide to my IMAP clients: Outlook, Outlook Express, Netscape, and Mozilla.

When I first configured Getmail and Maildrop, I found Maildrop Spam Filter through Freshmeat. I followed the documentation and tested it out. I was unable to configure the blackhole lists, so I commented out that part of the filter. I have since added more text to its screens. In my use, it has been about 50% effective: half the spam still comes through, and half of what it screens out is not spam, so I wanted something more effective.

By chance, I found DCC on Freshmeat. DCC works on the principle that mass mailings go to many clients and that a count of how many clients have received a given message can be used to imply that a message is a mass mailing. If that mass mailing is unwanted, it is spam. Clients scan mail messages and generate one or several fuzzy checksums on the message and its header. The fuzzy checksums discount certain personalizations made to mass mailings. The clients send the checksums to a server which tallies the checksums and reports the tallies back to the client. The clients add a line to the header indicating the fuzzy checksum counts and if the tallies exceed the thresholds specified, the client indicates so to the mail system. The client exempts certain mail from the checksum process: mail from specified users, usually local users, and mail from senders of wanted bulk mail. The exemptions are specified in whitelists, both system-wide and user-specified.
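
The counting idea can be illustrated with a toy sketch in shell. This is not DCC's algorithm: the normalization (keeping only letters, folded to lower case), the use of cksum, the local tally file standing in for a server, and the threshold of 2 are all assumptions chosen for illustration. A real DCC client sends its checksums to a remote server instead.

```shell
#!/bin/sh
# Toy illustration only; DCC's real fuzzy checksums are far more elaborate.

# fuzzy_sum: checksum stdin after dropping everything except letters and
# folding case, so digits, spacing, and punctuation do not change the result.
fuzzy_sum() {
    tr -cd 'A-Za-z' | tr 'A-Z' 'a-z' | cksum | awk '{ print $1 }'
}

# Stand-in for a server: a local file with one reported checksum per line.
tally=$(mktemp)

# report: record one sighting of a checksum and print how often it was seen.
report() {
    echo "$1" >> "$tally"
    grep -c "^$1\$" "$tally"
}

THRESHOLD=2   # hypothetical value for this demonstration

# Two copies of one mass mailing, personalized with different order numbers.
sum_a=$(printf 'Hello, order #1001 ships today!\n' | fuzzy_sum)
sum_b=$(printf 'HELLO,  order #2002 ships today!\n' | fuzzy_sum)

count=$(report "$sum_a")
count=$(report "$sum_b")

if [ "$count" -ge "$THRESHOLD" ]
then verdict="bulk"
else verdict="ok"
fi
echo "checksums: $sum_a $sum_b  count: $count  verdict: $verdict"
rm -f "$tally"
```

Both copies reduce to the same checksum, so the second report pushes the count to the threshold and the message is treated as bulk.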

DCC provides dccproc, a client that takes a mail message on standard input. If the message does not match the whitelists, dccproc generates DCC's fuzzy checksums and sends the checksums to a specified DCC server. The server adds to its tally for the checksums and returns counts of the checksums. If a message does not match the whitelists, dccproc adds a header to the message indicating the count the server provided for the fuzzy checksums. dccproc returns the message with the added header on standard output. If the checksum count exceeds the specified threshold, dccproc returns with an exit code of 67; if the message matched a whitelist entry or does not exceed thresholds, dccproc returns with an exit code of 0.

Maildrop provides a command, xfilter, that allows calling a message filter from within a mailfilter. Unfortunately, maildrop will exit prematurely if a program called by xfilter returns an exit code other than 0. Therefore, I wrote a bash script to call dccproc and return with an exit code of 0. The script also adds a header that can be tested by maildrop.


Compiling dccproc

I run my mail systems on Red Hat 7.2 Linux, so I wanted to change the default locations used by the dccproc sources to match those used by Linux. I used the following command line (all on one line) to put them in /etc/dcc and /usr/lib/libexec:

[root@donut dcc-dccproc-1.0.53]# ./configure --prefix /etc/dcc --libexecdir /usr/lib/libexec ; make install

When this completed, I found a file named /etc/dcc where I expected a directory. I renamed it to /etc/dcc-configuration, created an /etc/dcc directory, and moved /etc/dcc-configuration into it. I then copied homedir/* from the dcc sources to /etc/dcc and modified the files per dccproc's man page.
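
These steps can be written out as commands. The sketch below runs against a scratch directory rather than the real /etc so that it can be tried safely. The scratch layout, the source directory name, and the placeholder file contents are assumptions for illustration; on a real system the same commands would run as root against / instead of $ROOT.

```shell
#!/bin/sh
# Sketch of the post-install cleanup, run in a scratch tree instead of /.
ROOT=$(mktemp -d)          # stand-in for the real filesystem root
SRC="$ROOT/dcc-source"     # hypothetical location of the unpacked dcc sources

# Recreate the starting state: the install left a plain file at /etc/dcc.
mkdir -p "$ROOT/etc" "$SRC/homedir"
echo "installed file" > "$ROOT/etc/dcc"
echo "placeholder" > "$SRC/homedir/map.txt"
echo "placeholder" > "$SRC/homedir/whiteclnt"

# 1. Move the file out of the way under a new name.
mv "$ROOT/etc/dcc" "$ROOT/etc/dcc-configuration"
# 2. Create the directory that will serve as the DCC home.
mkdir "$ROOT/etc/dcc"
# 3. Put the renamed file inside it.
mv "$ROOT/etc/dcc-configuration" "$ROOT/etc/dcc/"
# 4. Copy the sample home-directory files from the sources.
cp "$SRC"/homedir/* "$ROOT/etc/dcc/"

ls "$ROOT/etc/dcc"
```

Remove the scratch tree with rm -rf "$ROOT" when finished.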

I configured dcc to use Rhyolite's dcc server with this set of commands:

[root@donut dcc]# chmod 600 map.txt
[root@donut dcc]# cdcc "delete dcc.domain.com"
[root@donut dcc]# cdcc "add dcc.rhyolite.com,- anon"

I was able to confirm the list of servers with the command

[root@donut dcc]# cdcc info
# 04/15/02 22:49:28 CDT /etc/dcc/map
# Will re-resolve names after 23:49:28
# 81.70 ms chosen delay 4 total addresses 3 working
IPv6 off

dcc.rhyolite.com,- anon
# 192.188.61.3,- calcite.rhyolite.com
# not answering
# 195.74.212.70,- wanadoo-be server-ID 1016
# 100% of 1 requests ok 146.87 ms RTT 1 ms queue wait
# * 207.8.219.218,- irc-ssl.sackheads.org sackHeads server-ID 1012
# 100% of 1 requests ok 81.70 ms RTT 2 ms queue wait
# 216.158.54.132,- dcc.etherboy.com Etherboy server-ID 1002
# 100% of 1 requests ok 168.52 ms RTT 0 ms queue wait


Bash script for maildrop

I put the following script in /usr/bin:

[root@donut etc]# cat /usr/bin/dccproc-maildrop
#!/bin/sh
# dccproc-maildrop
# 20020415

REFORMAIL="/usr/lib/courier/bin/reformail"
EX_NOUSER=67

# Send this message to dcc server,
# saving output so exit code can be tested.

dccproc -Rw whiteclnt -c CMN,10 >~/THISMESSAGE

# Test exit code; if it shows threshold was exceeded,
# add header that says so, otherwise add header that says not

if [ $? -eq $EX_NOUSER ]
then $REFORMAIL -A'X-DCC-OVER-THRESHOLD: true' < ~/THISMESSAGE
else $REFORMAIL -A'X-DCC-OVER-THRESHOLD: false' < ~/THISMESSAGE
fi

# Remove the copy of the last message

rm ~/THISMESSAGE

# a non-zero exit code will mess up maildrop, so exit with 0

exit 0

I used reformail, part of the Courier package, to add a line to the message header indicating whether this message exceeded thresholds. I could not pipe the standard output from dccproc through reformail and also test the exit code from dccproc. Therefore, the dccproc output is parked in a temporary file in the user's home directory while dccproc's exit code is tested. I test three checksums at a threshold of 10.
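
One possible way to avoid the temporary file is to hold the message in a shell variable, so the exit code can be read before the header is added. The following is a minimal sketch under stated assumptions, not a tested replacement for the script above: run_filter is a hypothetical stand-in for the dccproc call, tag_message is a hypothetical stand-in for reformail, and the approach assumes messages contain no NUL bytes and are small enough to keep in memory.

```shell
#!/bin/sh
EX_NOUSER=67

# Hypothetical stand-in for: dccproc -Rw whiteclnt -c CMN,10
# It copies the message through and reports "over threshold".
run_filter() {
    cat
    return $EX_NOUSER
}

# Hypothetical stand-in for reformail: put one header line at the top.
tag_message() {
    printf '%s\n' "$1"
    cat
}

# filter_and_tag: read a message on stdin, write the tagged message on stdout.
filter_and_tag() {
    # Command substitution strips trailing newlines, so append a sentinel
    # "x" inside the subshell and remove it afterwards. The subshell exits
    # with the filter's status, which becomes the status of the assignment.
    msg=$(run_filter; rc=$?; printf x; exit $rc)
    status=$?
    msg=${msg%x}

    if [ "$status" -eq "$EX_NOUSER" ]
    then flag=true
    else flag=false
    fi
    printf '%s' "$msg" | tag_message "X-DCC-OVER-THRESHOLD: $flag"
    return 0   # always succeed so that maildrop's xfilter keeps going
}

tagged=$(printf 'Subject: test\n\nbody\n' | filter_and_tag)
printf '%s\n' "$tagged"
```

In a real wrapper, run_filter would be replaced by the dccproc command line and tag_message by the $REFORMAIL call, with filter_and_tag reading the message from standard input as xfilter supplies it.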


Modifications to Maildrop's .mailfilter

I placed this line near the beginning of Maildrop's .mailfilter, before any testing:

NOTSPAM="False"

I added these lines near the beginning of .mailfilter:

# 20020416 Comment out when DCC changes are confirmed OK
cc Maildir/.In.Backup

I created and then checked the Backup folder during testing. I will comment out this line once I am satisfied all is working OK. I use the Backup folder whenever I change .mailfilter because Maildrop has the annoying policy of depositing mail into an mbox if the destination is not configured as a maildir, and Courier ignores mboxes, so clients can't get the mail.
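
Because a destination that is not a maildir ends up as an mbox, it can help to check a folder before pointing .mailfilter at it. A minimal sketch, assuming the usual maildir layout of cur, new, and tmp subdirectories: the function name is_maildir and the scratch directory are illustrative, and on a real system the folder would normally be created with Courier's maildirmake rather than mkdir.

```shell
#!/bin/sh
# is_maildir: succeed only if the path has the three maildir subdirectories.
is_maildir() {
    [ -d "$1/cur" ] && [ -d "$1/new" ] && [ -d "$1/tmp" ]
}

# Demonstration in a scratch directory standing in for a home directory.
home=$(mktemp -d)
mkdir -p "$home/Maildir/.In.Backup/cur" \
         "$home/Maildir/.In.Backup/new" \
         "$home/Maildir/.In.Backup/tmp"

# The first folder exists as a maildir; the second was never created.
for folder in .In.Backup .In.Spam.DCC
do
    if is_maildir "$home/Maildir/$folder"
    then echo "$folder: maildir, safe to use in .mailfilter"
    else echo "$folder: missing or not a maildir, create it first"
    fi
done
```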

I found that with the Backup folder loaded from .mailfilter I did not need the logging option of dccproc.

The bulk of my .mailfilter consists of tests examining headers for my incoming mailfolders.

I placed the following lines at the end of my folder filtering, just before spam checks and sending mail to default destinations:

# 20020511 Skip messages with known headers that have been tagged as spam incorrectly

# 20020521
if (/^From:.*borders.m0.net/ )
NOTSPAM="True"

For each wanted message identified by DCC as over threshold or identified by my other spam software, I add a test to set NOTSPAM using the full capabilities of maildrop. I use maildrop for this rather than relying on the DCC whitelist, which I was unable to configure successfully for this purpose.

I placed these lines after all the wanted message tests:

# 20020417 A Distributed Checksum Clearinghouse server will respond
# with the number of times the fuzzy checksum of this message
# has been reported; if it exceeds my threshold, an X-DCC header
# will be added, which is filtered.
# 20020514 If a message has been tagged as not spam, don't run dcc check on it
if ( $NOTSPAM eq "False" )
xfilter "/usr/bin/dccproc-maildrop"

# 20020417
if (/^X-DCC-OVER-THRESHOLD: true/)
to "Maildir/.In.Spam.DCC"

I condition the call to the bash script against the state of NOTSPAM and call it only if NOTSPAM is still false. Then I test for the DCC flag the bash script sets and, if it is found, I send the spam to the In/Spam/DCC folder to be checked later. I use a similar test of NOTSPAM to condition the call to my other spam filter.
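
While testing, the same decision can be checked by hand against a saved message, for example one from the Backup folder. A minimal sketch, assuming the message is a file on disk: the function name over_threshold and the two sample messages are made up for illustration. It looks only at the header, which ends at the first blank line, so text in the body cannot trigger a match.

```shell
#!/bin/sh
# over_threshold: succeed if the header of the message file carries the flag
# that the wrapper script sets. sed stops printing at the first blank line,
# which is where the header ends.
over_threshold() {
    sed '/^$/q' "$1" | grep -q '^X-DCC-OVER-THRESHOLD: true'
}

dir=$(mktemp -d)
# Sample 1: flagged in the header.
printf 'X-DCC-OVER-THRESHOLD: true\nSubject: buy now\n\nbody\n' > "$dir/spam"
# Sample 2: not flagged in the header; the matching text is only in the body.
printf 'X-DCC-OVER-THRESHOLD: false\nSubject: hello\n\nX-DCC-OVER-THRESHOLD: true\n' > "$dir/ham"

for m in spam ham
do
    if over_threshold "$dir/$m"
    then echo "$m: would be filed in Maildir/.In.Spam.DCC"
    else echo "$m: continues to the remaining rules"
    fi
done
```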

In the last part of .mailfilter, I send messages to their default destinations.


Some results using DCC

I have three accounts on which I have configured DCC. Following are preliminary results.

ISP account: 1 message a month and tons of spam. Over five days, DCC caught 43 spam and missed 12, 78% caught.

55 messages received total
43 spam messages filtered by DCC as over threshold, no false positives
3 spam messages DCC false negatives but caught by maildropspam filter
9 spam messages DCC false negatives and maildropspam filter false negatives
0 useful messages

One user account: a few bulk mail messages a week, lots of spam. Over seven days, DCC caught 57 spam and missed 15, 79% caught.

76 messages received total
57 spam messages filtered by DCC as over threshold, no false positives
4 spam messages DCC false negatives but caught by maildropspam filter
11 spam messages DCC false negatives and maildropspam filter false negatives
4 useful messages

Domain account: 60 messages a day, little spam, lots of wanted bulk email. The wanted bulk email is sorted into folders before DCC is called. Over seven days, DCC caught 9 spam and missed 5, 64% caught.

420 messages received total (approximation)
9 spam messages filtered by DCC as over threshold
3 messages filtered by DCC as over threshold, false positives
3 spam messages DCC false negatives but caught by maildropspam filter
2 spam messages DCC false negatives and maildropspam filter false negatives
406 useful messages (approximation)

Overall results

Over 75% of spam was caught by DCC in the large samples and, after adding desired bulk mailers to the whitelist on the first occurrence, there were no more false positives.
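
The percentages can be rechecked from the counts listed for each account. This sketch uses only the figures reported above; the helper name pct is made up, and the combined figure is simply the total spam caught divided by the total spam received across the three accounts.

```shell
#!/bin/sh
# pct CAUGHT MISSED: print the share of spam caught, as a whole percentage.
pct() {
    awk -v c="$1" -v m="$2" 'BEGIN { printf "%.0f", 100 * c / (c + m) }'
}

isp=$(pct 43 12)       # ISP account: 43 caught, 12 missed
user=$(pct 57 15)      # user account: 57 caught, 15 missed
domain=$(pct 9 5)      # domain account: 9 caught, 5 missed
overall=$(pct $((43 + 57 + 9)) $((12 + 15 + 5)))

echo "ISP $isp%  user $user%  domain $domain%  overall $overall%"
# prints: ISP 78%  user 79%  domain 64%  overall 77%
```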


Other comments

As I add users, I will move the dccproc-maildrop bash script to each user's home directory and add a whitelist for each user. I'll include the system-wide whitelist in the user's whitelist. Each user will then be able to add her own desired bulk mail sources to her own whitelist.


Disclaimer

This account was written from my recollections aided by entries in my log. Use it at your own risk.

Please send Dick any comments or corrections through http://www.hodgman.org/contact/.

In particular, let me know if you can suggest a more elegant way of passing standard output from dccproc to reformail while testing the exit code of dccproc. Parking standard output in a file seems like a kludge to me.


References:

I used the following to help me wade through this process:

Freshmeat at http://freshmeat.net/ - the resource that helped me find this software

Courier at http://www.courier-mta.org/

Getmail at http://www.qcc.ca/~charlesc/software/getmail-2.0/

Maildrop Spam Filter at http://sourceforge.net/projects/mdropspamfilter/

Courier Searchable Email List Archive (includes a Maildrop list) at http://sourceforge.net/mail/?group_id=5404

DCC at http://www.rhyolite.com/anti-spam/dcc/

Google Groups (formerly Deja) at http://groups.google.com/ - a great source for hints from others who have tackled similar problems


Last modified on 2003 January 09
