spamtrack
NAME
spamtrack - a nifty script to filter incoming email with spamprobe
SYNOPSIS
spamtrack [ -help | mailboxfile ]
DESCRIPTION
If you have read about Bayesian spam filtering at
http://www.paulgraham.com/spam.html
you will be happy to know that such software is installed on
the Newton Lab workstations in the form of spamprobe. (Be
sure to read the man page for spamprobe.) Unfortunately,
while spamprobe can be used at the Unix command line on existing
files (mailboxes, saved email), it doesn't act on incoming
email. For that, you can use spamtrack by putting the line
"|/usr/local/bin/spamtrack"
into your file $HOME/.forward. (Do not forget to include the
quotation marks!)
Without parameters, the spamtrack script catches incoming
email, scores it according to your personal spamprobe database,
and then deals with it accordingly. During its first
use, spamtrack will generate the file $HOME/.spamtrackrc
with contents something like this:
SPAMBOX=/home/systems/jdoe/mail/spam
FALSENEG=/home/systems/jdoe/mail/not_good
FALSEPOS=/home/systems/jdoe/mail/not_spam
NOTMAIL=/home/systems/jdoe/mail/not_mail
ACCEPT=adougher@
ACCEPT=dougherty@
ACCEPT=siam.org
REJECT=sleazy.org
SPAMLEVEL=0.9 # 0.9 is usually good
SPAMPROBE_OPTS= # -h, -H, -r, -s...
ACTION=tag_only # tag_only or divert
You can edit this file if you want to change the default values
for your "spambox" and the other files (mailboxes) which
will be used by spamtrack. (Don't use the symbol `~' to
designate your home directory.) In particular you can change
tag_only to divert if you want putative spam to be diverted
directly to your "spambox" instead of being passed along to
your inbox.
Pine and mailx users can define ACTION as "divert" to have
putative spam go straight to the SPAMBOX. However, Eudora
users must let ACTION be "tag_only" so that email goes to
the normal inbox after being tagged with the X-Spam-Status
headers; it is after Eudora downloads the new mail from the
server (babbage) that you can sort the spam and/or dump it
into your local spam folder.
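The difference between the two ACTION values can be sketched in a few lines of Python. This is only an illustration, not spamtrack's actual code: the function name route_message, the returned labels "inbox" and "SPAMBOX", and the Yes/No header value are assumptions. Only the header name X-Spam-Status and the SPAMLEVEL threshold come from this page.

```python
from email import message_from_string


def route_message(raw_message, score, spam_level=0.9, action="tag_only"):
    """Return (destination, tagged_message_text) for one incoming message.

    Hypothetical helper: it mimics the behaviour described above, where
    every message is tagged and only ACTION=divert sends spam elsewhere.
    """
    msg = message_from_string(raw_message)
    # SPAMLEVEL is the minimum score that counts as spam.
    is_spam = score >= spam_level

    # The header value format is a guess; drop any stale header first.
    del msg["X-Spam-Status"]
    msg["X-Spam-Status"] = "Yes" if is_spam else "No"

    if is_spam and action == "divert":
        return "SPAMBOX", msg.as_string()
    # tag_only (or a good message): deliver to the normal inbox.
    return "inbox", msg.as_string()
```

With action="tag_only" a message scoring 0.95 still lands in the inbox, carrying the tag that a mail client such as Eudora can filter on afterwards.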
IMPORTANT: Spamprobe requires some training time while it
builds a database which accurately rates the spamminess of
words/phrases. Particularly during the first weeks, be sure
to save messages which have been erroneously categorized and
add them to your appropriate not_spam or not_good file (as
defined in your .spamtrackrc file). It might be wise to use
ACTION=tag_only for a few weeks before changing it to divert
so that you don't forget to check for false positives.
False positives are the worst kind of error a spam filter can
commit: innocent emails unjustly judged as spam. They will
happen occasionally, and you must take action to correct
spamprobe so that future false positives are minimized. Save
them to a mailbox named "not_spam" or whatever other filename
you designate (FALSEPOS) in your $HOME/.spamtrackrc file.
Likewise, save false negatives (spam that sneaks in
undetected) to "not_good". After a while, spamprobe will
get very accurate at rating the spamminess of new email the
same way you would. NOTE: your criteria for spamminess will
diverge from those of other users according to the email you
receive and the way you train spamprobe by use of the
not_spam and not_good files, because you are building your
own word/phrase database in the directory ~/.spamprobe.
The lines "ACCEPT=..." and "REJECT=..." are an added
feature that lets you bypass the spamprobe score of an
incoming message and accept/reject it merely on the basis of
its "From" address. Don't rely on this feature too
heavily; the address in the From line is often spoofed, and
besides, this violates the underlying mechanism of pure
Bayesian Analysis! It may be more worthwhile in the long
run to "teach" spamprobe than to override it this way.
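As a rough picture of how such an override might work, here is a minimal Python sketch. How spamtrack really compares the patterns is not documented here, so the case-insensitive substring match, the ACCEPT-before-REJECT order, and the name from_override are all assumptions.

```python
from email import message_from_string


def from_override(raw_message, accept, reject):
    """Return 'accept', 'reject', or None if no pattern matches.

    accept and reject are lists of strings such as 'siam.org' or
    'dougherty@', taken from ACCEPT= and REJECT= lines.
    """
    msg = message_from_string(raw_message)
    sender = str(msg["From"] or "").lower()

    # Assumption: plain substring matching, ACCEPT checked first.
    if any(pattern.lower() in sender for pattern in accept):
        return "accept"
    if any(pattern.lower() in sender for pattern in reject):
        return "reject"
    # No match: fall back to the spamprobe score.
    return None
```

Because only the From header is inspected, a spoofed address defeats this check completely, which is the reason for the caution above.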
A NICE TRICK: If you want to know why a particular email has
been scored in a certain way by spamprobe, save that message
to a file, say, msg88.tmp, and then run this command on it:
spamprobe -T score msg88.tmp
This will show the most significant words/phrases used to score
the message, and perhaps give you clues for fixing the database.
Read the man page for spamprobe for more techniques.
WITH PARAMETERS
spamtrack -help
Use this option at the command line to get a brief
statement about spamtrack and the files it uses.
spamtrack mailboxfile
A mailbox is typically a file containing one or more
concatenated email messages with all their headers. If
you invoke spamtrack at the command line with the name
of a mailbox file, it evaluates all the messages in the
mailbox and moves the "spam" to your SPAMBOX, leaving
only "good" messages in the mailbox. This could take a
while if the mailbox is large, so it should probably
not be run on your inbox -- it might interfere with new
incoming messages.
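The mailbox mode described above can be pictured with Python's standard mailbox module. This is a sketch under assumptions, not the script itself: split_mailbox and score_fn are hypothetical names, and score_fn stands in for whatever call obtains the spamprobe score of a message.

```python
import mailbox


def split_mailbox(mailbox_path, spambox_path, score_fn, spam_level=0.9):
    """Move messages scoring >= spam_level into the spambox.

    Returns the number of messages moved. score_fn takes one message
    and returns a float between 0 and 1.
    """
    inbox = mailbox.mbox(mailbox_path)
    spambox = mailbox.mbox(spambox_path)
    # Lock the source so a concurrent delivery cannot interleave with
    # the rewrite; this is why running on a live inbox is discouraged.
    inbox.lock()
    try:
        moved = 0
        for key in list(inbox.keys()):
            message = inbox[key]
            if score_fn(message) >= spam_level:
                spambox.add(message)
                inbox.remove(key)
                moved += 1
        spambox.flush()
        inbox.flush()
    finally:
        inbox.unlock()
        inbox.close()
        spambox.close()
    return moved
```

The whole mailbox is rewritten when spam is removed, so the cost grows with the size of the file.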
FILES
$HOME/.spamtrackrc
Contains your customization information for spamtrack.
Each line is of the form VAR=value:
VAR             default value
SPAMBOX         $HOME/mail/spam
FALSENEG        $HOME/mail/not_good
FALSEPOS        $HOME/mail/not_spam
NOTMAIL         $HOME/mail/not_mail
SPAMLEVEL       0.9
SPAMPROBE_OPTS  (empty)
ACTION          tag_only
SPAMLEVEL is the minimum spamprobe score which will make
spamtrack treat a message as spam. ACTION can be either
tag_only or divert. SPAMPROBE_OPTS informs spamtrack of
additional command-line options to use when running spamprobe.
In addition, you can add ACCEPT= and REJECT= lines (as per
sample .spamtrackrc contents shown earlier) if you want
spamtrack to override spamprobe based on the "From" address.
On any line in this file, everything after # is ignored
as a comment.
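The file format described above is simple enough to read with a few lines of Python. The following is a minimal sketch for anyone who wants to inspect or validate their own .spamtrackrc; the function name parse_spamtrackrc is hypothetical, and treating ACCEPT and REJECT as the only repeatable keys is an assumption based on the sample file.

```python
from collections import defaultdict

# Keys that may appear on several lines (assumption from the sample).
MULTI_KEYS = {"ACCEPT", "REJECT"}


def parse_spamtrackrc(text):
    """Parse VAR=value lines, ignoring everything after '#'.

    Returns (single, multi): a dict of one-off settings and a dict
    mapping ACCEPT/REJECT to the list of their values.
    """
    single = {}
    multi = defaultdict(list)
    for raw_line in text.splitlines():
        # Strip the comment part, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if key in MULTI_KEYS:
            multi[key].append(value)
        else:
            single[key] = value
    return single, dict(multi)
```

Note that a line such as "SPAMPROBE_OPTS= # -h, -H, -r, -s..." yields an empty value, since the option list there is only a comment.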
$HOME/.spamprobe/sp_words
This is your own spamprobe database file. (If you are
just starting to use spamprobe and want to start with a
"pre-educated" database file you can replace this file
with /usr/local/spamtrack/sp_words.)
$HOME/mail/spam
This is the default value of SPAMBOX. If you choose
ACTION=divert rather than merely tagging spam, incoming
spam will go directly to this file rather than to your
inbox.
$HOME/mail/not_spam and $HOME/mail/not_good
These files (default values of FALSEPOS and FALSENEG)
are the mailboxes where you should put messages which
have been judged incorrectly. Every time spamtrack
runs, it checks these files and corrects your spamprobe
database accordingly, and then empties both files.
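The check-correct-empty cycle described above might look roughly like the following Python sketch. It assumes the correction is done with spamprobe's "good" and "spam" commands (see the spamprobe man page); whether spamtrack invokes exactly these commands is not stated here, and the name retrain and the injectable run parameter are hypothetical.

```python
import os
import subprocess


def retrain(falsepos_path, falseneg_path, run=subprocess.run):
    """Feed the correction mailboxes to spamprobe, then empty them.

    falsepos_path holds good mail wrongly judged as spam (not_spam);
    falseneg_path holds spam that slipped through (not_good).
    """
    corrections = (("good", falsepos_path), ("spam", falseneg_path))
    for command, path in corrections:
        # Nothing to do for a missing or already empty mailbox.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            continue
        # check=True raises if spamprobe fails, so the mailbox is
        # only emptied after a successful database update.
        run(["spamprobe", command, path], check=True)
        with open(path, "w"):
            pass  # truncate the file
```

Emptying the file only after a successful run means a failed update never silently discards the messages you saved for training.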
$HOME/mail/not_mail
This file is where non-email STDIN goes. If you mistakenly
invoke spamtrack on the command line without any parameters,
or pipe the output of another command to spamtrack, the input
goes here unless it contains actual email headers.
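One plausible way to tell email from other input is to parse it and look for a sender header. The sketch below is a guess at the idea, not the test spamtrack actually performs: the name looks_like_email and the choice of the From header as the deciding criterion are assumptions.

```python
from email import message_from_string


def looks_like_email(text):
    """Return True if the input starts with a header block that
    includes a From header (assumed criterion)."""
    msg = message_from_string(text)
    # Input that does not begin with 'Name: value' lines parses
    # as a message with no headers at all.
    return msg["From"] is not None
```

Under this criterion the output of a command such as ls would be filed in NOTMAIL, while a saved message piped to spamtrack would be scored normally.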
/usr/local/BerkeleyDB.4.1/bin
Berkeley DB utilities; add this directory to your PATH.
/usr/local/BerkeleyDB.4.1/lib
Location of the Berkeley DB libraries libdb-4.1.a and libdb-4.1.so;
add this directory to your LD_LIBRARY_PATH.
WARRANTY
Spamtrack relies heavily on spamprobe, so some undesirable
"features" of spamtrack may actually be those of spamprobe.
However, neither program comes with any warranty at all;
YOU ASSUME ALL RISK when using this software.
AUTHOR
Bruce Fast