spamtrack

NAME
     spamtrack - a nifty script to filter incoming email with spamprobe

SYNOPSIS
     spamtrack [ -help  |  mailboxfile ]

DESCRIPTION
     If you have read about Bayesian spam filtering at

          http://www.paulgraham.com/spam.html

     you will be happy to know  that such software  is installed on
     the  Newton  Lab workstations  in the form of spamprobe.   (Be
     sure to read  the man page  for  spamprobe.)    Unfortunately,
     while spamprobe can be used at the Unix command line on exist-
     ing files (mailboxes, saved email), it doesn't act on incoming
     email.  For that, you can use spamtrack by putting the command

              "|/usr/local/bin/spamtrack"

     into your file $HOME/.forward.   (Do not forget to include the
     quotation marks!)
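
     As a hedged sketch, the line can be installed from the shell as
     shown here.  The function name install_forward and the ".bak"
     backup suffix are illustrative assumptions, not part of spamtrack.
     Note that it replaces any forwarding rules already in the file.

```shell
# install_forward [FILE]: write the spamtrack pipe line into FILE
# (default $HOME/.forward), keeping a backup of any existing file.
install_forward() {
    forward_file=${1:-$HOME/.forward}
    if [ -f "$forward_file" ]; then
        cp "$forward_file" "$forward_file.bak" || return 1
    fi
    # Single quotes keep the double quotation marks in the written line.
    printf '%s\n' '"|/usr/local/bin/spamtrack"' > "$forward_file"
}
```

     Running install_forward with no argument writes $HOME/.forward.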

     Without parameters, the spamtrack  script  catches  incoming
     email,  scores it according to your personal spamprobe data-
     base, and then deals with it accordingly.   During its first
     use, spamtrack will generate the file $HOME/.spamtrackrc
     with contents something like this:

          SPAMBOX=/home/systems/jdoe/mail/spam
          FALSENEG=/home/systems/jdoe/mail/not_good
          FALSEPOS=/home/systems/jdoe/mail/not_spam
          NOTMAIL=/home/systems/jdoe/mail/not_mail
          ACCEPT=adougher@
          ACCEPT=dougherty@
          ACCEPT=siam.org
          REJECT=sleazy.org
          SPAMLEVEL=0.9       # 0.9 is usually good
          SPAMPROBE_OPTS=     # -h, -H, -r, -s...
          ACTION=tag_only     # tag_only or divert

     You can edit this file if you want to change the default val-
     ues for your "spambox" and the other files (mailboxes) which
     will be used by spamtrack.    (Don't use the symbol `~' to
     designate your home directory.)  In particular you can change
     tag_only  to divert  if you want putative spam to be diverted
     directly to your "spambox"  instead of being passed along to
     your inbox.

     Pine and mailx users can define  ACTION as "divert"  to have
     putative spam go straight to the SPAMBOX.    However, Eudora
     users  must let  ACTION be "tag_only"  so that email goes to
     the normal inbox  after being tagged with  the X-Spam-Status
     headers;  it is after Eudora downloads the new mail from the
     server (babbage) that you  can sort the spam  and/or dump it
     into your local spam folder.

     IMPORTANT:  Spamprobe requires some training time  while  it
     builds  a  database which accurately rates the spamminess of
     words/phrases.  Particularly during the first weeks, be sure
     to save messages which have been erroneously categorized and
     add them to your appropriate  not_spam or  not_good file (as
     defined in your .spamtrackrc file).  It might be wise to use
     ACTION=tag_only for a few weeks before changing it to divert
     so that you don't forget to check for false positives.

     False positives are the worst kind of error committed by any
     spam filter.  These are innocent emails unjustly judged as
     spam.  This will happen occasionally, and when it does you must
     take action to correct spamprobe so that future false positives
     are minimized.  Save such messages to a mailbox named "not_spam"
     or whatever other filename you designate (FALSEPOS) in your
     $HOME/.spamtrackrc file.

     Likewise, save false negatives (spam that sneaks in
     undetected) to "not_good".  After a while, spamprobe will
     become very accurate at rating the spamminess of new email
     the same way you would.  NOTE: your criteria for spamminess
     will diverge from those of other users according to the email
     you receive and the way you train spamprobe by use of the
     not_spam and not_good files, because you are building your
     own word/phrase database in the directory $HOME/.spamprobe.

     The  lines  "ACCEPT=..."  and  "REJECT=..."   are  an  added
     feature  that  lets  you  bypass  the  spamprobe score of an
     incoming message and accept/reject it merely on the basis of
     its  "From"  address.    Don't  rely  on  this  feature  too
     heavily; the address in the From line is often spoofed,  and
     besides,  this  violates  the  underlying  mechanism of pure
     Bayesian Analysis!  It may be more worthwhile  in  the  long
     run to "teach" spamprobe than to override it this way.
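
     The override logic can be pictured with the following sketch.
     It is not spamtrack's actual code: the function names are
     hypothetical, and it assumes that each pattern is matched as a
     plain substring of the From address and that ACCEPT patterns are
     checked before REJECT patterns.  The patterns are the ones from
     the sample .spamtrackrc shown earlier.

```shell
# matches_any ADDRESS PATTERN...: succeed if any PATTERN occurs in ADDRESS.
matches_any() {
    addr=$1
    shift
    for pat in "$@"; do
        [ -n "$pat" ] || continue      # an empty pattern would match everything
        case $addr in
            *"$pat"*) return 0 ;;      # quoted, so the pattern is taken literally
        esac
    done
    return 1
}

# classify_from ADDRESS: print accept, reject, or score.
# "score" means no override applies and the spamprobe score decides.
classify_from() {
    if matches_any "$1" adougher@ dougherty@ siam.org; then
        echo accept
    elif matches_any "$1" sleazy.org; then
        echo reject
    else
        echo score
    fi
}
```

     For example, classify_from news@siam.org prints "accept", while
     an address matching neither list prints "score".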

     A NICE TRICK: If you want to know why a particular email has
     been scored in a certain way by spamprobe, save that message
     to a file, say, msg88.tmp, and then run this command on it:

          spamprobe  -T  score  msg88.tmp

     This will show the most significant words/phrases used to score
     the message, and perhaps give you clues for fixing the database.
     Read the man page for spamprobe for more techniques.

WITH PARAMETERS

     spamtrack -help
          Use this option at the command  line  to  get  a  brief
          statement about spamtrack and the files it uses.

     spamtrack mailboxfile
          A mailbox is typically a file containing  one  or  more
          concatenated email messages with all their headers.  If
          you invoke spamtrack at the command line with the  name
          of a mailbox file, it evaluates all the messages in the
          mailbox and moves the "spam" to your  SPAMBOX,  leaving
          only "good" messages in the mailbox.  This could take a
          while if the mailbox is large, so  it  should  probably
          not be run on your inbox -- it might interfere with new
          incoming messages.
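
     To see how many messages were moved, you can count the messages
     in the mailbox before and after the run.  This sketch assumes
     the traditional Unix mbox layout, in which every message starts
     with a line beginning "From " (with a trailing space); the
     function name count_messages is hypothetical.

```shell
# count_messages FILE: print the number of messages in an mbox file.
count_messages() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c '^From ' "$1" || true
}
```

     Run count_messages on a saved mailbox, then spamtrack on the same
     file, then count_messages again; the difference is the number of
     messages that went to your SPAMBOX.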

FILES

     $HOME/.spamtrackrc
          Contains your customization information for spamtrack.
          Each line has the form VAR=value.

          VAR                 default value

          SPAMBOX             $HOME/mail/spam
          FALSENEG            $HOME/mail/not_good
          FALSEPOS            $HOME/mail/not_spam
          NOTMAIL             $HOME/mail/not_mail
          SPAMLEVEL           0.9
          SPAMPROBE_OPTS      (empty)
          ACTION              tag_only

          SPAMLEVEL is the minimum spamprobe score which will make
          spamtrack treat a message as spam.  ACTION can be either
          tag_only or divert.  SPAMPROBE_OPTS informs spamtrack of
          additional command-line options to use when running
          spamprobe.
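
          The SPAMLEVEL test amounts to a floating-point comparison,
          which the shell cannot do by itself.  A minimal sketch using
          awk, assuming "minimum score" means greater than or equal to
          SPAMLEVEL (is_spam is a hypothetical name, not a spamtrack
          function):

```shell
# is_spam SCORE [SPAMLEVEL]: succeed if SCORE reaches the threshold.
# SPAMLEVEL defaults to 0.9, the documented default value.
is_spam() {
    awk -v score="$1" -v level="${2:-0.9}" \
        'BEGIN { exit !(score + 0 >= level + 0) }'
}
```

          With the default threshold, is_spam 0.95 succeeds and
          is_spam 0.5 fails.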

          In addition, you can add ACCEPT= and REJECT= lines (as in
          the sample .spamtrackrc contents shown earlier) if you want
          spamtrack to override spamprobe based on the "From" address.

          On any line in this file, everything after # is ignored
          as a comment.
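
          The file format is simple enough to read with sed.  The
          following is only a sketch of one way to look up a value;
          the name rc_get is hypothetical and spamtrack's own parsing
          may differ.  It strips the comment and trailing blanks,
          then prints the last value assigned to the variable.

```shell
# rc_get VAR [FILE]: print the value of VAR from a .spamtrackrc-style file.
# If VAR appears several times (as ACCEPT may), only the last value is printed.
rc_get() {
    sed -n -e 's/#.*//' \
           -e 's/[[:space:]]*$//' \
           -e "s/^[[:space:]]*$1=//p" \
        "${2:-$HOME/.spamtrackrc}" | tail -n 1
}
```

          For example, rc_get ACTION prints tag_only for a freshly
          generated file.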

     $HOME/.spamprobe/sp_words
          This is your own spamprobe database file.  (If you  are
          just starting to use spamprobe and want to start with a
          "pre-educated" database file you can replace this  file
          with /usr/local/spamtrack/sp_words.)
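
          A cautious way to do the replacement is sketched here.  The
          function name seed_database and the ".orig" suffix for the
          saved copy are assumptions made for illustration.

```shell
# seed_database [SOURCE] [DBDIR]: copy a pre-educated sp_words file into
# the spamprobe directory, saving any existing database as sp_words.orig.
seed_database() {
    src=${1:-/usr/local/spamtrack/sp_words}
    dbdir=${2:-$HOME/.spamprobe}
    mkdir -p "$dbdir" || return 1
    if [ -f "$dbdir/sp_words" ]; then
        mv "$dbdir/sp_words" "$dbdir/sp_words.orig" || return 1
    fi
    cp "$src" "$dbdir/sp_words"
}
```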

     $HOME/mail/spam
          This is the default value of SPAMBOX.   If  you  choose
          ACTION=divert rather than merely tagging spam, incoming
          spam will go directly to this file rather than to  your
          inbox.

     $HOME/mail/not_spam    and   $HOME/mail/not_good
          These files (default values of FALSEPOS  and  FALSENEG)
          are  the  mailboxes where you should put messages which
          have been judged  incorrectly.   Every  time  spamtrack
          runs, it checks these files and corrects your spamprobe
          database accordingly, and then empties both files.
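
          The correction step can be pictured roughly as follows.
          This is a sketch, not spamtrack's actual code: the function
          name retrain and the SPAMPROBE variable are hypothetical,
          while "good" and "spam" are the spamprobe commands that
          reclassify the messages in a file.

```shell
# retrain FALSEPOS_FILE FALSENEG_FILE: feed misjudged mail back to
# spamprobe, emptying each file only if spamprobe succeeded on it.
retrain() {
    falsepos=$1
    falseneg=$2
    sp=${SPAMPROBE:-spamprobe}     # command to run; overridable for testing
    # False positives are really good mail.
    if [ -s "$falsepos" ]; then
        "$sp" good "$falsepos" && : > "$falsepos"
    fi
    # False negatives are really spam.
    if [ -s "$falseneg" ]; then
        "$sp" spam "$falseneg" && : > "$falseneg"
    fi
}
```

          With the default file names this would be called as
          retrain $HOME/mail/not_spam $HOME/mail/not_good.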

     $HOME/mail/not_mail
          This file is where non-email STDIN goes.  If a user
          mistakenly invokes spamtrack on the command line without
          any parameters, or pipes the output of another command
          to spamtrack, the input goes here unless it contains
          actual email headers.
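
          How spamtrack recognizes email headers is not documented
          here.  One plausible check, given purely as an assumption,
          is to test whether the first line of input is an mbox
          "From " line or looks like a "Name: value" header field
          (looks_like_mail is a hypothetical name):

```shell
# looks_like_mail: read standard input and succeed if the first line
# resembles the start of an email message.
looks_like_mail() {
    head -n 1 | grep -Eq '^(From |[A-Za-z][A-Za-z0-9-]*:)'
}
```

          A heuristic this simple will also accept other text whose
          first line happens to begin with a word and a colon.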

     /usr/local/BerkeleyDB.4.1/bin
          Berkeley DB utilities; add this directory to your PATH.

     /usr/local/BerkeleyDB.4.1/lib
          Location of the Berkeley DB libraries libdb-4.1.a and
          libdb-4.1.so; add this directory to your LD_LIBRARY_PATH.
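
          For Bourne-style shells (sh, ksh, bash), lines like these
          in your login profile would do it.  This is a sketch; csh
          and tcsh users need the equivalent setenv commands.

```shell
# Append the Berkeley DB directories, avoiding a stray leading colon
# when a variable is not yet set.
PATH="${PATH:+$PATH:}/usr/local/BerkeleyDB.4.1/bin"
LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/BerkeleyDB.4.1/lib"
export PATH LD_LIBRARY_PATH
```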

WARRANTY
     Spamtrack relies heavily on spamprobe, so some undesirable
     "features" of spamtrack may actually be those of spamprobe.
     However, neither program comes with any warranty at all;
     YOU ASSUME ALL RISK when using this software.

AUTHOR
     Bruce Fast