How Bayesian Analysis Works

The Bayesian Analyzer uses naive Bayesian statistics to calculate the probability that an unknown e-mail is spam. It makes this determination using information learned from your previous e-mails.

Every e-mail is broken down into words. For each word, the analyzer calculates the probability that a message is spam given that the word appears in its text. These probabilities are built up during training. To calculate them, the Bayesian Analyzer must know in advance whether each training e-mail is spam or good. It is therefore critical that you correct any mistakes Spam Sleuth has made before you train it. If you don't correct them, training will reinforce those mistakes.
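The training step can be sketched in Python as follows. This is a minimal sketch, assuming a simple lowercase word tokenizer and a per-message frequency estimate loosely modeled on Paul Graham's "A Plan for Spam"; the functions `tokenize` and `train` are hypothetical, and Spam Sleuth's actual formula may differ.

```python
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Split an e-mail into a set of lowercase words (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))


def train(messages: list[tuple[str, bool]]) -> dict[str, float]:
    """Build a word -> P(spam | word) table from messages labeled spam (True) or good (False).

    Hypothetical estimate: compares how often a word occurs in spam messages
    versus good messages. Not Spam Sleuth's documented formula.
    """
    spam_counts: Counter[str] = Counter()
    good_counts: Counter[str] = Counter()
    n_spam = sum(1 for _, is_spam in messages if is_spam)
    n_good = len(messages) - n_spam

    # Count, per class, how many messages contain each word.
    for text, is_spam in messages:
        (spam_counts if is_spam else good_counts).update(tokenize(text))

    probs: dict[str, float] = {}
    for word in spam_counts.keys() | good_counts.keys():
        # Fraction of spam messages and of good messages that contain the word.
        spam_freq = spam_counts[word] / max(n_spam, 1)
        good_freq = good_counts[word] / max(n_good, 1)
        p = spam_freq / (spam_freq + good_freq)
        # Clamp so no single word is ever treated as absolute proof.
        probs[word] = min(0.99, max(0.01, p))
    return probs


probs = train([
    ("Cheap pills, buy now", True),
    ("Lunch meeting moved to noon", False),
])
```

Here `probs["cheap"]` ends up at 0.99 and `probs["lunch"]` at 0.01. A message passed in with the wrong label would push its words toward the wrong end of the scale, which is why correcting mistakes before training matters.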

Once the Bayesian Analyzer has calculated the word probabilities, it stores them in a dictionary file. If you want to see the word probabilities, you can Export the file in comma-separated (CSV) format.
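A comma-separated dictionary export can be approximated with Python's standard `csv` module. The `export_dictionary` function and the `word,spam_probability` column layout are hypothetical illustrations, not Spam Sleuth's actual export format.

```python
import csv
import io
from typing import TextIO


def export_dictionary(probs: dict[str, float], out: TextIO) -> None:
    """Write word probabilities as CSV rows: word,spam_probability.

    Column names, ordering, and precision are illustrative assumptions.
    """
    writer = csv.writer(out)
    writer.writerow(["word", "spam_probability"])
    for word in sorted(probs):
        writer.writerow([word, f"{probs[word]:.4f}"])


buf = io.StringIO()
export_dictionary({"viagra": 0.99, "meeting": 0.03}, buf)
print(buf.getvalue())
```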

When a new e-mail arrives, it is broken down into words, and the 15 most influential words are used to calculate the probability that the message is spam using Bayes' theorem, named after Thomas Bayes. The most influential words are those with probabilities near 0 (almost certainly a good e-mail) or near 1 (almost certainly spam). If you would like Spam Sleuth to use more or fewer words in its calculation, you can change this number in the Advanced settings.
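The selection and combination step might look like the following sketch. It assumes a word-probability table like the one built during training, ranks words by their distance from 0.5, and combines the top `n_influential` (default 15) with the standard naive Bayes rule P = prod(p) / (prod(p) + prod(1 - p)). The default of 0.4 for words never seen in training is an assumption borrowed from Graham's essay, and `spam_probability` is a hypothetical name.

```python
import math


def spam_probability(words: set[str], probs: dict[str, float],
                     n_influential: int = 15, unknown: float = 0.4) -> float:
    """Combine the most influential word probabilities with the naive Bayes rule."""
    # Unseen words get a default probability (assumption); clamp to avoid log(0).
    word_probs = [min(max(probs.get(w, unknown), 1e-6), 1 - 1e-6) for w in words]
    # Most influential = farthest from 0.5, i.e. nearest to 0 or 1.
    word_probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = word_probs[:n_influential]

    # prod(p) / (prod(p) + prod(1 - p)), computed in log space to avoid underflow.
    d = sum(math.log(p) for p in top) - sum(math.log(1 - p) for p in top)
    # Numerically stable logistic function of the log-odds d.
    if d >= 0:
        return 1 / (1 + math.exp(-d))
    z = math.exp(d)
    return z / (1 + z)


probs = {"viagra": 0.99, "free": 0.9, "meeting": 0.05}
p = spam_probability({"free", "viagra", "meeting", "tomorrow"}, probs)
```

With these sample values `p` comes out at about 0.969: the strongly spammy words outweigh the one good word. Passing `n_influential=1` keeps only "viagra" and returns 0.99.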

The end result from the Bayesian Analyzer is a probability that the e-mail is spam. This probability is converted into points using a logarithmic algorithm, which adds or subtracts many points when the Bayesian Analyzer is confident in its decision and only a few points, or none at all, when it is unsure whether the e-mail is good or spam.
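One way to realize such a logarithmic mapping is to scale the log-odds of the spam probability, which is zero at 0.5 and grows rapidly as the probability approaches 0 or 1. The `bayes_points` function, its `scale` and `max_points` parameters, and the sign convention (positive points for spam) are hypothetical; Spam Sleuth's actual formula is not given here.

```python
import math


def bayes_points(p_spam: float, scale: float = 10.0, max_points: int = 50) -> int:
    """Map a spam probability to a score adjustment using log-odds.

    Positive points push toward spam, negative points toward good.
    Scale, cap, and sign convention are hypothetical.
    """
    p = min(max(p_spam, 1e-9), 1 - 1e-9)  # avoid log(0) and division by zero
    log_odds = math.log10(p / (1 - p))     # 0 at p = 0.5, large near 0 or 1
    points = round(scale * log_odds)
    return max(-max_points, min(max_points, points))
```

For example, `bayes_points(0.5)` returns 0 and `bayes_points(0.55)` returns 1, while `bayes_points(0.99)` returns 20 and `bayes_points(0.01)` returns -20. The cap keeps a single very confident verdict from overwhelming every other scoring rule.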