Jonathan A. Zdziarski



Top Stories by Jonathan A. Zdziarski

This article is an excerpt from Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification. Printed with permission from No Starch Press. Copyright 2005.

Unlike older spam filters, in which the author programs the characteristics of spam, statistical filtering automatically chooses the characteristics (or "features") of spam and nonspam directly from each e-mail. Two years from now, when spam has evolved in content, statistical filters will have learned enough to continue doing their job. This is because, unlike older spam filters, in which the author programmed rules to identify spam, statistical filters automatically identify the damning features of a spam based on message content.

Tokenization is the process of reducing a message to its colloquial components. These components can be individual words, word pairs, or other small chunks o...
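As a rough illustration of tokenization, here is a minimal Python sketch that reduces a message body to lowercase single words plus adjacent word pairs. The `tokenize` function, the token pattern, and the `+` separator for pairs are illustrative assumptions, not the book's actual tokenizer rules.

```python
import re
from typing import List

# Assumed token pattern (illustrative only): runs of letters, digits,
# apostrophes, and the characters $ ! - which often carry meaning in spam,
# e.g. "$99" or "now!".
TOKEN_RE = re.compile(r"[A-Za-z0-9$!'\-]+")


def tokenize(text: str, pairs: bool = True) -> List[str]:
    """Reduce a message body to single-word tokens and, optionally,
    adjacent word pairs joined with '+' (a hypothetical convention)."""
    words = [w.lower() for w in TOKEN_RE.findall(text)]
    tokens = list(words)
    if pairs:
        # Word pairs keep some of the context that single words lose.
        tokens.extend(f"{a}+{b}" for a, b in zip(words, words[1:]))
    return tokens


if __name__ == "__main__":
    print(tokenize("Buy cheap meds NOW!"))
    # ['buy', 'cheap', 'meds', 'now!', 'buy+cheap', 'cheap+meds', 'meds+now!']
```

Each token becomes a candidate feature; a statistical filter would then learn from training data how often each one appears in spam versus nonspam, rather than relying on rules written by hand.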