spam

Classification of e-mail into spam

Dimension

Description

Unsolicited commercial e-mail, or “spam”, takes many forms: advertisements for products or web sites, get-rich-quick schemes, chain letters, and pornography. This data set is a collection of spam and non-spam e-mails assembled by George Forman at Hewlett-Packard in June and July of 1999. Forman and his collaborators also extracted 57 numeric features from each e-mail that could be used to classify it as spam or non-spam.

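Note that this is a personal collection, so some of the features are highly specific to the collector (e.g., the name “George” and the phone number 650-857-7835). A classifier trained on these features may not carry over to another person's mailbox without modification.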
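To make concrete how a feature such as the name “George” can become a number, here is a minimal sketch. It assumes a word-frequency feature is defined as the percentage of words in a message that match a given word, which is how the UCI documentation for this data set describes its word features. The function name `word_freq` and the two sample messages are hypothetical and invented for illustration; they are not part of the data set.

```python
import re


def word_freq(message: str, word: str) -> float:
    """Return the percentage of words in `message` equal to `word`.

    Assumption: a "word" is any run of alphanumeric characters,
    and matching is case-insensitive.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", message.lower())
    if not tokens:
        # An empty message has no words, so the frequency is zero.
        return 0.0
    return 100.0 * tokens.count(word.lower()) / len(tokens)


# Hypothetical messages, made up for this example only.
work_mail = "Hi George, the meeting moved to Friday. Thanks, George"
promo_mail = "Get rich quick! Free money, click now"

for label, text in [("work", work_mail), ("promo", promo_mail)]:
    george = word_freq(text, "george")
    free = word_freq(text, "free")
    print(f"{label}: george={george:.2f}%  free={free:.2f}%")
```

In this toy example the personal message scores high on "george" and zero on "free", while the promotional message does the opposite. That illustrates why a collector-specific word can be a strong indicator of non-spam for one mailbox and carry no signal for another.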

Outcome

Features

Prediction set

Reference

I obtained this data set from the UCI Machine Learning Repository. The data set was originally created by Mark Hopkins, Erik Reeber, George Forman, and Jaap Suermondt at Hewlett-Packard Labs in Palo Alto, CA.