Auto-learning of SMTP TCP Transport-Layer Features for Spam and Abusive Message Detection

Georgios Kakavelakis, Robert Beverly, and Joel Young
Proceedings of the 25th USENIX Large Installation System Administration Conference (LISA 2011),
Boston, MA, December 2011.

Botnets are a significant source of abusive messaging (spam, phishing, etc.) and other types of malicious traffic. A promising approach to help mitigate botnet-generated traffic is signal analysis of transport-layer (i.e., TCP/IP) characteristics, e.g., timing, packet reordering, congestion, and flow control. Prior work [spamflow-ceas08] shows that machine learning analysis of such traffic features on an SMTP MTA can accurately differentiate between botnet and legitimate sources. We make two contributions toward the real-world deployment of such techniques: i) an architecture for real-time on-line operation; and ii) auto-learning of the unsupervised model across different environments without human labeling (i.e., training). We present a "SpamFlow" SpamAssassin plugin and the requisite auxiliary daemons to integrate transport-layer signal analysis with a popular open-source spam filter. Using our system, we detail results from a production deployment where our auto-learning technique achieves better than 95 percent accuracy, precision, and recall after reception of approximately 1,000 emails.
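
For readers less familiar with the three metrics quoted above, the following is a minimal sketch of how accuracy, precision, and recall are conventionally computed from a classifier's confusion counts. It assumes spam is the positive class. The `Confusion` type and the example counts are hypothetical, chosen only for illustration; they are not results from the deployment described in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion counts, assuming spam is the positive class."""
    tp: int  # spam classified as spam
    fp: int  # legitimate mail classified as spam
    tn: int  # legitimate mail classified as legitimate
    fn: int  # spam classified as legitimate


def accuracy(c: Confusion) -> float:
    """Fraction of all messages classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: Confusion) -> float:
    """Fraction of messages flagged as spam that really are spam."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Confusion) -> float:
    """Fraction of actual spam that was flagged."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


# Made-up counts for 1,000 messages (illustration only).
example = Confusion(tp=480, fp=20, tn=480, fn=20)
print(accuracy(example), precision(example), recall(example))
```

Reporting all three matters for mail filtering because a filter can score well on accuracy alone while still misclassifying a disproportionate share of legitimate mail (low precision) or letting much of the spam through (low recall).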
