Spam Email Detection Using Machine Learning Techniques

Authors

  • Mr Hellenic Air Force Academy, Dekeleia Air Force Base, Attica 13671, Greece
  • Antonios Andreatos Hellenic Air Force Academy
  • Prof Department of Mathematics, National Technical University of Athens, Politechneioupoli, Iroon Polytechneiou 9, Zografou 15772, Athens, Greece

DOI:

https://doi.org/10.34190/eccws.22.1.1208

Keywords:

Spam, Anti-spam filters, Machine Learning, Deep Learning, Blacklists, Geolocation, Classification

Abstract

This paper addresses the security of electronic mail using machine learning algorithms. Spam emails are unwanted messages, usually commercial, sent to a large number of recipients. In this work, an algorithm for detecting spam messages with the aid of machine learning methods is proposed. The algorithm accepts as input text email messages grouped as benign (“ham”) or malicious (spam) and produces a text file in CSV format. This file is then used to train a set of ten machine learning techniques to classify incoming emails as ham or spam. The following techniques have been tested: Support Vector Machines, k-Nearest Neighbours, Naïve Bayes, Neural Networks, Recurrent Neural Networks, AdaBoost, Random Forest, Gradient Boosting, Logistic Regression and Decision Trees. Testing was performed using two popular datasets, as well as a publicly available CSV file. Our algorithm is written in Python and produces satisfactory results in terms of accuracy, compared to state-of-the-art implementations. In addition, the proposed system generates three output files: a CSV file with the IP addresses of the servers from which the spam emails originated, a map showing their geolocation, and a CSV file with statistics about the countries of origin. These files can be used to update existing organisational filters and the blacklists used in other spam filters.
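
The pipeline described in the abstract (labelled emails, then a CSV file, then a trained classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the column names label and text, and the helper functions emails_to_csv, train and classify are all hypothetical, and a small hand-written multinomial Naïve Bayes with Laplace smoothing stands in for just one of the ten techniques listed.

```python
import csv
import io
import math
import re
from collections import Counter

# Hypothetical toy corpus; the datasets used in the paper are not reproduced here.
EMAILS = [
    ("ham", "Meeting moved to Monday, please confirm the agenda"),
    ("ham", "Please review the attached report before the meeting"),
    ("spam", "Win a free prize now, click here to claim your money"),
    ("spam", "Free money offer, click now to win"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def emails_to_csv(emails):
    """Serialise (label, text) pairs as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["label", "text"])  # assumed column names
    writer.writerows(emails)
    return buf.getvalue()


def train(csv_text):
    """Collect per-class word and document counts from the CSV text."""
    word_counts = {"ham": Counter(), "spam": Counter()}
    doc_counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc_counts[row["label"]] += 1
        word_counts[row["label"]].update(tokenize(row["text"]))
    vocab = set(word_counts["ham"]) | set(word_counts["spam"])
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the highest log-posterior under Naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids zero probabilities
                score += math.log((counts[tok] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(emails_to_csv(EMAILS))
print(classify(model, "Click here to win free money"))
print(classify(model, "Please confirm the meeting agenda"))
```

On this toy corpus the first message is labelled spam and the second ham. A realistic setup would train on one of the public datasets and compare several classifiers on held-out data, as the abstract describes.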

Published

2023-06-19