Frequent Itemset Mining for Model Data Reduction

Authors

  • Blake Johns, University of South Alabama
  • Ryan Benton, University of South Alabama

DOI:

https://doi.org/10.34190/iccws.21.1.4475

Keywords:

Itemset Mining, Network Intrusion Detection, Pattern Mining, Feature Selection, Data Reduction

Abstract

Network intrusion detection systems (NIDS) are in a constant battle to stay ahead of the curve with new techniques, whether that means a new machine learning model or a new way of viewing the data. Recent work has shown that Deep Learning, coupled with an image-based representation of packet-level data, achieves a high detection rate for a wide range of attacks. Key to the image construction was removing information that can cause misalignment or potential bias in the Deep Learning model. However, while image-based Deep Learning is known to be good at extracting relevant information from images, we propose that additional preprocessing can improve both training and detection time while potentially providing insight into bytes that occur frequently in attacks. To this end, this paper proposes the use of Frequent Itemset Mining (FIM), a powerful tool for identifying frequent (commonly occurring) itemsets (bytes) in transactional data (bytes in a packet). By identifying the frequent bytes within the packets of a flow, the feature space can be greatly reduced, which permits a quicker classification time. This approach is tested on image representations of CICIDS-2017 data, where the first n packets of a flow are transformed into an image and then fed into a Convolutional Neural Network (CNN) model architecture. Results show that using frequent itemsets to identify frequent byte positions of a packet yields results comparable to those obtained with images that do not use FIM to reduce the number of packet bytes. The feature reduction results in the need for 99% less data, which reduces the training complexity. We also show better or comparable results relative to models using the non-reduced packets when reducing the number of packets in a flow (from eight packets down to five).
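
The idea of keeping only frequent byte positions can be illustrated with a minimal sketch. It is not the authors' implementation: it counts only single items (the first pass of an Apriori-style miner), treats each packet as a transaction of (position, byte value) items, and uses hypothetical names (frequent_positions, reduce_flow, min_support, max_len) and toy packet data chosen purely for illustration.

```python
from collections import Counter

def frequent_positions(packets, min_support, max_len=16):
    """Return sorted byte positions where some (position, value) item is frequent.

    Assumption: only 1-itemsets are counted; a full FIM algorithm would also
    mine larger itemsets.
    """
    counts = Counter()
    for pkt in packets:
        # Each packet is one transaction; its items are (position, byte value) pairs.
        counts.update(enumerate(pkt[:max_len]))
    threshold = min_support * len(packets)
    return sorted({pos for (pos, _), c in counts.items() if c >= threshold})

def reduce_flow(packets, positions, n_packets):
    """Build an n_packets x len(positions) matrix, zero-padding missing bytes and packets."""
    rows = [[pkt[p] if p < len(pkt) else 0 for p in positions]
            for pkt in packets[:n_packets]]
    while len(rows) < n_packets:
        rows.append([0] * len(positions))
    return rows

# Toy flow of four packets (hypothetical data).
packets = [
    bytes([0x45, 0x00, 0x10, 0xAA]),
    bytes([0x45, 0x00, 0x20, 0xBB]),
    bytes([0x45, 0x01, 0x30, 0xCC]),
    bytes([0x45, 0x00, 0x40]),
]
positions = frequent_positions(packets, min_support=0.75)  # -> [0, 1]
image = reduce_flow(packets, positions, n_packets=5)       # 5 rows x 2 columns
```

In this toy flow only positions 0 and 1 carry a value that appears in at least 75% of packets, so each packet row shrinks from up to four bytes to two. The resulting matrix stands in for the reduced image that would be passed to a CNN.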

Author Biographies

Blake Johns, University of South Alabama

Current PhD student at the University of South Alabama and a full-time Data Scientist.

Ryan Benton, University of South Alabama

University of South Alabama Professor and Chair of the Department of Computer Sciences.

Published

19-02-2026