Frequent Itemset Mining for Model Data Reduction
DOI:
https://doi.org/10.34190/iccws.21.1.4475
Keywords:
Itemset Mining, Network Intrusion Detection, Pattern Mining, Feature Selection, Data Reduction
Abstract
Network intrusion detection systems (NIDS) are in a constant battle to find new techniques, whether a new machine learning model or a new way of viewing the data, to stay ahead of the curve. Recent work has shown that Deep Learning, coupled with an image-based representation of packet-level data, achieves a high detection rate for a wide range of attacks. Key to the image construction was removing information that can cause misalignment or potential bias in the Deep Learning model. However, while image-based Deep Learning is known to be good at extracting relevant information from images, it is proposed that additional preprocessing can improve both training and detection time while potentially providing insight into bytes that frequently occur in attacks. To this end, this paper proposes the use of Frequent Itemset Mining (FIM), a powerful tool for identifying frequent (commonly occurring) itemsets (bytes) in transactional data (the bytes in a packet). By identifying the frequent bytes within the packets of a flow, the feature space can be greatly reduced, which permits a quicker classification time. This approach is tested on image representations of CICIDS-2017 data, where the first n packets of a flow are transformed into an image and then fed into a Convolutional Neural Network (CNN) model architecture. Results show that using frequent itemsets to identify frequent byte positions of a packet yields results comparable to those from images that do not use FIM to reduce the number of packet bytes. The feature reduction cuts the amount of data needed by 99%, which reduces the training complexity. We also show better or comparable results to models using the non-reduced packets when the number of packets in a flow is reduced (from eight packets down to five).
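The idea of treating each packet as a transaction and keeping only frequently occurring bytes can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the mining algorithm, the item encoding, or the support threshold. The sketch assumes each item is a hypothetical `(position, byte value)` pair, uses a simple Apriori-style level-wise search, and the names `frequent_itemsets`, `frequent_positions`, and `reduce_packet` are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Apriori-style level-wise search (illustrative, not optimized).

    Returns a dict mapping each frequent itemset (frozenset) to its support,
    i.e. the fraction of transactions that contain it.
    """
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    # Level 1: count single items.
    counts = Counter(item for t in tsets for item in t)
    current = {frozenset([i]): c / n for i, c in counts.items()
               if c / n >= min_support}
    result = dict(current)
    size = 2
    while current and size <= max_size:
        items = sorted({i for s in current for i in s})
        # Keep a candidate only if all its (size-1)-subsets are frequent.
        candidates = [frozenset(c) for c in combinations(items, size)
                      if all(frozenset(sub) in current
                             for sub in combinations(c, size - 1))]
        current = {}
        for cand in candidates:
            support = sum(1 for t in tsets if cand <= t) / n
            if support >= min_support:
                current[cand] = support
        result.update(current)
        size += 1
    return result


def frequent_positions(packets, min_support):
    """Byte positions holding a value that is frequent across the packets.

    Assumption: an item is a (position, byte value) pair.
    """
    transactions = [set(enumerate(p)) for p in packets]
    singles = frequent_itemsets(transactions, min_support, max_size=1)
    return sorted({pos for itemset in singles for pos, _ in itemset})


def reduce_packet(packet, positions):
    """Keep only the selected byte positions of one packet."""
    return bytes(packet[p] for p in positions if p < len(packet))


# Toy flow of three 4-byte "packets" (made-up data).
packets = [bytes([1, 2, 3, 4]), bytes([1, 9, 3, 7]), bytes([1, 2, 3, 8])]
positions = frequent_positions(packets, min_support=0.6)
reduced = [reduce_packet(p, positions) for p in packets]
```

In this toy flow the last byte position never repeats a value, so it is dropped and each packet shrinks from four bytes to three. The reduced packets would then be stacked into the image fed to the CNN; how the paper performs that step, and which support threshold it uses, is not stated in the abstract.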
License
Copyright (c) 2026 Blake Johns, Ryan Benton

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.