DACA: Automated Attack Scenarios and Dataset Generation


  • Frank Korving Tallinn University of Technology
  • Risto Vaarandi Tallinn University of Technology




security dataset, testbed, DevOps, detection engineering


Computer networks and systems are under an ever-increasing risk of being attacked and abused. High-quality datasets can assist with in-depth analysis of attack scenarios, improve detection rules, and help educate analysts. However, existing solutions for creating such datasets suffer from a number of drawbacks. First, several solutions are not open source with publicly released implementations or are not vendor neutral. Second, some existing solutions neglect the complexity and variance of specific attack techniques when creating datasets or neglect certain attack types. Third, existing solutions are not fully automating the entire data collection pipeline. This paper presents and discusses the Dataset Creation and Acquisition Engine (DACA), a configurable dataset generation testbed, built around commonly used Infrastructure-as-Code (IaC) and DevOps tooling which can be used to create varied, reproducible datasets in a highly automated fashion. DACA acts as a versatile wrapper around existing virtualization technologies and can be used by blue as well as red teamers alike to run attack scenarios and generate datasets. These in turn can be used for tuning detection rules, for educational purposes or pushed into data processing pipelines for further analysis. To show DACA's effectiveness, DACA is used to create two extensive datasets examining covert DNS Tunnelling activity on which a detailed analysis is performed.