A Hybrid Machine Learning Approach for Red Team Log Analysis

Nickolas Mohr; Alan Shaffer; Gurminder Singh; Armon Barton

doi:10.34190/iccws.21.1.4433

Authors

Nickolas Mohr Naval Postgraduate School, Monterey, USA
Alan Shaffer Naval Postgraduate School, Monterey, USA
Gurminder Singh Naval Postgraduate School, Monterey, USA
Armon Barton Naval Postgraduate School, Monterey, USA

DOI:

https://doi.org/10.34190/iccws.21.1.4433

Keywords:

Red team, log analysis, machine learning, vulnerability assessment, large language model

Abstract

Red teaming is a common cybersecurity practice that simulates real-world adversarial cyber operations on defended systems to identify vulnerabilities. Current red team tools often have limited logging capabilities, which limits analysis and deprives red teams of real-time feedback and post-operation insights. The application of machine learning to automate the analysis of red team operations is severely constrained by the scarcity of labeled, real-world log data. This research addresses this challenge by exploring the potential of using synthetic data to train attack-detection models for Cobalt Strike logs. We systematically evaluate three training approaches for analyzing Cobalt Strike operational logs: synthetic-only, real-world-only, and a hybrid approach that combines both data types. Our methodology employs a comprehensive feature engineering pipeline that includes both programmatic log generation for creating large-scale structured data and large language model techniques for introducing variety and edge cases. We transform each log file into a high-dimensional vector that includes event types, command verbs, temporal activity patterns, and mappings to the MITRE ATT&CK knowledge base. Random Forest classification models are trained on this feature set to distinguish between successful and failed attack scenarios. By rigorously testing each training approach against a manually labeled ground-truth set of 112 authentic Cobalt Strike logs, we quantify the performance and limitations of each strategy. Our main contribution is demonstrating that a hybrid training strategy achieves 94% accuracy, surpassing both synthetic-only models (56%) and real-world-only models (79%). This combined approach addresses both the domain gap in synthetic data and the data scarcity in small, real-world datasets. The hybrid model learns attack diversity from over 30,000 synthetic scenarios while grounding its understanding in the authentic structural patterns of real logs, providing a 15 percentage-point improvement over real-data-only approaches. This research offers a practical framework for enhancing limited real-world cybersecurity datasets by strategically integrating synthetic data, enabling immediate use in Department of Defense red team operations and wider cybersecurity machine learning applications.
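
The log-to-vector step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the line format (`<ISO timestamp> [<event>] <message>`), the vocabularies `EVENT_TYPES` and `COMMAND_VERBS`, the verb-to-technique table `ATTACK_MAP`, and the function name `featurize` are all assumptions made for illustration, and the sample log is a simplified stand-in rather than real Cobalt Strike output. The sketch only shows how event types, command verbs, hourly activity, and ATT&CK technique counts might be combined into one fixed-length vector.

```python
from collections import Counter
from datetime import datetime

# Hypothetical vocabularies; the paper's actual feature set is not given in the abstract.
EVENT_TYPES = ["input", "output", "task", "error"]
COMMAND_VERBS = ["shell", "ps", "hashdump", "upload"]
# Illustrative verb -> MITRE ATT&CK technique ID mapping (assumed for this sketch).
ATTACK_MAP = {"shell": "T1059", "ps": "T1057", "hashdump": "T1003"}
TECHNIQUES = sorted(set(ATTACK_MAP.values()))


def featurize(log_text):
    """Turn one log into a fixed-length count vector.

    Assumed (simplified) line format: '<ISO timestamp> [<event>] <message>'.
    """
    events, verbs, techniques = Counter(), Counter(), Counter()
    hours = [0] * 24  # temporal activity: number of events per hour of day
    for line in log_text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            continue  # skip blank or truncated lines
        stamp, event = parts[0], parts[1].strip("[]")
        message = parts[2] if len(parts) == 3 else ""
        try:
            hour = datetime.fromisoformat(stamp).hour
        except ValueError:
            continue  # skip lines without a parseable timestamp
        hours[hour] += 1
        events[event] += 1
        if event == "input" and message:
            verb = message.split()[0]  # first token of an operator command
            verbs[verb] += 1
            if verb in ATTACK_MAP:
                techniques[ATTACK_MAP[verb]] += 1
    return (
        [events[e] for e in EVENT_TYPES]
        + [verbs[v] for v in COMMAND_VERBS]
        + hours
        + [techniques[t] for t in TECHNIQUES]
    )


# Made-up sample log in the assumed format.
SAMPLE_LOG = """\
2024-01-01T10:15:00 [input] shell whoami
2024-01-01T10:15:02 [output] received 12 bytes
2024-01-01T11:00:00 [input] hashdump
2024-01-01T11:00:05 [error] access denied
"""

vector = featurize(SAMPLE_LOG)
print(len(vector), vector[:8])
```

Vectors of this kind, one per log file, could then be stacked with success/failure labels and passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier`. Under the hybrid approach, the training set would combine vectors from synthetic and real logs, with evaluation done on held-out real logs.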

Author Biographies

Nickolas Mohr, Naval Postgraduate School, Monterey, USA

Nick Mohr is a Major in the United States Marine Corps. He earned his bachelor’s degree from the University of California, Berkeley in 2012 and was commissioned shortly thereafter. He later completed a master’s degree in computer science at the Naval Postgraduate School. His research focuses on cybersecurity, red team operations, and automated log analysis.

Alan Shaffer, Naval Postgraduate School, Monterey, USA

Dr Alan Shaffer is a Senior Lecturer for the Information Sciences and Computer Science Departments at the Naval Postgraduate School (NPS) in Monterey, CA. He earned his PhD in Computer Science at NPS in 2008, and was subsequently appointed an Assistant Professor on the CS faculty while still an active-duty naval officer. His research interests include cyber operations and security, computer systems modelling and verification, and data science. Dr Shaffer has been involved with the IEEE Symposium on Security and Privacy as an Organizing Committee Treasurer.

Gurminder Singh, Naval Postgraduate School, Monterey, USA

Dr Gurminder Singh is the Department Chair and a Professor of Computer Science at the Naval Postgraduate School (NPS) in Monterey, CA. His areas of focus include device support for mobility, wireless and handheld devices, and cyber-physical systems. Prior to NPS, he was the President and CEO of NewsTakes, Inc., a company specializing in repurposing multimedia content for delivery to wireless networks and devices. Prior to NewsTakes, Dr Singh was Director at the Kent Ridge Digital Lab in Singapore (now I²R), where his responsibilities included strategic directions for research, management of research staff, and commercialization of intellectual property. He led an R&D lab focused on a variety of research themes including video and audio processing, PDAs for students, and online learning communities for children. His lab contributed significantly to the formation of the MPEG-7 standard. As a senior researcher in the lab, he was deeply involved with research in networked media, user interface software, and networked virtual worlds.

Armon Barton, Naval Postgraduate School, Monterey, USA

Dr Armon Barton is an Assistant Professor in the Department of Computer Science at the Naval Postgraduate School. His core research interests lie at the intersection of machine learning and computer security and privacy with focus areas in secure machine learning, wide area surveillance, anonymous communications, and traffic analysis. Current projects include defending neural networks against adversarial examples, developing novel adversarial example attacks, object detection in wide area motion imagery, video fingerprinting attacks and defenses over Tor, and detecting malicious network traffic over enterprise-level network architectures.
