Using Deep Reinforcement Learning for Assessing the Consequences of Cyber Mitigation Techniques on Industrial Control Systems


  • Terry Merz SUNY-CEHC/PNNL
  • Romarie Morales Rosado Pacific Northwest National Laboratory



Deep reinforcement learning, APT, ICS, cyber mitigation, AI-based automated attacks, ICS cyber-attacks, energy security, cybersecurity


This paper discusses an in-progress study involving the use of deep reinforcement learning (DRL) to mitigate the effects of an advanced cyber-attack against industrial control systems (ICS). The research is a qualitative, exploratory study that emerged from a gap identified during the execution of two rapid prototyping studies. During these studies, cyber defensive procedures, known as “Mitigation,” were characterized as actions taken to minimize the impact of ongoing advanced cyber-attacks against an ICS while enabling primary operations to continue. To execute Mitigation procedures, affected ICS components required rapid isolation and quarantining from “healthy” system segments. Today, however, with most attacks leveraging automation, mitigation also requires rapid decision-making capabilities that operate at the speed of automation yet with human-like refinement. The authors chose DRL as a viable solution to this problem because the algorithm’s design involves “intelligent” decisions based upon continuous learning achieved through a rewards system. The primary theory of this study posits that processes informed by data on the execution path of an advanced cyber-attack, as well as on the consequences of deploying a particular Mitigation procedure, evolve the system into an ever-improving defensive capability. This study seeks to produce a defensive DRL-based software agent, trained by a DRL-based offensive software agent, that generates policy refinements based upon extrapolations from a corrupted network state as reported by an intrusion detection system (IDS) and baseline data. Expected results include an estimation rule that would quantify the impacts of various mitigation actions while protecting the operational critical path and isolating an in-progress attack. This study is in a conceptual phase and development has not started.

The research questions for this study are:

RQ1: Can this software agent correctly categorize an in-progress cyber-attack and extrapolate the potential ICS assets affected?

RQ2: Can this software agent categorize novel cyber-attacks and extrapolate a probable attack vector while enumerating affected assets?

RQ3: Can this software agent characterize how operations are affected by quarantine actions?

RQ4: Can this software agent generate a set of recommended courses of action, ranked by effectiveness and by least negative effect on the operational critical path?

Author Biography

Romarie Morales Rosado, Pacific Northwest National Laboratory

Dr. Romie Morales Rosado is a scientist in the Foundational Data Science group at Pacific Northwest National Laboratory. She specializes in applied quantitative modeling techniques, statistical inference, machine learning/deep learning, optimal control, and dynamic optimization tools. She has a PhD in applied mathematics and an MS in applied mathematics from ASU, and a BA in political science and economics from Pontifical Catholic University of Puerto Rico.