Automatically Attacking Software Reverse Engineering AI Agents

Brian Crawford; Justin Phillips; Patrick McClure

doi:10.34190/eccws.25.1.4597

Authors

Brian Crawford Naval Postgraduate School
Justin Phillips Naval Postgraduate School
Patrick McClure Naval Postgraduate School

DOI:

https://doi.org/10.34190/eccws.25.1.4597

Keywords:

prompt injection, software reverse engineering, large language models (LLMs), AI Agents

Abstract

Software reverse engineering tools such as Ghidra enable malware analysts to conduct robust static analysis of executable binaries safely, without access to the original source code. Coupled with the analytic power of large language models (LLMs), agentic systems equipped with tools such as GhidraMCP allow analysts to automate a previously human-driven process. Although this automation can increase the productivity of a single malware analyst, it also introduces a new attack surface that malware authors can exploit for obfuscation. This paper presents an adversarial technique that uses genetic algorithm-based prompt generation, a modification of the adversarial attack known as AutoDAN, to deceive LLM-powered disassembly and decompilation systems into misinterpreting binary executables, effectively corrupting their analytical output. This proof-of-concept methodology exploits inherent vulnerabilities in how LLMs process and interpret decompiled machine code: prompt injection through extraneous string variable assignments passes surreptitious instructions to the LLM without affecting the functionality of the executable file. We demonstrate this capability through several concise examples. This approach could enable attackers to bypass automated detection systems that rely on LLM-driven analysis pipelines. Studying this attack yields insights into the security implications of integrating LLMs into cybersecurity toolchains and can inform the design of more robust agentic code analysis systems.
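
As a rough illustration of the genetic-algorithm loop that the abstract refers to, the following is a minimal sketch and not the authors' implementation. The vocabulary, the target word set, and the `fitness` function are hypothetical stand-ins: the abstract does not say how candidates are scored, and a real search would presumably score each candidate by querying the analysis agent, which is not shown here. Only the generic structure (selection, crossover, mutation, elitism) is demonstrated.

```python
import random

# Hypothetical word pool and goal; neither comes from the paper.
VOCAB = ["note", "this", "function", "is", "benign",
         "helper", "logging", "utility", "safe", "routine"]
TARGET = {"benign", "logging", "utility"}


def fitness(candidate):
    """Toy stand-in score: number of distinct target words present."""
    return len(TARGET & set(candidate))


def crossover(a, b, rng):
    """Single-point crossover of two equal-length word lists."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(candidate, rng, rate=0.2):
    """Replace each word with a random vocabulary word with probability `rate`."""
    return [rng.choice(VOCAB) if rng.random() < rate else w for w in candidate]


def evolve(pop_size=20, length=6, generations=30, elite=2, seed=0):
    """Run the loop and return the best candidate found."""
    rng = random.Random(seed)
    population = [[rng.choice(VOCAB) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]   # truncation selection
        children = population[:elite]           # elitism: keep the best as-is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(" ".join(best), fitness(best))
```

Because the top `elite` candidates are copied unchanged into each new generation, the best score never decreases from one generation to the next.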

Author Biographies

Brian Crawford, Naval Postgraduate School

Brian Crawford earned his bachelor’s degree in physics from King College in Bristol, TN, in 2004. He received master’s degrees in education and computer science from Lipscomb University in 2009 and the Naval Postgraduate School in 2016, respectively. He is currently a Ph.D. candidate in computer science at the Naval Postgraduate School.

Justin Phillips, Naval Postgraduate School

Justin Phillips received his B.S. in Computer Networks and Cybersecurity from the University of Maryland Global Campus in 2023 and is currently pursuing an M.S. in Applied Cyber Operations at the Naval Postgraduate School. His capstone research focuses on strategic cyber defense modernization through AI-driven agentic reverse engineering and enterprise risk governance.

Patrick McClure, Naval Postgraduate School

Patrick McClure is an Assistant Professor of Computer Science at the Naval Postgraduate School. Previously, he was on the NIMH Machine Learning Team. He received a Ph.D. in Neuroscience from the University of Cambridge, and an M.S. in Computer Science and a B.E. in Bioengineering from the University of Louisville.

