Sleeper Code in Protein Data Files as Cyber Adversarial Vectors

Authors

  • Tia Pope, North Carolina A&T State University

DOI:

https://doi.org/10.34190/iccws.21.1.4459

Keywords:

Cyberbiosecurity, data formats, data integrity, adversarial data, FASTA, mmCIF, AI, protein models

Abstract

Scientific protein data formats are widely assumed to be inert, yet their use in automated and AI-driven research environments creates overlooked pathways for cyberbiosecurity risk. This work examines sleeper code patterns, defined as structured non-executable strings embedded in FASTA files, CIF metadata, and protein sequences that persist through downstream processing. A controlled simulation framework models ten representative adversarial use cases and reveals that many workflow components carry these patterns forward without removal, including cloud alignment tools, high-performance computing pipelines, visualization utilities, and transformer-based protein models. Across the ten simulations, nine workflows preserved at least one embedded pattern, which confirms broad systemic tolerance for structured symbolic content. Results show that permissive parsing rules and AI prefix conditioning allow symbolic content to survive reformatting and, in some cases, to become further embedded within generated outputs. These findings indicate a structural blind spot in scientific workflows where biological trust assumptions obscure computational vulnerabilities. To address this gap, the paper introduces a multilayer mitigation framework that combines input sanitation, anomaly detection, AI model guardrails, workflow provenance, and federated containment. Taken together, the study reframes protein data formats as potential cyber vectors and highlights the need for interdisciplinary approaches that strengthen digital resilience across computational biology and national research infrastructure.
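
As a minimal sketch of the input sanitation layer named in the abstract, the Python below audits FASTA text and flags records whose headers or sequences contain characters outside an allowlist. This is an illustration, not the paper's framework: the function names (parse_fasta, audit_fasta), the residue allowlist, and the header pattern are all assumptions chosen for the example. A character-level check of this kind would not detect patterns written entirely in valid residue letters, which is presumably where the anomaly detection layer would be needed.

```python
import re

# Assumed allowlist: IUPAC amino-acid letters plus stop (*) and gap (-) symbols.
ALLOWED_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBXZJUO*-")

# Assumed conservative header pattern: word characters, spaces, and a few
# separators commonly seen in database-style FASTA headers.
SAFE_HEADER = re.compile(r"^[\w .|:=/\-\[\](),]+$")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def audit_fasta(text):
    """Return a list of (header, reasons) for records that fail the checks."""
    findings = []
    for header, seq in parse_fasta(text):
        reasons = []
        if not SAFE_HEADER.match(header):
            reasons.append("header contains characters outside the allowlist")
        unexpected = sorted(set(seq.upper()) - ALLOWED_RESIDUES)
        if unexpected:
            reasons.append(
                "sequence contains non-residue characters: " + "".join(unexpected)
            )
        if reasons:
            findings.append((header, reasons))
    return findings
```

A pipeline could call audit_fasta on each incoming file and quarantine any record that appears in the result instead of passing it to alignment tools or protein models. Rejecting flagged records, as opposed to silently stripping characters, keeps the original file available for provenance review.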

Author Biography

Tia Pope, North Carolina A&T State University

Tia is a 4th-year PhD candidate focused on AI-driven protein design across academic and industry collaborations.

Published

19-02-2026