Social Engineering of AI Agents

Authors

  • Jukka Vuorinen University of Jyväskylä
  • Eeli Mäkinen

DOI:

https://doi.org/10.34190/eccws.25.1.4606

Keywords:

Machine-Targeted Social Engineering, Autonomous LLM Agents, Contextual Fabrication, Tool Metadata Manipulation, Cognitive Security

Abstract

Large Language Models (LLMs) are increasingly embedded within autonomous agents that plan, reason, and interact with external systems through tools such as APIs, databases, and web services. These tool integrations allow agents to overcome the static and outdated nature of LLM knowledge, granting them real-time access to dynamic information sources and operational capabilities as demonstrated in frameworks like ReAct and MRKL. However, this architectural shift also exposes a new and insufficiently understood attack surface. Prior cybersecurity threats targeting application interfaces—such as SQL injection—have relied on injecting structured malicious commands into well-defined syntactic channels. In contrast, LLM agents operate primarily through natural language: both internal planning and external tool selection are mediated linguistically rather than programmatically. This change has profound security implications. Agent behaviour relies heavily on informal language and semantic interpretation. Traditional attack detection fails because it requires rigid markers like command prefixes or specific character patterns. Adversaries exploit this vulnerability by subtly altering the context the agent considers trustworthy. Recent work on tool metadata manipulation demonstrates how adversaries can exploit linguistic cues, authority signals, and persuasive descriptions to influence which tools an agent selects for a task. By modifying tool descriptions—while the tool’s programmed functionality is opaque to the agent—attackers can induce the agent to route sensitive data or actions to malicious endpoints without any direct prompt injection, code execution, or user deception. It is argued that such attacks constitute a new form of machine-targeted social engineering. Traditionally, it is seen that social engineering exploits cognitive biases in humans as the “weakest link” in security. Here, the weakness emerges instead from the ambiguity, informality, and contextual nature of natural language reasoning inside autonomous agents. The agent can be persuaded into harmful behaviour. It is discussed how such threats can be categorized within emerging agent security frameworks such as OWASP and MAESTRO, and defensive strategies designed to safeguard tool-using LLM systems from intentional manipulation are outlined. The findings indicate that cognitive security must now extend beyond users to the autonomous systems increasingly acting on their behalf.

Author Biographies

Jukka Vuorinen, University of Jyväskylä

Jukka Vuorinen is a Senior Lecturer in Cybersecurity at the University of Jyväskylä, Finland. His research lies at the intersection of cybersecurity and information systems, focusing on the social, ethical, and ontological dimensions of digital technologies and security practices.

Eeli Mäkinen

Eeli Mäkinen is a Master’s student in Cybersecurity at the University of Jyväskylä, Finland. His research interests include artificial intelligence and cybersecurity, particularly the analysis of vulnerabilities and weaknesses in AI-based systems.

Downloads

Published

2026-06-15