Evaluating the Effectiveness of Psychological Prompt Injection Attacks on Large Language Models for Social Engineering Artifact Generation
DOI: https://doi.org/10.34190/eccws.24.1.3515

Keywords: Prompt injections, Psychological techniques, Social engineering, AI vulnerabilities

Abstract
This study explores the vulnerability of Large Language Models (LLMs) to prompt injection attacks, a critical security concern. We investigate the effectiveness of four psychological techniques (PTs) from social engineering – Impersonation, Incentive, Persuasion, and Quid Pro Quo – in facilitating these attacks. Prompt injection manipulates an LLM by embedding malicious instructions within user prompts, potentially generating harmful content or compromising sensitive data; understanding these mechanisms is crucial for developing effective defenses. Our research assesses how these PTs influence prompt injection success rates against GPT-4o mini and Gemma-7b-it, the LLMs used for ChatGPT and Gemini respectively. We hypothesized that PTs significantly increase the likelihood of successful attacks, with some techniques being more effective than others. We conducted 220 prompt injection tests (110 per LLM) designed to elicit social engineering artifacts such as phishing emails, fake login screens, and ransomware notes, evaluating each model's susceptibility to diverse attack vectors. The four PTs were chosen for their relevance to manipulating human behavior in social engineering: Impersonation assumes a trusted identity, Incentive offers rewards, Persuasion applies manipulative tactics, and Quid Pro Quo proposes a reciprocal exchange. These techniques were adapted for prompt injections to simulate real-world social engineering scenarios. Statistical methods, including ANOVA and Kruskal-Wallis tests, assessed the overall impact of PTs; Mann-Whitney U tests with Bonferroni correction compared individual techniques, and Cohen's d measured effect sizes. Results demonstrate a statistically significant impact of PTs on prompt injection success. Impersonation was most effective across both LLMs, followed by Persuasion and Quid Pro Quo, with Incentive being least effective. These findings align with social engineering principles, highlighting the power of impersonation and other manipulative tactics. Our research has significant implications for LLM security and AI-driven social engineering: LLM vulnerability to psychologically driven prompt injections necessitates proactive security measures. Future research should focus on robust defense mechanisms, explore the interplay between PTs, and further investigate their impact on LLM security. This study contributes to understanding LLM vulnerabilities and to developing more resilient AI systems.
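To make the statistical pipeline described above concrete, the following minimal Python sketch applies the same sequence of tests – a Kruskal-Wallis omnibus test across the four PTs, pairwise Mann-Whitney U comparisons with Bonferroni correction, and Cohen's d effect sizes – to per-technique success indicators. The success/failure arrays are hypothetical placeholders for illustration only, not the study's data, and the binary coding of outcomes is an assumption of this sketch.

# Hypothetical sketch of the analysis pipeline on placeholder data.
from itertools import combinations

import numpy as np
from scipy import stats

# 1 = successful prompt injection, 0 = refused/failed (placeholder values, not study data)
results = {
    "Impersonation": np.array([1, 1, 1, 0, 1, 1, 1, 0, 1, 1]),
    "Persuasion":    np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0]),
    "Quid Pro Quo":  np.array([0, 1, 0, 1, 1, 0, 1, 0, 1, 0]),
    "Incentive":     np.array([0, 0, 1, 0, 0, 1, 0, 0, 1, 0]),
}

# Omnibus test: do success rates differ across the four PTs?
h_stat, p_omnibus = stats.kruskal(*results.values())
print(f"Kruskal-Wallis: H={h_stat:.3f}, p={p_omnibus:.4f}")

# Pairwise Mann-Whitney U tests with Bonferroni correction
pairs = list(combinations(results, 2))
alpha_corrected = 0.05 / len(pairs)
for a, b in pairs:
    u_stat, p_val = stats.mannwhitneyu(results[a], results[b], alternative="two-sided")
    flag = "significant" if p_val < alpha_corrected else "n.s."
    print(f"{a} vs {b}: U={u_stat:.1f}, p={p_val:.4f} ({flag})")

def cohens_d(x, y):
    """Cohen's d using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

for a, b in pairs:
    print(f"Cohen's d ({a} vs {b}): {cohens_d(results[a], results[b]):.2f}")

In the study itself, each array would contain the 110 per-LLM test outcomes grouped by technique; the sketch merely shows how the omnibus, pairwise, and effect-size steps fit together.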
License
Copyright (c) 2025 European Conference on Cyber Warfare and Security

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.