MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs
LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly integrated into clinical workflows; however, prompt injection attacks can steer these systems toward clinically unsafe or misleading outputs. We introduce
When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins
Prompt injection attacks pose a critical threat to large language models (LLMs), with prior work focusing on cutting-edge LLM applications like personal copilots. In contrast, simpler LLM applications, such
Detecting Prompt Injection Attacks Against Application Using Classifiers
networks, Random Forest, and Naive Bayes, to detect malicious prompts in LLM integrated web applications. The proposed approach improves prompt injection detection and mitigation, helping protect targeted applications and systems
WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents
Multiple prompt injection attacks have been proposed against web agents. At the same time, various methods have been developed to detect general prompt injection attacks, but none have been systematically
SecInfer: Preventing Prompt Injection via Inference-time Scaling
Prompt injection attacks pose a pervasive threat to the security of Large Language Models (LLMs). State-of-the-art prevention-based defenses typically rely on fine-tuning
A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy
emergence of new security vulnerabilities and challenges, such as jailbreaking and other prompt injection attacks. These maliciously crafted inputs can exploit LLMs, causing data leaks, unauthorized actions, or compromised outputs
ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack
automating complex workflows across various fields. However, these systems are highly vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external data can hijack agent behavior. In this
DeepSeek Robustness Against Semantic-Character Dual-Space Mutated Prompt Injection
Prompt injection has emerged as a critical security threat to large language models (LLMs), yet existing studies predominantly focus on single-dimensional attack strategies, such as semantic rewriting or character
DRIP: Defending Prompt Injection via Token-wise Representation Editing and Residual Instruction Fusion
they process user data according to predefined instructions. However, conventional LLMs remain vulnerable to prompt injection, where malicious users inject directive tokens into the data to subvert model behavior. Existing
Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing
find that prompt injection induces substantial changes in review scores and accept/reject decisions for English, Japanese, and Chinese injections, while Arabic injections produce little to no effect. These results highlight
Semantics as a Shield: Label Disguise Defense (LDD) against Prompt Injection in LLM Sentiment Classification
classification tasks such as sentiment analysis, yet their reliance on natural language prompts exposes them to prompt injection attacks. In particular, class-directive injections exploit knowledge of the model
WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents
Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones. Existing methods for detecting and localizing such attacks
ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data
Prompt injection attacks aim to contaminate the input data of an LLM to mislead it into completing an attacker-chosen task instead of the intended task. In many applications
Defending Against Prompt Injection with DataFilter
agents are increasingly deployed to automate tasks and interact with untrusted external data, prompt injection emerges as a significant security threat. By injecting malicious instructions into the data that LLMs
Privacy-Preserving Prompt Injection Detection for LLMs Using Federated Learning and Embedding-Based NLP Classification
designed inputs. Existing detection approaches often require centralizing prompt data, creating significant privacy risks. This paper proposes a privacy-preserving prompt injection detection framework based on federated learning and embedding
enabled, channel metadata (topic/description) can be incorporated into the model's system prompt. Prompt injection is a documented risk for LLM-driven systems. This issue increases the injection surface
PINA: Prompt Injection Attack against Navigation Agents
actions. Compared to text-based applications, their security is far more critical: a successful prompt injection attack does not just alter outputs but can directly misguide physical navigation, leading
Physical Prompt Injection Attacks on Large Vision-Language Models
reasoning in open physical environments. While LVLMs are known to be vulnerable to prompt injection attacks, existing methods either require access to input channels or depend on knowledge of user
Better Privilege Separation for Agents by Restricting Data Types
systems, such as AI agents. Unfortunately, these advantages have come with a vulnerability to prompt injections, an attack where an adversary subverts the LLM's intended functionality with an injected
SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents
effective paradigm for automating interactions with complex web environments, yet remain vulnerable to prompt injection attacks that embed malicious instructions into webpage content to induce unintended actions. This threat
AI Threat Alert