Paper 2602.06268v1

MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs

LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly integrated into clinical workflows; however, prompt injection attacks can steer these systems toward clinically unsafe or misleading outputs. We introduce

high relevance benchmark
Paper 2511.05797v1

When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins

Prompt injection attacks pose a critical threat to large language models (LLMs), with prior work focusing on cutting-edge LLM applications like personal copilots. In contrast, simpler LLM applications, such

high relevance attack
Paper 2512.12583v1

Detecting Prompt Injection Attacks Against Application Using Classifiers

networks, Random Forest, and Naive Bayes, to detect malicious prompts in LLM integrated web applications. The proposed approach improves prompt injection detection and mitigation, helping protect targeted applications and systems

high relevance attack
Paper 2510.01354v1

WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents

Multiple prompt injection attacks have been proposed against web agents. At the same time, various methods have been developed to detect general prompt injection attacks, but none have been systematically

high relevance benchmark
Paper 2509.24967v4

SecInfer: Preventing Prompt Injection via Inference-time Scaling

Prompt injection attacks pose a pervasive threat to the security of Large Language Models (LLMs). State-of-the-art prevention-based defenses typically rely on fine-tuning

high relevance attack
Paper 2601.22240v1

A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy

emergence of new security vulnerabilities and challenges, such as jailbreaking and other prompt injection attacks. These maliciously crafted inputs can exploit LLMs, causing data leaks, unauthorized actions, or compromised outputs

high relevance survey
Paper 2601.10173v1

ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack

automating complex workflows across various fields. However, these systems are highly vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external data can hijack agent behavior. In this

high relevance attack
Paper 2604.12548v1

DeepSeek Robustness Against Semantic-Character Dual-Space Mutated Prompt Injection

Prompt injection has emerged as a critical security threat to large language models (LLMs), yet existing studies predominantly focus on single-dimensional attack strategies, such as semantic rewriting or character

high relevance attack
Paper 2511.00447v2

DRIP: Defending Prompt Injection via Token-wise Representation Editing and Residual Instruction Fusion

they process user data according to predefined instructions. However, conventional LLMs remain vulnerable to prompt injection, where malicious users inject directive tokens into the data to subvert model behavior. Existing

high relevance attack
Paper 2512.23684v1

Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing

find that prompt injection induces substantial changes in review scores and accept/reject decisions for English, Japanese, and Chinese injections, while Arabic injections produce little to no effect. These results highlight

high relevance survey
Paper 2511.21752v2

Semantics as a Shield: Label Disguise Defense (LDD) against Prompt Injection in LLM Sentiment Classification

classification tasks such as sentiment analysis, yet their reliance on natural language prompts exposes them to prompt injection attacks. In particular, class-directive injections exploit knowledge of the model

high relevance attack
Paper 2602.03792v1

WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents

Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones. Existing methods for detecting and localizing such attacks

high relevance attack
Paper 2512.09321v3

ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data

Prompt injection attacks aim to contaminate the input data of an LLM to mislead it into completing an attacker-chosen task instead of the intended task. In many applications

high relevance attack
Paper 2510.19207v2

Defending Against Prompt Injection with DataFilter

agents are increasingly deployed to automate tasks and interact with untrusted external data, prompt injection emerges as a significant security threat. By injecting malicious instructions into the data that LLMs

high relevance attack
Paper 2511.12295v1

Privacy-Preserving Prompt Injection Detection for LLMs Using Federated Learning and Embedding-Based NLP Classification

designed inputs. Existing detection approaches often require centralizing prompt data, creating significant privacy risks. This paper proposes a privacy-preserving prompt injection detection framework based on federated learning and embedding

high relevance attack

enabled, channel metadata (topic/description) can be incorporated into the model's system prompt. Prompt injection is a documented risk for LLM-driven systems. This issue increases the injection surface

CVSS 3.7 openclaw View details
Paper 2601.13612v1

PINA: Prompt Injection Attack against Navigation Agents

actions. Compared to text-based applications, their security is far more critical: a successful prompt injection attack does not just alter outputs but can directly misguide physical navigation, leading

high relevance attack
Paper 2601.17383v1

Physical Prompt Injection Attacks on Large Vision-Language Models

reasoning in open physical environments. While LVLMs are known to be vulnerable to prompt injection attacks, existing methods either require access to input channels or depend on knowledge of user

high relevance attack
Paper 2509.25926v1

Better Privilege Separation for Agents by Restricting Data Types

systems, such as AI agents. Unfortunately, these advantages have come with a vulnerability to prompt injections, an attack where an adversary subverts the LLM's intended functionality with an injected

medium relevance attack
Paper 2604.25562v1

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

effective paradigm for automating interactions with complex web environments, yet remain vulnerable to prompt injection attacks that embed malicious instructions into webpage content to induce unintended actions. This threat

high relevance attack
Previous Page 2 of 18 Next