Paper 2510.01354v1

WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents

Multiple prompt injection attacks have been proposed against web agents. At the same time, various methods have been developed to detect general prompt injection attacks, but none have been systematically

high relevance benchmark
Paper 2509.24967v4

SecInfer: Preventing Prompt Injection via Inference-time Scaling

Prompt injection attacks pose a pervasive threat to the security of Large Language Models (LLMs). State-of-the-art prevention-based defenses typically rely on fine-tuning

high relevance attack
Paper 2601.22240v1

A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy

emergence of new security vulnerabilities and challenges, such as jailbreaking and other prompt injection attacks. These maliciously crafted inputs can exploit LLMs, causing data leaks, unauthorized actions, or compromised outputs

high relevance survey
Paper 2601.10173v1

ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack

automating complex workflows across various fields. However, these systems are highly vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external data can hijack agent behavior. In this

high relevance attack
Paper 2511.00447v2

DRIP: Defending Prompt Injection via Token-wise Representation Editing and Residual Instruction Fusion

they process user data according to predefined instructions. However, conventional LLMs remain vulnerable to prompt injection, where malicious users inject directive tokens into the data to subvert model behavior. Existing

high relevance attack
Paper 2512.23684v1

Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing

find that prompt injection induces substantial changes in review scores and accept/reject decisions for English, Japanese, and Chinese injections, while Arabic injections produce little to no effect. These results highlight

high relevance attack
Paper 2511.21752v2

Semantics as a Shield: Label Disguise Defense (LDD) against Prompt Injection in LLM Sentiment Classification

classification tasks such as sentiment analysis, yet their reliance on natural language prompts exposes them to prompt injection attacks. In particular, class-directive injections exploit knowledge of the model

high relevance attack
Paper 2602.03792v1

WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents

Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones. Existing methods for detecting and localizing such attacks

high relevance attack
Paper 2512.09321v3

ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data

Prompt injection attacks aim to contaminate the input data of an LLM to mislead it into completing an attacker-chosen task instead of the intended task. In many applications

high relevance attack
Paper 2510.19207v2

Defending Against Prompt Injection with DataFilter

agents are increasingly deployed to automate tasks and interact with untrusted external data, prompt injection emerges as a significant security threat. By injecting malicious instructions into the data that LLMs

high relevance attack
Paper 2511.12295v1

Privacy-Preserving Prompt Injection Detection for LLMs Using Federated Learning and Embedding-Based NLP Classification

designed inputs. Existing detection approaches often require centralizing prompt data, creating significant privacy risks. This paper proposes a privacy-preserving prompt injection detection framework based on federated learning and embedding

high relevance attack
Paper 2601.13612v1

PINA: Prompt Injection Attack against Navigation Agents

actions. Compared to text-based applications, their security is far more critical: a successful prompt injection attack does not just alter outputs but can directly misguide physical navigation, leading

high relevance attack
Paper 2601.17383v1

Physical Prompt Injection Attacks on Large Vision-Language Models

reasoning in open physical environments. While LVLMs are known to be vulnerable to prompt injection attacks, existing methods either require access to input channels or depend on knowledge of user

high relevance attack
Paper 2509.25926v1

Better Privilege Separation for Agents by Restricting Data Types

systems, such as AI agents. Unfortunately, these advantages have come with a vulnerability to prompt injections, an attack where an adversary subverts the LLM's intended functionality with an injected

medium relevance attack
Paper 2602.09222v1

MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

users' behalf. While these agents offer powerful capabilities, their design exposes them to indirect prompt injection attacks embedded in untrusted web content, enabling adversaries to hijack agent behavior and violate

high relevance attack
Paper 2601.17911v1

Prompt Injection Evaluations: Refusal Boundary Instability and Artifact-Dependent Compliance in GPT-4-Series Models

Prompt injection evaluations typically treat refusal as a stable, binary indicator of safety. This study challenges that paradigm by modeling refusal as a local decision boundary and examining its stability

high relevance benchmark
Paper 2510.16128v1

Prompt injections as a tool for preserving identity in GAI image descriptions

have been described, but most require top down or external intervention. An emerging strategy, prompt injections, provides an empowering alternative: indirect users can mitigate harm against them, from within their

high relevance tool
Paper 2512.00966v1

Mitigating Indirect Prompt Injection via Instruction-Following Intent Analysis

Indirect prompt injection attacks (IPIAs), where large language models (LLMs) follow malicious instructions hidden in input data, pose a critical threat to LLM-powered agents. In this paper, we present

high relevance attack
Paper 2602.14211v1

SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement

extend tool-augmented behaviors. This abstraction introduces an under-measured attack surface: skill-based prompt injection, where poisoned skills can steer agents away from user intent and safety policies

high relevance attack
Paper 2512.15081v1

Quantifying Return on Security Controls in LLM Systems

subjected to automated attacks with Garak across five vulnerability classes: PII leakage, latent context injection, prompt injection, adversarial attack generation, and divergence. For each (vulnerability, control) pair, attack success probabilities

medium relevance tool
Page 2 of 14