Paper 2603.10749v1

AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations

agents are highly vulnerable to Indirect Prompt Injection (IPI), where adversaries embed malicious directives in untrusted tool outputs to hijack execution. Most existing defenses treat IPI as an input-level

high relevance tool
Paper 2602.07918v1

CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution

agents equipped with tool-calling capabilities are susceptible to Indirect Prompt Injection (IPI) attacks. In this attack scenario, malicious commands hidden within untrusted content trick the agent into performing unauthorized

high relevance attack
Paper 2602.01795v1

RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse

Large Language Models (LLMs) are increasingly vulnerable to Prompt Injection (PI) attacks, where adversarial instructions hidden within retrieved contexts hijack the model's execution flow. Current defenses typically face

high relevance attack
Paper 2606.11806v1

External Experience Serving in Production LLM Systems: A Deployment-Oriented Study of Quality-Cost Trade-offs

expose different output-cost regimes. We compare no-experience baselines, random experience controls, global prompt injection, and retrieval-based selective injection, and analyze both task quality and serving cost

medium relevance tool
Paper 2602.22242v1

Analysis of LLMs Against Prompt Injection and Jailbreak Attacks

same time, LLMs are vulnerable to prompt-based attacks. Thus, analyzing this risk has become a critical security requirement. This work evaluates prompt-injection and jailbreak vulnerability using a large

high relevance attack
Paper 2606.05566v1

GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection

Large Language Models (LLMs) have transformed natural language processing, but they remain vulnerable to Prompt Injection (PI) and Jailbreak (JB) attacks. In addition, benchmark evaluations may be affected by contamination

high relevance attack
Paper 2512.05745v1

ARGUS: Defending Against Multimodal Indirect Prompt Injection via Steering Instruction-Following Behavior

Multimodal Large Language Models (MLLMs) are increasingly vulnerable to multimodal Indirect Prompt Injection (IPI) attacks, which embed malicious instructions in images, videos, or audio to hijack model behavior. Existing defenses

high relevance attack
Paper 2604.14604v1

Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection

behavior manipulation remain underexamined. In this work, we reveal a previously overlooked threat, auditory prompt injection, under realistic constraints of audio data-only access and strong perceptual stealth. To systematically

high relevance attack
Paper 2606.23277v1

GIF: Locally Sound Geometric Information Flow Control for LLMs

information flow induced by a given prompt under local regularity assumptions. We evaluate GIF on integrity and confidentiality tasks across multiple prompt-injection and privacy-leakage benchmarks. GIF achieves near

medium relevance benchmark
Paper 2603.25164v1

PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems

applicability. In this work, we propose PIDP-Attack, a novel compound attack that integrates prompt injection with database poisoning in RAG. By appending malicious characters to queries at inference time

high relevance attack
Paper 2605.28467v1

Mitigating Adaptive Attacks against Reasoning Models with Activation Consistency Training

prompts and adversarial rewrites, and evaluate its two main variants, output-level (BCT) and activation-level (ACT), across five reasoning models. We formulate both methods as a prompt injection defense

high relevance attack
Paper 2602.20708v1

ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction

Large Language Model (LLM) agents are susceptible to Indirect Prompt Injection (IPI) attacks, where malicious instructions in retrieved content hijack the agent's execution. Existing defenses typically rely on strict

high relevance attack
Paper 2512.16307v1

Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks

fast-evolving area of LLMs, our paper discusses the significant security risk presented by prompt injection attacks. It focuses on small open-sourced models, specifically the LLaMA family of models

high relevance benchmark
Paper 2510.04261v1

VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy

integrated applications? To address this question, we propose VortexPIA, a novel indirect prompt injection attack that induces privacy extraction in LLM-integrated applications under black-box settings. By injecting

high relevance attack
Paper 2512.01326v1

Securing Large Language Models (LLMs) from Prompt Injection Attacks

increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection attacks. These attacks leverage the model's instruction-following ability to make it perform malicious

high relevance attack
Paper 2606.09005v1

Document-Authored Control-Signal Impersonation: A Low-Cost Indirect Prompt Attack on RAG Safety Boundaries

signal when RAG prompt rendering collapses trusted and untrusted text into the same natural-language channel. We evaluate DACSI across six model settings, prompt-pressure levels, injection baselines, signal taxonomies

high relevance attack

PraisonAI: Coarse-Grained Tool Approval Cache Bypasses Per-Invocation Consent

CVSS 5.5 praisonaiagents
Paper 2601.19051v1

Proactive Hardening of LLM Defenses with HASTE

enhance detection efficacy for prompt-based attack techniques. The framework is agnostic to synthetic data generation methods, and can be generalized to evaluate prompt-injection detection efficacy, with and without

medium relevance defense
Paper 2510.09093v1

Exploiting Web Search Tools of AI Agents for Data Exfiltration

functionality and vulnerability to abuse. As LLMs increasingly interact with external data sources, indirect prompt injection emerges as a critical and evolving attack vector, enabling adversaries to exploit models through

high relevance tool
Paper 2510.09849v1

Text Prompt Injection of Vision Language Models

vision language models has significantly raised safety concerns. In this project, we investigate text prompt injection, a simple yet effective method to mislead these models. We developed an algorithm

high relevance attack