Paper 2606.17467v1

PARSE: Provenance-Aware Retrieval Sanitization for Professional Domain LLM Agents

Prompt injection defenses evaluated on synthetic benchmarks do not generalize to real enterprise documents, which are longer, denser, and interleave legitimate authority language with factual content. We demonstrate this

medium relevance benchmark
Paper 2606.17034v1

KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing

applications, where stale retrieved facts, incorrect tool observations, retracted user preferences, or harmful prompt injections may be identified only after prefill. Exact erasing must then recompute all tokens after

medium relevance attack
Paper 2606.17114v1

An Evaluation of Data Leakage Risks in Tool-Using LLM Agents in Realistic Scenarios

research on data leakage risks in agents has focused on adversarial data exfiltration through prompt injections and jailbreaks. However, sensitive information may also be exposed during non-adversarial use, creating

medium relevance benchmark
Paper 2606.16242v1

Rapid Poison: Practical Poisoning Attacks Against the Rapid Response Framework

helping the model generalize from the new attacks and quickly adapt. We reveal that prompt injection can infiltrate this pipeline to deliver poisoned samples into the classifier's training

high relevance tool
Paper 2606.15899v1

SkillVetBench: LLM-as-Judge for Multi-Dimensional Security Risk Evaluation in Open-Source LLM Agent Skills

best static baseline (SKILLSIEVE) still misses 15%; for instruction-layer categories such as Prompt Injection and Memory Poisoning, conventional tools miss between 89% and 100% of threats (e.g., CODEBERT detects

medium relevance benchmark
Paper 2606.15788v1

GAS-Leak-LLM: Genetic Algorithm-Based Suffix Optimization for Black-Box LLM Jailbreaking

research has demonstrated that LLMs remain vulnerable to adversarial manipulation, particularly through jailbreaking and prompt injection techniques. In this work, we propose GAS-Leak-LLM, a novel jailbreaking attack based

high relevance attack
Paper 2606.12716v1

Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review

dataset spanning multiple scientific domains; (2) a unified suite of attacks, including black-box prompt injections and white-box perturbations, specifically designed to target both text (GCG) and figures

high relevance survey
Paper 2606.12709v1

Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows

their collaboration structures against adversarial compromise becomes a critical safety concern. Attackers may leverage prompt-injection or jailbreaking to sabotage individual agents within MAS workflows, but the interaction between model

medium relevance benchmark
Paper 2606.10860v1

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

every token with uniform architectural privilege. This is the structural vulnerability that enables malicious prompt injections and, more broadly, leaves models without a principled way to resolve conflicts between legitimate

medium relevance attack
Paper 2606.10749v1

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

proposed and with what tradeoffs, and how security claims are evaluated. We find that prompt injection and tool-mediated control-flow hijacking still dominate the field, while persistent state corruption

high relevance benchmark
Paper 2606.10322v1

Game-Theoretic Multi-Agent Control for Robust Contextual Reasoning in LLMs

turn interactions maintain evolving context rather than generating isolated responses, making them vulnerable to prompt-injection and context-poisoning attacks in which locally plausible adversarial fragments gradually distort reasoning trajectories

medium relevance benchmark
Paper 2606.05449v1

Insurance of Agentic AI

independently generating insured events through external actions. We analyze major risk pathways, including hallucinations, prompt-injection attacks, autonomous decision errors, model drift, dependency failures, and cyber-physical harms, and evaluate

medium relevance attack
Paper 2606.04612v1

Hybrid Adversarial Defence for Natural Language Understanding Tasks

similar adversarial robustness from our hybrid model (up to 57.14% improvement in accuracy). For prompt injection (SafeGuard) and jailbreak detection (AdvBench, DAN) datasets our hybrid model is also very strong

medium relevance attack
Paper 2606.02240v1

AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations

Indirect prompt injection in tool-use agents is a concrete production threat: LLM agents read from integrations (third-party services such as Gmail, Salesforce, or Jira accessed through tool calls

high relevance defense
CVE CRITICAL CVE-2026-47392

PraisonAI vulnerable to sandbox escape via `print.__self__` builtins module leak

CVSS 9.9 PraisonAI
Paper 2605.30189v1

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

training data poisoning while preserving baseline task performance. On a Qwen 2.5 1.5B prompt-injection classifier, a small fraction of poisoned examples drives a clean-accuracy-preserving backdoor

high relevance attack
Paper 2605.28617v1

LACUNA: Safe Agents as Recursive Program Holes

would also sharpen safety problems. A model can be diverted by a prompt injection, call the wrong tool, or fail partway and leave an inconsistent state, and each such failure

medium relevance attack
Paper 2605.28893v1

Towards Demystifying and Repairing LLM-in-the-Loop Vulnerabilities

Loop vulnerabilities are more challenging to precisely fix, especially for those involving prompt injections where the Pass@1 rate is only

medium relevance benchmark
Paper 2605.26497v1

Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents

must read external data sources (emails, webpages, files) that attackers can control; through indirect prompt injection, attackers embed malicious instructions in this data to manipulate agents into performing unauthorized operations

medium relevance defense
Paper 2605.26269v1

AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

LLM agents process trusted instructions, retrieved records, and tool observations

high relevance tool