Paper 2606.12709v1

Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows

their collaboration structures against adversarial compromise becomes a critical safety concern. Attackers may leverage prompt-injection or jailbreaking to sabotage individual agents within MAS workflows, but the interaction between model

medium relevance benchmark
Paper 2606.10860v1

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

every token with uniform architectural privilege. This is the structural vulnerability that enables malicious prompt injections and, more broadly, leaves models without a principled way to resolve conflicts between legitimate

medium relevance attack
Paper 2606.10749v1

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

proposed and with what tradeoffs, and how security claims are evaluated. We find that prompt injection and tool-mediated control-flow hijacking still dominate the field, while persistent state corruption

high relevance benchmark
Paper 2606.10322v1

Game-Theoretic Multi-Agent Control for Robust Contextual Reasoning in LLMs

turn interactions maintain evolving context rather than generating isolated responses, making them vulnerable to prompt-injection and context-poisoning attacks in which locally plausible adversarial fragments gradually distort reasoning trajectories

medium relevance benchmark
Paper 2606.05449v1

Insurance of Agentic AI

independently generating insured events through external actions. We analyze major risk pathways, including hallucinations, prompt-injection attacks, autonomous decision errors, model drift, dependency failures, and cyber-physical harms, and evaluate

medium relevance attack
Paper 2606.04612v1

Hybrid Adversarial Defence for Natural Language Understanding Tasks

similar adversarial robustness from our hybrid model (up to 57.14% improvement in accuracy). For prompt injection (SafeGuard) and jailbreak detection (AdvBench, DAN) datasets our hybrid model is also very strong

medium relevance attack
Paper 2606.02240v1

AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations

Indirect prompt injection in tool-use agents is a concrete production threat: LLM agents read from integrations (third-party services such as Gmail, Salesforce, or Jira accessed through tool calls

high relevance defense

PraisonAI vulnerable to unauthenticated arbitrary file read via MCP workflow.show

PraisonAI
CVE CRITICAL CVE-2026-47392

PraisonAI vulnerable to sandbox escape via `print.__self__` builtins module leak

CVSS 9.9 PraisonAI
Paper 2605.30189v1

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

training data poisoning while preserving baseline task performance. On a Qwen 2.5 1.5B prompt-injection classifier, a small fraction of poisoned examples drives a clean-accuracy-preserving backdoor

high relevance attack
Paper 2605.28617v1

LACUNA: Safe Agents as Recursive Program Holes

would also sharpen safety problems. A model can be diverted by a prompt injection, call the wrong tool, or fail partway and leave an inconsistent state, and each such failure

medium relevance attack
Paper 2605.28893v1

Towards Demystifying and Repairing LLM-in-the-Loop Vulnerabilities

Loop vulnerabilities are more challenging to precisely fix, especially for those involving prompt injections where the Pass@1 rate is only

medium relevance benchmark
Paper 2605.26497v1

Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents

must read external data sources (emails, webpages, files) that attackers can control; through indirect prompt injection, attackers embed malicious instructions in this data to manipulate agents into performing unauthorized operations

medium relevance defense
Paper 2605.26269v1

AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

LLM agents process trusted instructions, retrieved records, and tool observations

high relevance tool
Paper 2605.25871v1

How Agentic AI Coding Assistants Become the Attacker's Shell

attacker's shell to run unauthorized commands. In this article, we examine how these prompt injection attacks work, measure their prevalence, discuss the limitations and challenges of current defenses

high relevance attack
Paper 2605.17830v1

Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents

whether an agent completes a single scenario safely, often under adversarial conditions such as prompt injection or memory poisoning. In deployment, however, a single agent serves many independent tasks over

medium relevance defense
Paper 2605.17380v1

ADR: An Agentic Detection System for Enterprise Agentic AI Security

baselines (ALRPHFS, GuardAgent, LlamaFirewall) by 2-4x in F1-score. On AgentDojo (public prompt injection benchmark), ADR detects all attacks with only three false alarms out of 93 tasks

medium relevance tool
Paper 2605.14290v1

Web Agents Should Adopt the Plan-Then-Execute Paradigm

into the model when deciding on the next action, creating a direct path for prompt injections to steer the agent's control flow. Plan-then-execute changes this boundary: untrusted

medium relevance survey
Paper 2605.13471v1

Sleeper Channels and Provenance Gates: Persistent Prompt Injection in Always-on Autonomous AI Agents

Always-on AI agents (OpenClaw, Hermes Agent) run as a

high relevance attack
Paper 2605.13044v1

No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills

ignores the documented constraint. These violations are invisible to static analyzers, traditional fuzzers, and prompt-injection defenses alike, yet they undermine the very contract a user trusts when installing

high relevance attack