Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows
their collaboration structures against adversarial compromise becomes a critical safety concern. Attackers may leverage prompt-injection or jailbreaking to sabotage individual agents within MAS workflows, but the interaction between model
Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization
every token with uniform architectural privilege. This is the structural vulnerability that enables malicious prompt injections and, more broadly, leaves models without a principled way to resolve conflicts between legitimate
Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation
proposed and with what tradeoffs, and how security claims are evaluated. We find that prompt injection and tool-mediated control-flow hijacking still dominate the field, while persistent state corruption
Game-Theoretic Multi-Agent Control for Robust Contextual Reasoning in LLMs
turn interactions maintain evolving context rather than generating isolated responses, making them vulnerable to prompt-injection and context-poisoning attacks in which locally plausible adversarial fragments gradually distort reasoning trajectories
Insurance of Agentic AI
independently generating insured events through external actions. We analyze major risk pathways, including hallucinations, prompt-injection attacks, autonomous decision errors, model drift, dependency failures, and cyber-physical harms, and evaluate
Hybrid Adversarial Defence for Natural Language Understanding Tasks
similar adversarial robustness from our hybrid model (up to 57.14% improvement in accuracy). For prompt injection (SafeGuard) and jailbreak detection (AdvBench, DAN) datasets our hybrid model is also very strong
AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations
Indirect prompt injection in tool-use agents is a concrete production threat: LLM agents read from integrations (third-party services such as Gmail, Salesforce, or Jira accessed through tool calls
PraisonAI vulnerable to unauthenticated arbitrary file read via MCP workflow.show
PraisonAI vulnerable to sandbox escape via `print.__self__` builtins module leak
Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection
training data poisoning while preserving baseline task performance. On a Qwen 2.5 1.5B prompt-injection classifier, a small fraction of poisoned examples drives a clean-accuracy-preserving backdoor
LACUNA: Safe Agents as Recursive Program Holes
would also sharpen safety problems. A model can be diverted by a prompt injection, call the wrong tool, or fail partway and leave an inconsistent state, and each such failure
Towards Demystifying and Repairing LLM-in-the-Loop Vulnerabilities
Loop vulnerabilities are more challenging to precisely fix, especially for those involving prompt injections where the Pass@1 rate is only
Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents
must read external data sources (emails, webpages, files) that attackers can control; through indirect prompt injection, attackers embed malicious instructions in this data to manipulate agents into performing unauthorized operations
AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents
LLM agents process trusted instructions, retrieved records, and tool observations
How Agentic AI Coding Assistants Become the Attacker's Shell
attacker's shell to run unauthorized commands. In this article, we examine how these prompt injection attacks work, measure their prevalence, discuss the limitations and challenges of current defenses
Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents
whether an agent completes a single scenario safely, often under adversarial conditions such as prompt injection or memory poisoning. In deployment, however, a single agent serves many independent tasks over
ADR: An Agentic Detection System for Enterprise Agentic AI Security
baselines (ALRPHFS, GuardAgent, LlamaFirewall) by 2-4x in F1-score. On AgentDojo (public prompt injection benchmark), ADR detects all attacks with only three false alarms out of 93 tasks
Web Agents Should Adopt the Plan-Then-Execute Paradigm
into the model when deciding on the next action, creating a direct path for prompt injections to steer the agent's control flow. Plan-then-execute changes this boundary: untrusted
Sleeper Channels and Provenance Gates: Persistent Prompt Injection in Always-on Autonomous AI Agents
Always-on AI agents (OpenClaw, Hermes Agent) run as a
No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills
ignores the documented constraint. These violations are invisible to static analyzers, traditional fuzzers, and prompt-injection defenses alike, yet they undermine the very contract a user trusts when installing