How Agentic AI Coding Assistants Become the Attacker's Shell
attacker's shell to run unauthorized commands. In this article, we examine how these prompt injection attacks work, measure their prevalence, discuss the limitations and challenges of current defenses
Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents
whether an agent completes a single scenario safely, often under adversarial conditions such as prompt injection or memory poisoning. In deployment, however, a single agent serves many independent tasks over
ADR: An Agentic Detection System for Enterprise Agentic AI Security
baselines (ALRPHFS, GuardAgent, LlamaFirewall) by 2-4x in F1-score. On AgentDojo (public prompt injection benchmark), ADR detects all attacks with only three false alarms out of 93 tasks
Web Agents Should Adopt the Plan-Then-Execute Paradigm
into the model when deciding on the next action, creating a direct path for prompt injections to steer the agent's control flow. Plan-then-execute changes this boundary: untrusted
Sleeper Channels and Provenance Gates: Persistent Prompt Injection in Always-on Autonomous AI Agents
Always-on AI agents (OpenClaw, Hermes Agent) run as a
No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills
ignores the documented constraint. These violations are invisible to static analyzers, traditional fuzzers, and prompt-injection defenses alike, yet they undermine the very contract a user trusts when installing
No More, No Less: Task Alignment in Terminal Agents
Bench agent achieves high task completion but low task alignment on TAB. Evaluating six prompt-injection defenses further shows that suppressing distractor execution also suppresses the cues required for task
Agents Should Replace Narrow Predictive AI as the Orchestrator in 6G AI-RAN
edge quantization, neuro-symbolic verification to curb hallucinations, and securing orchestration frameworks against adversarial prompt injections
RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems
safety, specifically to test the resiliency of safety training and effectiveness of adversarial prompt injections
Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning
query at runtime via tool-use protocols, causing incorrect conclusions through correct reasoning. Unlike prompt injection, Oracle Poisoning manipulates the data agents reason over, not their instructions. We demonstrate
Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation
legitimately deployed agent may be steered toward unsafe operations through malicious messages, indirect prompt injection, unsafe skills, or tampering along the host-side control path. We argue that such risks
Laundering AI Authority with Adversarial Examples
produces confident and authoritative responses about the wrong input. Unlike jailbreaks or prompt injections, our attacks do not compromise model alignment; the attack operates entirely at the perceptual level
When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI
sensitive context, hold credentials, and operate across pipelines no single party fully controls, enabling prompt injection, context exfiltration, credential theft, and inter-agent message poisoning. Current defenses operate entirely within
PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization
with limited deployable options. We present PIIGuard, a webpage-level defense that repurposes indirect prompt injection as a protective mechanism: the page owner embeds optimized hidden HTML fragments that steer
Tool Use as Action: Towards Agentic Control in Mobile Core Networks
functions and break down the latency of end-to-end operations, starting from the prompt injection until the completion of the input task. This work demonstrates how an AI agent
When Alignment Isn't Enough: Response-Path Attacks on LLM Agents
AgentDojo and ASB with six LLMs, RTA achieves up to 99.1% attack success, outperforming prompt-injection baselines with modest overhead. Case studies on OpenClaw and Claude Code demonstrate real-world
Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection
Multi-turn prompt injection follows a known attack path -- trust-building, pivoting, escalation -- but text-level defenses miss covert attacks where individual turns appear benign. We show this attack path
Making AI-Assisted Grant Evaluation Auditable without Exposing the Model
rubric measurement, and the evaluation output. The paper also considers a scenario-specific prompt injection risk: applicant-controlled documents may contain hidden or indirect instructions intended to influence
FCMBench-Video: Benchmarking Document Video Intelligence
Cross-Document Validation and Evidence-Grounded Selection probe higher-level evidence integration, and Visual Prompt Injection provides a complementary robustness dimension. The overall score distribution is broad and approximately bell
One Perturbation, Two Failure Modes: Probing VLM Safety via Embedding-Guided Typographic Perturbations
Typographic prompt injection exploits vision language models' (VLMs) ability to read text rendered in images, posing a growing threat as VLMs power autonomous agents. Prior work typically focuses on maximizing