ADR: An Agentic Detection System for Enterprise Agentic AI Security
baselines (ALRPHFS, GuardAgent, LlamaFirewall) by 2--4x in F1-score. On AgentDojo (public prompt injection benchmark), ADR detects all attacks with only three false alarms out of 93 tasks
Web Agents Should Adopt the Plan-Then-Execute Paradigm
into the model when deciding on the next action, creating a direct path for prompt injections to steer the agent's control flow. Plan-then-execute changes this boundary: untrusted
Sleeper Channels and Provenance Gates: Persistent Prompt Injection in Always-on Autonomous AI Agents
Always-on AI agents (OpenClaw, Hermes Agent) run as a
No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills
ignores the documented constraint. These violations are invisible to static analyzers, traditional fuzzers, and prompt-injection defenses alike, yet they undermine the very contract a user trusts when installing
No More, No Less: Task Alignment in Terminal Agents
Bench agent achieves high task completion but low task alignment on TAB. Evaluating six prompt-injection defenses further shows that suppressing distractor execution also suppresses the cues required for task
Agents Should Replace Narrow Predictive AI as the Orchestrator in 6G AI-RAN
edge quantization, neuro-symbolic verification to curb hallucinations, and securing orchestration frameworks against adversarial prompt injections
RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems
safety, specifically to test the resiliency of safety training and effectiveness of adversarial prompt injections
PraisonAI ships and generates a legacy API server with authentication
Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning
query at runtime via tool-use protocols, causing incorrect conclusions through correct reasoning. Unlike prompt injection, Oracle Poisoning manipulates the data agents reason over, not their instructions. We demonstrate
Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation
legitimately deployed agent may be steered toward unsafe operations through malicious messages, indirect prompt injection, unsafe skills, or tampering along the host-side control path. We argue that such risks
Laundering AI Authority with Adversarial Examples
produces confident and authoritative responses about the wrong input. Unlike jailbreaks or prompt injections, our attacks do not compromise model alignment; the attack operates entirely at the perceptual level
PPTAgent: Arbitrary Code Execution via Python eval() of LLM-Generated
OpenClaw's gateway config mutation guard allowed unsafe model-driven
When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI
sensitive context, hold credentials, and operate across pipelines no single party fully controls, enabling prompt injection, context exfiltration, credential theft, and inter-agent message poisoning. Current defenses operate entirely within
PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization
with limited deployable options. We present PIIGuard, a webpage-level defense that repurposes indirect prompt injection as a protective mechanism: the page owner embeds optimized hidden HTML fragments that steer
Tool Use as Action: Towards Agentic Control in Mobile Core Networks
functions and break down the latency of end-to-end operations, starting from the prompt injection until the completion of the input task. This work demonstrates how an AI agent
When Alignment Isn't Enough: Response-Path Attacks on LLM Agents
AgentDojo and ASB with six LLMs, RTA achieves up to 99.1% attack success, outperforming prompt-injection baselines with modest overhead. Case studies on OpenClaw and Claude Code demonstrate real-world
Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection
Multi-turn prompt injection follows a known attack path -- trust-building, pivoting, escalation -- but text-level defenses miss covert attacks where individual turns appear benign. We show this attack path
Making AI-Assisted Grant Evaluation Auditable without Exposing the Model
rubric measurement, and the evaluation output. The paper also considers a scenario-specific prompt injection risk: applicant-controlled documents may contain hidden or indirect instructions intended to influence
FCMBench-Video: Benchmarking Document Video Intelligence
Cross-Document Validation and Evidence-Grounded Selection probe higher-level evidence integration, and Visual Prompt Injection provides a complementary robustness dimension. The overall score distribution is broad and approximately bell