Search: prompt injection | AI Threat Alert

Paper 2605.25871v1

2026-05-25

How Agentic AI Coding Assistants Become the Attacker's Shell

attacker's shell to run unauthorized commands. In this article, we examine how these prompt injection attacks work, measure their prevalence, discuss the limitations and challenges of current defenses

high relevance attack

Paper 2605.17830v1

2026-05-18

Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents

whether an agent completes a single scenario safely, often under adversarial conditions such as prompt injection or memory poisoning. In deployment, however, a single agent serves many independent tasks over

medium relevance defense

Paper 2605.17380v1

2026-05-17

ADR: An Agentic Detection System for Enterprise Agentic AI Security

baselines (ALRPHFS, GuardAgent, LlamaFirewall) by 2-4x in F1-score. On AgentDojo (public prompt injection benchmark), ADR detects all attacks with only three false alarms out of 93 tasks

medium relevance tool

Paper 2605.14290v1

2026-05-14

Web Agents Should Adopt the Plan-Then-Execute Paradigm

into the model when deciding on the next action, creating a direct path for prompt injections to steer the agent's control flow. Plan-then-execute changes this boundary: untrusted

medium relevance survey

Paper 2605.13471v1

2026-05-13

Sleeper Channels and Provenance Gates: Persistent Prompt Injection in Always-on Autonomous AI Agents

Always-on AI agents (OpenClaw, Hermes Agent) run as a

high relevance attack

Paper 2605.13044v1

2026-05-13

No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills

ignores the documented constraint. These violations are invisible to static analyzers, traditional fuzzers, and prompt-injection defenses alike, yet they undermine the very contract a user trusts when installing

high relevance attack

Paper 2605.12233v1

2026-05-12

No More, No Less: Task Alignment in Terminal Agents

Bench agent achieves high task completion but low task alignment on TAB. Evaluating six prompt-injection defenses further shows that suppressing distractor execution also suppresses the cues required for task

medium relevance defense

Paper 2605.11516v1

2026-05-12

Agents Should Replace Narrow Predictive AI as the Orchestrator in 6G AI-RAN

edge quantization, neuro-symbolic verification to curb hallucinations, and securing orchestration frameworks against adversarial prompt injections

medium relevance attack

Paper 2605.10862v1

2026-05-11

RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems

safety, specifically to test the resiliency of safety training and effectiveness of adversarial prompt injections

medium relevance tool

Paper 2605.09822v1

2026-05-10

Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning

query at runtime via tool-use protocols, causing incorrect conclusions through correct reasoning. Unlike prompt injection, Oracle Poisoning manipulates the data agents reason over, not their instructions. We demonstrate

medium relevance attack

Paper 2605.06393v1

2026-05-07

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

legitimately deployed agent may be steered toward unsafe operations through malicious messages, indirect prompt injection, unsafe skills, or tampering along the host-side control path. We argue that such risks

medium relevance benchmark

Paper 2605.04261v1

2026-05-05

Laundering AI Authority with Adversarial Examples

produces confident and authoritative responses about the wrong input. Unlike jailbreaks or prompt injections, our attacks do not compromise model alignment; the attack operates entirely at the perceptual level

medium relevance attack

Paper 2605.03213v1

2026-05-04

When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI

sensitive context, hold credentials, and operate across pipelines no single party fully controls, enabling prompt injection, context exfiltration, credential theft, and inter-agent message poisoning. Current defenses operate entirely within

medium relevance survey

Paper 2605.03129v1

2026-05-04

PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization

with limited deployable options. We present PIIGuard, a webpage-level defense that repurposes indirect prompt injection as a protective mechanism: the page owner embeds optimized hidden HTML fragments that steer

medium relevance attack

Paper 2605.02811v1

2026-05-04

Tool Use as Action: Towards Agentic Control in Mobile Core Networks

functions and break down the latency of end-to-end operations, starting from the prompt injection until the completion of the input task. This work demonstrates how an AI agent

medium relevance attack

Paper 2605.02187v1

2026-05-04

When Alignment Isn't Enough: Response-Path Attacks on LLM Agents

AgentDojo and ASB with six LLMs, RTA achieves up to 99.1% attack success, outperforming prompt-injection baselines with modest overhead. Case studies on OpenClaw and Claude Code demonstrate real-world

high relevance attack

Paper 2604.28129v1

2026-04-30

Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection

Multi-turn prompt injection follows a known attack path (trust-building, pivoting, escalation), but text-level defenses miss covert attacks where individual turns appear benign. We show this attack path

high relevance attack

Paper 2604.25200v1

2026-04-28

Making AI-Assisted Grant Evaluation Auditable without Exposing the Model

rubric measurement, and the evaluation output. The paper also considers a scenario-specific prompt injection risk: applicant-controlled documents may contain hidden or indirect instructions intended to influence

medium relevance benchmark

Paper 2604.25186v1

2026-04-28

FCMBench-Video: Benchmarking Document Video Intelligence

Cross-Document Validation and Evidence-Grounded Selection probe higher-level evidence integration, and Visual Prompt Injection provides a complementary robustness dimension. The overall score distribution is broad and approximately bell

medium relevance benchmark

Paper 2604.25102v1

2026-04-28

One Perturbation, Two Failure Modes: Probing VLM Safety via Embedding-Guided Typographic Perturbations

Typographic prompt injection exploits vision language models' (VLMs) ability to read text rendered in images, posing a growing threat as VLMs power autonomous agents. Prior work typically focuses on maximizing

medium relevance defense
