Paper 2605.10862v1

RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems

safety, specifically to test the resiliency of safety training and effectiveness of adversarial prompt injections

medium relevance tool
Paper 2605.09822v1

Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning

query at runtime via tool-use protocols, causing incorrect conclusions through correct reasoning. Unlike prompt injection, Oracle Poisoning manipulates the data agents reason over, not their instructions. We demonstrate

medium relevance attack
Paper 2605.06393v1

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

legitimately deployed agent may be steered toward unsafe operations through malicious messages, indirect prompt injection, unsafe skills, or tampering along the host-side control path. We argue that such risks

medium relevance benchmark
CVE MEDIUM CVE-2026-43901

wireshark-mcp vulnerable to arbitrary file write via export_objects

CVSS 6.8 wireshark-mcp
Paper 2605.04261v1

Laundering AI Authority with Adversarial Examples

produces confident and authoritative responses about the wrong input. Unlike jailbreaks or prompt injections, our attacks do not compromise model alignment; the attack operates entirely at the perceptual level

medium relevance attack
Paper 2605.03213v1

When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI

sensitive context, hold credentials, and operate across pipelines no single party fully controls, enabling prompt injection, context exfiltration, credential theft, and inter-agent message poisoning. Current defenses operate entirely within

medium relevance survey
Paper 2605.03129v1

PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization

with limited deployable options. We present PIIGuard, a webpage-level defense that repurposes indirect prompt injection as a protective mechanism: the page owner embeds optimized hidden HTML fragments that steer

medium relevance attack
Paper 2605.02811v1

Tool Use as Action: Towards Agentic Control in Mobile Core Networks

functions and break down the latency of end-to-end operations, starting from the prompt injection until the completion of the input task. This work demonstrates how an AI agent

medium relevance attack
Paper 2605.02187v1

When Alignment Isn't Enough: Response-Path Attacks on LLM Agents

AgentDojo and ASB with six LLMs, RTA achieves up to 99.1% attack success, outperforming prompt-injection baselines with modest overhead. Case studies on OpenClaw and Claude Code demonstrate real-world

high relevance attack
Paper 2604.28129v1

Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection

Multi-turn prompt injection follows a known attack path (trust-building, pivoting, escalation), but text-level defenses miss covert attacks where individual turns appear benign. We show this attack path

high relevance attack

OpenClaw: Webchat audio embedding could read local files without local

Paper 2604.25200v1

Making AI-Assisted Grant Evaluation Auditable without Exposing the Model

rubric measurement, and the evaluation output. The paper also considers a scenario-specific prompt injection risk: applicant-controlled documents may contain hidden or indirect instructions intended to influence

medium relevance benchmark
Paper 2604.25186v1

FCMBench-Video: Benchmarking Document Video Intelligence

Cross-Document Validation and Evidence-Grounded Selection probe higher-level evidence integration, and Visual Prompt Injection provides a complementary robustness dimension. The overall score distribution is broad and approximately bell

medium relevance benchmark
Paper 2604.25102v1

One Perturbation, Two Failure Modes: Probing VLM Safety via Embedding-Guided Typographic Perturbations

Typographic prompt injection exploits vision language models' (VLMs) ability to read text rendered in images, posing a growing threat as VLMs power autonomous agents. Prior work typically focuses on maximizing

medium relevance defense
Paper 2604.24920v1

SUDP: Secret-Use Delegation Protocol for Agentic Systems

reusable artifact derived from it, within a model-steerable boundary, so a transient prompt-injection or tool-side compromise becomes durable account compromise. Existing defenses cover adjacent pieces such

medium relevance survey
Paper 2604.23593v1

When AI reviews science: Can we trust the referee?

informal adoption have exposed acute failure modes. Recent incidents have revealed that hidden prompt injections embedded in manuscripts can steer LLM-generated reviews toward unjustifiably positive judgments. Complementary studies have

medium relevance survey

OpenClaw: Agent gateway config mutations could change protected operator settings

Paper 2604.20732v1

Anchor-and-Resume Concession Under Dynamic Pricing for LLM-Augmented Freight Negotiation

flexibility but require expensive reasoning models, produce non-deterministic pricing, and remain vulnerable to prompt injection. We propose a two-index anchor-and-resume framework that addresses both limitations

medium relevance benchmark
Paper 2604.18206v1

A Control Architecture for Training-Free Memory Use

Prompt-injected memory can improve reasoning without updating model weights, but it also creates a control problem: retrieved content helps only when it is applied in the right state

low relevance benchmark
Paper 2604.17562v1

SafeAgent: A Runtime Protection Architecture for Agentic Systems

Large language model (LLM) agents are vulnerable to prompt-injection attacks that propagate through multi-step workflows, tool interactions, and persistent context, making input-output filtering alone insufficient for reliable

medium relevance defense