Search: prompt injection | AI Threat Alert

429 results

Paper 2604.19657v1

2026-04-21

An AI Agent Execution Environment to Safeguard User Data

serious risk to security and privacy. Adversaries may attack the AI model (e.g., via prompt injection) to exfiltrate user data. Furthermore, sharing private data with an AI agent requires users

medium relevance benchmark

Paper 2606.17467v1

2026-06-16

PARSE: Provenance-Aware Retrieval Sanitization for Professional Domain LLM Agents

Prompt injection defenses evaluated on synthetic benchmarks do not generalize to real enterprise documents, which are longer, denser, and interleave legitimate authority language with factual content. We demonstrate this

medium relevance benchmark

Paper 2606.17034v1

2026-06-15

KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing

applications, where stale retrieved facts, incorrect tool observations, retracted user preferences, or harmful prompt injections may be identified only after prefill. Exact erasing must then recompute all tokens after

medium relevance attack

Paper 2606.17114v1

2026-06-15

An Evaluation of Data Leakage Risks in Tool-Using LLM Agents in Realistic Scenarios

research on data leakage risks in agents has focused on adversarial data exfiltration through prompt injections and jailbreaks. However, sensitive information may also be exposed during non-adversarial use, creating

medium relevance benchmark

Paper 2606.16242v1

2026-06-15

Rapid Poison: Practical Poisoning Attacks Against the Rapid Response Framework

helping the model generalize from the new attacks and quickly adapt. We reveal that prompt injection can infiltrate this pipeline to deliver poisoned samples into the classifier's training

high relevance tool

Paper 2606.15899v1

2026-06-14

SkillVetBench: LLM-as-Judge for Multi-Dimensional Security Risk Evaluation in Open-Source LLM Agent Skills

best static baseline (SKILLSIEVE) still misses 15%; for instruction-layer categories such as Prompt Injection and Memory Poisoning, conventional tools miss between 89% and 100% of threats (e.g., CODEBERT detects

medium relevance benchmark

Paper 2606.15788v1

2026-06-14

GAS-Leak-LLM: Genetic Algorithm-Based Suffix Optimization for Black-Box LLM Jailbreaking

research has demonstrated that LLMs remain vulnerable to adversarial manipulation, particularly through jailbreaking and prompt injection techniques. In this work, we propose GAS-Leak-LLM, a novel jailbreaking attack based

high relevance attack

Paper 2606.12716v1

2026-06-10

Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review

dataset spanning multiple scientific domains; (2) a unified suite of attacks, including black-box prompt injections and white-box perturbations, specifically designed to target both text (GCG) and figures

high relevance survey

Paper 2606.12709v1

2026-06-10

Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows

their collaboration structures against adversarial compromise becomes a critical safety concern. Attackers may leverage prompt-injection or jailbreaking to sabotage individual agents within MAS workflows, but the interaction between model

medium relevance benchmark

Paper 2606.10860v1

2026-06-09

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

every token with uniform architectural privilege. This is the structural vulnerability that enables malicious prompt injections and, more broadly, leaves models without a principled way to resolve conflicts between legitimate

medium relevance attack

Paper 2606.10749v1

2026-06-09

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

proposed and with what tradeoffs, and how security claims are evaluated. We find that prompt injection and tool-mediated control-flow hijacking still dominate the field, while persistent state corruption

high relevance benchmark

Paper 2606.10322v1

2026-06-09

Game-Theoretic Multi-Agent Control for Robust Contextual Reasoning in LLMs

turn interactions maintain evolving context rather than generating isolated responses, making them vulnerable to prompt-injection and context-poisoning attacks in which locally plausible adversarial fragments gradually distort reasoning trajectories

medium relevance benchmark

Paper 2606.05449v1

2026-06-03

Insurance of Agentic AI

independently generating insured events through external actions. We analyze major risk pathways, including hallucinations, prompt-injection attacks, autonomous decision errors, model drift, dependency failures, and cyber-physical harms, and evaluate

medium relevance attack

Paper 2606.04612v1

2026-06-03

Hybrid Adversarial Defence for Natural Language Understanding Tasks

similar adversarial robustness from our hybrid model (up to 57.14% improvement in accuracy). For prompt injection (SafeGuard) and jailbreak detection (AdvBench, DAN) datasets our hybrid model is also very strong

medium relevance attack

Paper 2606.02240v1

2026-06-01

AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations

Indirect prompt injection in tool-use agents is a concrete production threat: LLM agents read from integrations (third-party services such as Gmail, Salesforce, or Jira accessed through tool calls

high relevance defense

Paper 2605.30189v1

2026-05-28

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

training data poisoning while preserving baseline task performance. On a Qwen 2.5 1.5B prompt-injection classifier, a small fraction of poisoned examples drives a clean-accuracy-preserving backdoor

high relevance attack

Paper 2605.28617v1

2026-05-27

LACUNA: Safe Agents as Recursive Program Holes

would also sharpen safety problems. A model can be diverted by a prompt injection, call the wrong tool, or fail partway and leave an inconsistent state, and each such failure

medium relevance attack

Paper 2605.28893v1

2026-05-27

Towards Demystifying and Repairing LLM-in-the-Loop Vulnerabilities

Loop vulnerabilities are more challenging to precisely fix, especially for those involving prompt injections where the Pass@1 rate is only

medium relevance benchmark

Paper 2605.26497v1

2026-05-26

Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents

must read external data sources (emails, webpages, files) that attackers can control; through indirect prompt injection, attackers embed malicious instructions in this data to manipulate agents into performing unauthorized operations

medium relevance defense

Paper 2605.26269v1

2026-05-25

AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

LLM agents process trusted instructions, retrieved records, and tool observations

high relevance tool
