Search: prompt injection | AI Threat Alert

489 results in 191ms

Paper 2605.28467v1

2026-05-27

Mitigating Adaptive Attacks against Reasoning Models with Activation Consistency Training

prompts and adversarial rewrites, and evaluate its two main variants, output-level (BCT) and activation-level (ACT), across five reasoning models. We formulate both methods as a prompt injection defense

high relevance attack

Paper 2602.20708v1

2026-02-24

ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction

Large Language Model (LLM) agents are susceptible to Indirect Prompt Injection (IPI) attacks, where malicious instructions in retrieved content hijack the agent's execution. Existing defenses typically rely on strict

high relevance attack

Paper 2512.16307v1

2025-12-18

Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks

fast-evolving area of LLMs, our paper discusses the significant security risk presented by prompt injection attacks. It focuses on small open-sourced models, specifically the LLaMA family of models

high relevance benchmark

Paper 2510.04261v1

2025-10-05

VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy

integrated applications? To address this question, we propose VortexPIA, a novel indirect prompt injection attack that induces privacy extraction in LLM-integrated applications under black-box settings. By injecting

high relevance attack

Paper 2512.01326v1

2025-12-01

Securing Large Language Models (LLMs) from Prompt Injection Attacks

increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection attacks. These attacks leverage the model's instruction-following ability to make it perform malicious

high relevance attack

Paper 2606.09005v1

2026-06-08

Document-Authored Control-Signal Impersonation: A Low-Cost Indirect Prompt Attack on RAG Safety Boundaries

signal when RAG prompt rendering collapses trusted and untrusted text into the same natural-language channel. We evaluate DACSI across six model settings, prompt-pressure levels, injection baselines, signal taxonomies

high relevance attack

CVE HIGH CVE-2026-46580

2026-06-18

malicious repository containing prompt template files that, when the workspace was opened in Theia, replaced the AI's system instructions with attacker-controlled content (indirect prompt injection). Combined with other

@theia/ai-editor

Paper 2601.19051v1

2026-01-27

Proactive Hardening of LLM Defenses with HASTE

enhance detection efficacy for prompt-based attack techniques. The framework is agnostic to synthetic data generation methods, and can be generalized to evaluate prompt-injection detection efficacy, with and without

medium relevance defense

Paper 2510.09093v1

2025-10-10

Exploiting Web Search Tools of AI Agents for Data Exfiltration

functionality and vulnerability to abuse. As LLMs increasingly interact with external data sources, indirect prompt injection emerges as a critical and evolving attack vector, enabling adversaries to exploit models through

high relevance tool

Paper 2510.09849v1

2025-10-10

Text Prompt Injection of Vision Language Models

vision language models has significantly raised safety concerns. In this project, we investigate text prompt injection, a simple yet effective method to mislead these models. We developed an algorithm

high relevance attack

Paper 2605.17986v1

2026-05-18

LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection

increasingly deployed in local workflows with access to external tools. This creates indirect prompt-injection (IPI) risk: an agent may execute harmful instructions embedded in untrusted inputs such as email

medium relevance benchmark

Paper 2512.10104v2

2025-12-10

Phishing Email Detection Using Large Language Models

based framework to detect phishing email attacks across multiple attack vectors, including prompt injection, text refinement, and multilingual attacks. We evaluate three frontier LLMs (e.g., GPT-4o, Claude Sonnet

medium relevance defense

Paper 2602.20064v1

2026-02-23

The LLMbda Calculus: AI Agents, Conversations, and Information Flow

prompt, and parses the response as a new term. This calculus faithfully represents planner loops and their vulnerabilities, including the mechanisms by which prompt injection alters subsequent computation. The semantics

medium relevance attack

Paper 2603.23791v1

2026-03-24

The Cognitive Firewall: Securing Browser Based AI Agents Against Indirect Prompt Injection Via Hybrid Edge Cloud Defense

autonomous browser agents exposes a significant attack surface in the form of Indirect Prompt Injection (IPI). Cloud-based defenses can provide strong semantic analysis, but they introduce latency and raise

high relevance attack

Paper 2602.22724v1

2026-02-26

AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification

retrieval systems to autonomously complete complex tasks. However, this design exposes agents to indirect prompt injection (IPI), where attacker-controlled context embedded in tool outputs or retrieved content silently steers

high relevance attack

Paper 2510.08829v1

2025-10-09

CommandSans: Securing AI Agents with Surgical Precision Prompt Sanitization

access to numerous tools and sensitive data significantly widens the attack surface for indirect prompt injections. Due to the context-dependent nature of attacks, however, current defenses are often

medium relevance benchmark

Paper 2601.07072v1

2026-01-11

Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems

rely on retrieving information from external corpora. This creates a new attack surface: indirect prompt injection (IPI), where hidden instructions are planted in the corpora and hijack model behavior once

high relevance tool

Paper 2601.22569v1

2026-01-30

Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection

teaming evaluation of AP2 and identify vulnerabilities arising from indirect and direct prompt injection. We introduce two attack techniques, the Branded Whisper Attack and the Vault Whisper Attack which manipulate

high relevance attack

Paper 2606.19235v1

2026-06-17

CodeSentinel: A Three-Layer Defense Against Indirect Prompt Injection in Code Contexts

code context from repositories, documentation, issue threads, and coding-agent environments, creating an indirect prompt-injection surface where attackers hide instructions in comments, strings, identifiers, or decoy code. We propose

high relevance attack

Paper 2602.10453v1

2026-02-11

The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis

LLMs) has resulted in a paradigm shift towards autonomous agents, necessitating robust security against Prompt Injection (PI) vulnerabilities where untrusted inputs hijack agent behaviors. This SoK presents a comprehensive overview

high relevance survey
