Search: prompt injection | AI Threat Alert

Severity:

456 results in 194ms

Paper 2510.11238v1

2025-10-13

Attacks by Content: Automated Fact-checking is an AI Security Issue

manipulate the data they receive to subvert their behaviour. Previous research has studied indirect prompt injection, where the attacker injects malicious instructions. We argue that injection of instructions

high relevance attack

Paper 2510.05244v1

2025-10-06

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cause unintended or harmful behavior. Inspired by the well-established concept

high relevance benchmark

Paper 2606.13044v1

2026-06-11

No Hidden Prompts Needed! You Can Game AI Peer Review with Presentation-Only Revisions

infrastructure, most robustness concerns have focused on explicit attacks such as hidden instructions and prompt injection. We study a harder and more policy-relevant failure mode: no hidden text

medium relevance survey

Paper 2603.28013v1

2026-03-30

Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers

present a stage-decomposed analysis of prompt injection attacks against five frontier LLM agents. Prior work measures task-level attack success rate (ASR); we localize the pipeline stage at which

high relevance attack

Paper 2510.09023v1

2025-10-10

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections

should we evaluate the robustness of language model defenses? Current defenses against jailbreaks and prompt injections (which aim to prevent an attacker from eliciting harmful knowledge or remotely triggering malicious

high relevance attack

Paper 2605.28116v1

2026-05-27

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

Injection of Realistic Adversarial GUI Examples), a pipeline that turns benign mobile screenshots into prompt-injection samples by placing attacker-controlled text into ordinary user-generated content regions, without modifying

high relevance attack

Paper 2601.04666v1

2026-01-08

Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning

model (LLM)-integrated applications have become increasingly prevalent, yet face critical security vulnerabilities from prompt injection (PI) attacks. Defending against PI attacks faces two major issues: malicious instructions

high relevance attack

Paper 2603.10749v1

2026-03-11

AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations

agents are highly vulnerable to Indirect Prompt Injection (IPI), where adversaries embed malicious directives in untrusted tool outputs to hijack execution. Most existing defenses treat IPI as an input-level

high relevance tool

Paper 2602.07918v1

2026-02-08

CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution

agents equipped with tool-calling capabilities are susceptible to Indirect Prompt Injection (IPI) attacks. In this attack scenario, malicious commands hidden within untrusted content trick the agent into performing unauthorized

high relevance attack

Paper 2602.01795v1

2026-02-02

RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse

Large Language Models (LLMs) are increasingly vulnerable to Prompt Injection (PI) attacks, where adversarial instructions hidden within retrieved contexts hijack the model's execution flow. Current defenses typically face

high relevance attack

Paper 2606.11806v1

2026-06-10

External Experience Serving in Production LLM Systems: A Deployment-Oriented Study of Quality-Cost Trade-offs

expose different output-cost regimes. We compare no-experience baselines, random experience controls, global prompt injection, and retrieval-based selective injection, and analyze both task quality and serving cost

medium relevance tool

Paper 2602.22242v1

2026-02-24

Analysis of LLMs Against Prompt Injection and Jailbreak Attacks

same time, LLMs are vulnerable to prompt-based attacks. Thus, analyzing this risk has become a critical security requirement. This work evaluates prompt-injection and jailbreak vulnerability using a large

high relevance attack

Paper 2606.05566v1

2026-06-04

GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection

Large Language Models (LLMs) have transformed natural language processing, but they remain vulnerable to Prompt Injection (PI) and Jailbreak (JB) attacks. In addition, benchmark evaluations may be affected by contamination

high relevance attack

Paper 2512.05745v1

2025-12-05

ARGUS: Defending Against Multimodal Indirect Prompt Injection via Steering Instruction-Following Behavior

Multimodal Large Language Models (MLLMs) are increasingly vulnerable to multimodal Indirect Prompt Injection (IPI) attacks, which embed malicious instructions in images, videos, or audio to hijack model behavior. Existing defenses

high relevance attack

Paper 2604.14604v1

2026-04-16

Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection

behavior manipulation remain underexamined. In this work, we reveal a previously overlooked threat, auditory prompt injection, under realistic constraints of audio data-only access and strong perceptual stealth. To systematically

high relevance attack

Paper 2606.23277v1

2026-06-22

GIF: Locally Sound Geometric Information Flow Control for LLMs

information flow induced by a given prompt under local regularity assumptions. We evaluate GIF on integrity and confidentiality tasks across multiple prompt-injection and privacy-leakage benchmarks. GIF achieves near

medium relevance benchmark

Paper 2603.25164v1

2026-03-26

PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems

applicability. In this work, we propose PIDP-Attack, a novel compound attack that integrates prompt injection with database poisoning in RAG. By appending malicious characters to queries at inference time

high relevance attack

Paper 2605.28467v1

2026-05-27

Mitigating Adaptive Attacks against Reasoning Models with Activation Consistency Training

prompts and adversarial rewrites, and evaluate its two main variants, output-level (BCT) and activation-level (ACT), across five reasoning models. We formulate both methods as a prompt injection defense

high relevance attack

Paper 2602.20708v1

2026-02-24

ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction

Large Language Model (LLM) agents are susceptible to Indirect Prompt Injection (IPI) attacks, where malicious instructions in retrieved content hijack the agent's execution. Existing defenses typically rely on strict

high relevance attack

Paper 2512.16307v1

2025-12-18

Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks

fast-evolving area of LLMs, our paper discusses the significant security risk presented by prompt injection attacks. It focuses on small open-sourced models, specifically the LLaMA family of models

high relevance benchmark

Previous Page 7 of 23 Next