Search: prompt injection | AI Threat Alert


456 results in 194ms

Paper 2605.26595v1

2026-05-26

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

...covert control attacks and evaluate them across 5 LLMs, 3 backdoor defenses, and 4 prompt injection defenses. With a small poisoned fraction, covert control attacks outperform heuristic-based prompt injection...

high relevance attack

Paper 2606.13385v1

2026-06-11

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

...untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversarial instructions that manipulate agent behaviour. Existing...

high relevance benchmark

Paper 2603.30016v1

2026-03-31

Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

...agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper...

high relevance tool

Paper 2605.25194v1

2026-05-24

Localization then Neutralization: Gradient-guided Token Suppression against Visual Prompt Injection Attack

Adversarial images pose a severe security threat to multimodal large language models through prompt injection. Existing defenses largely lack a principled understanding of the underlying mechanisms and struggle to balance...

high relevance attack

Paper 2510.05244v2

2025-10-06

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

...agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cause unintended or harmful behavior. Inspired by the well-established concept...

high relevance benchmark

Paper 2603.17639v1

2026-03-18

VeriGrey: Greybox Agent Validation

...behavior. As mutation operators in the testing process, we mutate prompts to design pernicious injection prompts. This is carefully accomplished by linking the task of the agent to an injection...

medium relevance benchmark

Paper 2601.04795v1

2026-01-08

Defense Against Indirect Prompt Injection via Tool Result Parsing

...malicious instructions via prompt engineering. Despite their flexibility, most current prompt-based defenses suffer from high Attack Success Rates (ASR), demonstrating limited robustness against sophisticated injection attacks. In this paper...

high relevance tool

Paper 2510.04528v1

2025-10-06

Unified Threat Detection and Mitigation Framework (UTDMF): Combating Prompt Injection, Deception, and Bias in Enterprise-Scale Transformers

...rapid adoption of large language models (LLMs) in enterprise systems exposes vulnerabilities to prompt injection attacks, strategic deception, and biased outputs, threatening security, trust, and fairness. Extending our adversarial activation...

high relevance attack

Paper 2605.03378v1

2026-05-05

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

...with tool use, skills, and external knowledge, has introduced new security risks. Among them, prompt injection attacks, where adversaries embed malicious instructions into the agent workflow, have emerged...

high relevance attack

Paper 2604.11790v1

2026-04-13

ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

...vulnerability manifests across three primary attack channels: web and local content injection, MCP server injection, and skill file injection. To address these vulnerabilities, we introduce ClawGuard, a novel runtime...

high relevance tool

Paper 2602.07104v1

2026-02-06

Extended to Reality: Prompt Injection in 3D Environments

...objects in the environment to override MLLMs' intended task. While prior work has studied prompt injection in the text domain and through digitally edited 2D images, it remains unclear...

high relevance attack

Paper 2603.03637v1

2026-03-04

Image-based Prompt Injection: Hijacking Multimodal LLMs through Visually Embedded Adversarial Instructions

...text to power applications, but this integration introduces new vulnerabilities. We study Image-based Prompt Injection (IPI), a black-box attack in which adversarial instructions are embedded into natural images...

high relevance attack

Paper 2604.27202v1

2026-04-29

Indirect Prompt Injection in the Wild: An Empirical Study of Prevalence, Techniques, and Objectives

...site owners, contributors, and adversaries to embed instructions directly in web resources, i.e., indirect prompt injections. While prior work demonstrates such attacks in controlled settings, their prevalence, deployment, and real...

high relevance attack

Paper 2510.13543v1

2025-10-15

In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers

...browsers) offer powerful automation of web tasks. However, they are vulnerable to indirect prompt injection attacks, where malicious instructions hidden in a webpage deceive the agent into unwanted actions. These...

high relevance attack

Paper 2511.01634v2

2025-11-03

Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models

...while powerful, also makes them vulnerable to a new class of attacks known as prompt injection. In these attacks, hidden or malicious instructions are inserted into user inputs or external...

high relevance attack

Paper 2512.12594v2

2025-12-14

ceLLMate: Sandboxing Browser AI Agents

...across pages. While these agents help automate repetitive online tasks, they are vulnerable to prompt injection attacks that trick an agent into performing undesired actions, such as leaking private information...

medium relevance benchmark

Paper 2601.11199v1

2026-01-16

SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation

...disclosing sensitive information; however, recent studies have also demonstrated that LLMs remain vulnerable to prompt injection attacks that can override intended behavioral constraints. For these reasons, we propose a novel...

high relevance attack

Paper 2605.25415v1

2026-05-25

LLM-as-a-Reviewer: Benchmarking Their Ability, Divergence, and Prompt Injection Resistance as Paper Reviewers

...LLMs along three axes: rating calibration, divergence from human reviewers, and resistance to prompt injection embedded via an invisible font-mapping attack. We find that LLMs systematically overrate weaker submissions...

high relevance survey

Paper 2510.05442v1

2025-10-06

Adversarial Reinforcement Learning for Large Language Model Agent Safety

...Search to complete complex tasks. However, this tool usage introduces the risk of indirect prompt injections, where malicious instructions hidden in tool outputs can manipulate the agent, posing security risks...

medium relevance attack

Paper 2604.28157v1

2026-04-30

FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

However, security remains a major concern for their widespread deployment, with threats such as prompt injection and knowledge corruption. To quantify the security risks faced by LLMs under these threats...

high relevance attack

Page 5 of 23