Search: prompt injection | AI Threat Alert

Paper 2510.03705v1

2025-10-04

Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods

data content, such as web pages from search engines, the LLMs are vulnerable to prompt injection attacks. These attacks trick the LLMs into deviating from the original input instruction

high relevance attack

Paper 2510.19844v1

2025-10-20

CourtGuard: A Local, Multiagent Prompt Injection Classifier

system, where a "defense attorney" model argues the prompt is benign, a "prosecution attorney" model argues the prompt is a prompt injection, and a "judge" model gives the final classification

high relevance attack

Paper 2510.12252v2

2025-10-14

PromptLocate: Localizing Prompt Injection Attacks

segments, (2) identifying segments contaminated by injected instructions, and (3) pinpointing segments contaminated by injected data. We show PromptLocate accurately localizes injected prompts across eight existing and eight adaptive attacks

high relevance attack

Paper 2605.28999v1

2026-05-27

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening

contain hidden prompt injections; the prevalence of such injected resumes has increased noticeably over the past one to two years; and more than 90% of injected prompts

high relevance attack

Paper 2603.13026v1

2026-03-13

PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses

based red-teaming framework that systematically assesses existing prompt-injection defenses by training an attack LLM to optimize injected prompts in a practical black-box setting, where the attacker

high relevance attack

Paper 2603.12277v2

2026-02-22

Prompt Injection as Role Confusion

reveal why prompt injection works: untrusted text that imitates a role inherits that role's authority. We test this insight by injecting spoofed reasoning into user prompts and tool outputs

high relevance attack

Paper 2603.12277v1

2026-02-22

Prompt Injection as Role Confusion

reveal why prompt injection works: untrusted text that imitates a role inherits that role's authority. We test this insight by injecting spoofed reasoning into user prompts and tool outputs

high relevance attack

Paper 2511.10720v1

2025-11-13

PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization

response, thereby eliminating the influence of the injected instruction. To sanitize injected tokens, PISanitizer builds on two observations: (1) prompt injection attacks essentially craft an instruction that compels

high relevance attack

Paper 2603.29418v1

2026-03-31

Adversarial Prompt Injection Attack on Multimodal Large Language Models

methods predominantly rely on textual prompts or perceptible visual prompts that are observable by human users. In this work, we study imperceptible visual prompt injection against powerful closed-source MLLMs

high relevance attack

Paper 2601.09625v2

2026-01-14

The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism

malware execution mechanism triggered through prompts engineered to exploit an application's LLM. We introduce a seven-stage promptware kill chain: Initial Access (prompt injection), Privilege Escalation (jailbreaking), Reconnaissance, Persistence

high relevance attack

Paper 2604.01194v1

2026-04-01

AgentWatcher: A Rule-based Prompt Injection Monitor

Large language models (LLMs) and their applications, such as agents, are highly vulnerable to prompt injection attacks. State-of-the-art prompt injection detection methods have the following limitations

high relevance attack

Paper 2603.21642v1

2026-03-23

Are AI-assisted Development Tools Immune to Prompt Injection?

Prompt injection is listed as the number-one vulnerability class in the OWASP Top 10 for LLM Applications that can subvert LLM guardrails, disclose sensitive data, and trigger unauthorized tool

high relevance tool

Paper 2510.04885v1

2025-10-06

RL Is a Hammer and LLMs Are Nails: A Simple Reinforcement Learning Recipe for Strong Prompt Injection

Prompt injection poses a serious threat to the reliability and safety of LLM agents. Recent defenses against prompt injection, such as Instruction Hierarchy and SecAlign, have shown notable robustness against

high relevance attack

Paper 2604.08499v1

2026-04-09

PIArena: A Platform for Prompt Injection Evaluation

Prompt injection attacks pose serious security risks across a wide range of real-world applications. While receiving increasing attention, the community faces a critical gap: the lack of a unified

high relevance benchmark

Paper 2603.11331v1

2026-03-11

Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover

size-biased clusters is designated unsafe. Within this framework, we analyze prompt injection-based jailbreaking. Short injected prompts correspond to a weak magnetic field aligned towards unsafe cluster centers

high relevance attack

Paper 2510.14005v3

2025-10-15

PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features

injection attacks, where an attacker contaminates the input to inject malicious instructions, causing the LLM to follow the attacker's intent instead of the original user's. Existing prompt injection

high relevance attack

Paper 2510.04257v1

2025-10-05

AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents

models (LVLMs) are increasingly deployed in open-world settings but remain highly vulnerable to prompt injection, especially through visual inputs. We introduce AgentTypo, a black-box red-teaming framework that

high relevance attack

Paper 2602.05746v1

2026-02-05

Learning to Inject: Automated Prompt Injection via Reinforcement Learning

Prompt injection is one of the most critical vulnerabilities in LLM agents; yet, effective automated attacks remain largely unexplored from an optimization perspective. Existing methods heavily depend on human

high relevance attack

Paper 2601.12359v1

2026-01-18

Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs

manipulations inherent to real-world injection attacks. To ensure robust evaluation, we assemble and re-annotate the comprehensive LLMail-Inject dataset spanning five injection categories derived from publicly available sources

high relevance attack

Paper 2511.01287v1

2025-11-03

"Give a Positive Review Only": An Early Investigation Into In-Paper Prompt Injection Attacks and Defenses for AI Reviewers

reviewing scientific papers. However, recent reports have revealed that some papers contain hidden, injected prompts designed to manipulate AI reviewers into providing overly favorable evaluations. In this work, we present

high relevance survey