Paper 2510.03705v1

Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods

data content, such as web pages from search engines, the LLMs are vulnerable to prompt injection attacks. These attacks trick the LLMs into deviating from the original input instruction

high relevance attack
Paper 2510.19844v1

CourtGuard: A Local, Multiagent Prompt Injection Classifier

system, where a "defense attorney" model argues the prompt is benign, a "prosecution attorney" model argues the prompt is a prompt injection, and a "judge" model gives the final classification

high relevance attack
Paper 2510.12252v2

PromptLocate: Localizing Prompt Injection Attacks

segments, (2) identifying segments contaminated by injected instructions, and (3) pinpointing segments contaminated by injected data. We show PromptLocate accurately localizes injected prompts across eight existing and eight adaptive attacks

high relevance attack
Paper 2603.13026v1

PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses

based red-teaming framework that systematically assesses existing prompt-injection defenses by training an attack LLM to optimize injected prompts in a practical black-box setting, where the attacker

high relevance attack
Paper 2603.12277v2

Prompt Injection as Role Confusion

reveal why prompt injection works: untrusted text that imitates a role inherits that role's authority. We test this insight by injecting spoofed reasoning into user prompts and tool outputs

high relevance attack
Paper 2511.10720v1

PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization

response, thereby eliminating the influence of the injected instruction. To sanitize injected tokens, PISanitizer builds on two observations: (1) prompt injection attacks essentially craft an instruction that compels

high relevance attack
Paper 2601.09625v2

The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism

malware execution mechanism triggered through prompts engineered to exploit an application's LLM. We introduce a seven-stage promptware kill chain: Initial Access (prompt injection), Privilege Escalation (jailbreaking), Reconnaissance, Persistence

high relevance attack
Paper 2603.21642v1

Are AI-assisted Development Tools Immune to Prompt Injection?

Prompt injection is listed as the number-one vulnerability class in the OWASP Top 10 for LLM Applications that can subvert LLM guardrails, disclose sensitive data, and trigger unauthorized tool

high relevance tool
Paper 2510.04885v1

RL Is a Hammer and LLMs Are Nails: A Simple Reinforcement Learning Recipe for Strong Prompt Injection

Prompt injection poses a serious threat to the reliability and safety of LLM agents. Recent defenses against prompt injection, such as Instruction Hierarchy and SecAlign, have shown notable robustness against

high relevance attack
Paper 2603.11331v1

Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover

size-biased clusters is designated unsafe. Within this framework, we analyze prompt injection-based jailbreaking. Short injected prompts correspond to a weak magnetic field aligned towards unsafe cluster centers

high relevance attack
Paper 2510.14005v3

PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features

injection attacks, where an attacker contaminates the input to inject malicious instructions, causing the LLM to follow the attacker's intent instead of the original user's. Existing prompt injection

high relevance attack
Paper 2510.04257v1

AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents

models (LVLMs) are increasingly deployed in open-world settings but remain highly vulnerable to prompt injection, especially through visual inputs. We introduce AgentTypo, a black-box red-teaming framework that

high relevance attack
Paper 2602.05746v1

Learning to Inject: Automated Prompt Injection via Reinforcement Learning

Prompt injection is one of the most critical vulnerabilities in LLM agents; yet, effective automated attacks remain largely unexplored from an optimization perspective. Existing methods heavily depend on human

high relevance attack
Paper 2601.12359v1

Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs

manipulations inherent to real-world injection attacks. To ensure robust evaluation, we assemble and re-annotate the comprehensive LLMail-Inject dataset spanning five injection categories derived from publicly available sources

high relevance attack
Paper 2511.01287v1

"Give a Positive Review Only": An Early Investigation Into In-Paper Prompt Injection Attacks and Defenses for AI Reviewers

reviewing scientific papers. However, recent reports have revealed that some papers contain hidden, injected prompts designed to manipulate AI reviewers into providing overly favorable evaluations. In this work, we present

high relevance survey
Paper 2603.11875v1

The Mirror Design Pattern: Strict Data Geometry over Model Scale for Prompt Injection Detection

Prompt injection defenses are often framed as semantic understanding problems and delegated to increasingly large neural detectors. For the first screening layer, however, the requirements are different: the detector runs

high relevance attack
Paper 2602.06268v1

MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs

LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly integrated into clinical workflows; however, prompt injection attacks can steer these systems toward clinically unsafe or misleading outputs. We introduce

high relevance benchmark
Paper 2511.05797v1

When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins

Prompt injection attacks pose a critical threat to large language models (LLMs), with prior work focusing on cutting-edge LLM applications like personal copilots. In contrast, simpler LLM applications, such

high relevance attack
Paper 2512.12583v1

Detecting Prompt Injection Attacks Against Application Using Classifiers

networks, Random Forest, and Naive Bayes, to detect malicious prompts in LLM integrated web applications. The proposed approach improves prompt injection detection and mitigation, helping protect targeted applications and systems

high relevance attack