Paper 2603.10521v1

IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs

resolving instruction conflicts. IH is key to defending against jailbreaks, system prompt extractions, and agentic prompt injections. However, robust IH behavior is difficult to train: IH failures can be confounded

medium relevance benchmark
Paper 2602.00750v1

Bypassing Prompt Injection Detectors through Evasive Injections

vulnerable to task drift; deviations from a user's intended instruction due to injected secondary prompts. Recent work has shown that linear probes trained on activation deltas of LLMs' hidden

high relevance attack
Paper 2605.10176v1

When Prompts Become Payloads: A Framework for Mitigating SQL Injection Attacks in Large Language Model-Driven Applications

attack patterns. We evaluate the proposed framework under diverse and realistic attack scenarios, including prompt injection, obfuscated SQL payloads, and context-manipulation attacks. To ensure robustness, we generate and curate

high relevance attack
Paper 2512.08417v2

Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs

agents) to perform more sophisticated tasks. However, LLM-empowered applications are vulnerable to Indirect Prompt Injection (IPI) attacks, where instructions are injected via untrustworthy external data sources. This paper presents

high relevance attack
Paper 2602.16752v1

The Vulnerability of LLM Rankers to Prompt Injection Attacks

LLMs) have emerged as powerful re-rankers. Recent research has however showed that simple prompt injections embedded within a candidate document (i.e., jailbreak prompt attacks) can significantly alter

high relevance attack
Paper 2602.14161v1

When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift

Detecting prompt injection and jailbreak attacks is critical for deploying LLM-based agents safely. As agents increasingly process untrusted data from emails, documents, tool outputs, and external APIs, robust attack

medium relevance benchmark
Paper 2511.19727v1

Prompt Fencing: A Cryptographic Approach to Establishing Security Boundaries in Large Language Model Prompts

present Prompt Fencing, a novel architectural approach that applies cryptographic authentication and data architecture principles to establish explicit security boundaries within LLM prompts. Our approach decorates prompt segments with cryptographically

medium relevance attack
Paper 2509.25448v2

Fingerprinting LLMs via Prompt Injection

prompts, which are not robust to post-processing. In this work, we propose LLMPrint, a novel detection framework that constructs fingerprints by exploiting LLMs' inherent vulnerability to prompt injection

high relevance attack
Paper 2606.09315v1

Brain-Prompt Injection: A Route-Safety Audit for BCI-LLM Agents

channel for tool-use agents, exposing a new attack surface we call \emph{brain-prompt injection}: signal-side perturbations, context-only injections, and adaptive dual-decoder attacks can all change

high relevance attack
Paper 2604.12232v1

TEMPLATEFUZZ: Fine-Grained Chat Template Fuzzing for Jailbreaking and Red Teaming LLMs

elicit harmful outputs, poses significant security risks. While prior work has primarily focused on prompt injection attacks, these approaches often require resource-intensive prompt engineering and overlook other critical components

high relevance attack
Paper 2512.20405v2

ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected

that the review was generated by an LLM, not a human. This method turns prompt injections from vulnerability into a verification tool. We outline our design, expected model behaviors

medium relevance survey
Paper 2601.15528v1

Securing LLM-as-a-Service for Small Businesses: An Industry Case Study of a Distributed Chatbot Deployment Platform

tenant data access controls. In addition, the platform integrates practical, platform-level defences against prompt injection attacks in RAG-based chatbots, translating insights from recent prompt injection research into deployable

medium relevance tool
Paper 2603.17705v1

Parameter-Efficient Modality-Balanced Symmetric Fusion for Multimodal Remote Sensing Semantic Segmentation

representations while minimizing the number of trainable parameters. Specifically, we design a Cross-modal Prompt-Injected Adapter (CPIA) to enable deep semantic interaction by generating shared prompts and injecting them

medium relevance benchmark
Paper 2510.11238v1

Attacks by Content: Automated Fact-checking is an AI Security Issue

manipulate the data they receive to subvert their behaviour. Previous research has studied indirect prompt injection, where the attacker injects malicious instructions. We argue that injection of instructions

high relevance attack
Paper 2510.05244v1

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cause unintended or harmful behavior. Inspired by the well-established concept

high relevance benchmark
Paper 2606.13044v1

No Hidden Prompts Needed! You Can Game AI Peer Review with Presentation-Only Revisions

infrastructure, most robustness concerns have focused on explicit attacks such as hidden instructions and prompt injection. We study a harder and more policy-relevant failure mode: no hidden text

medium relevance survey
Paper 2603.28013v1

Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers

present a stage-decomposed analysis of prompt injection attacks against five frontier LLM agents. Prior work measures task-level attack success rate (ASR); we localize the pipeline stage at which

high relevance attack
Paper 2510.09023v1

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections

should we evaluate the robustness of language model defenses? Current defenses against jailbreaks and prompt injections (which aim to prevent an attacker from eliciting harmful knowledge or remotely triggering malicious

high relevance attack
Paper 2605.28116v1

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

Injection of Realistic Adversarial GUI Examples), a pipeline that turns benign mobile screenshots into prompt-injection samples by placing attacker-controlled text into ordinary user-generated content regions, without modifying

high relevance attack
Paper 2601.04666v1

Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning

model (LLM)-integrated applications have become increasingly prevalent, yet face critical security vulnerabilities from prompt injection (PI) attacks. Defending against PI attacks faces two major issues: malicious instructions

high relevance attack
Previous Page 6 of 23 Next