IPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injection
HTML pages those domains serve. Existing red-teaming resources fall short of this scenario: prompt-injection benchmarks ship pre-built adversarial pages that whitelisted agents cannot reach, and generic
Agent Privilege Separation in OpenClaw: A Structural Defense Against Prompt Injection
Prompt injection remains one of the most practical attack vectors against LLM-integrated applications. We replicate the Microsoft LLMail-Inject benchmark (Greshake et al., 2024) against current generation models running
JSONalyzeQueryEngine` in the run-llama/llama_index repository allows for SQL injection via prompt injection. This can lead to arbitrary file creation and Denial-of-Service (DoS) attacks. The vulnerability affects
server CORS wildcard + auth-off-by-default enables CSRF graph exfiltration and persistent indirect prompt injection
Flowise: APIChain Prompt Injection SSRF in GET/POST API Chains
A Framework for Formalizing LLM Agent Security
executes a user task. Using this framework, we reformalize existing attacks, such as indirect prompt injection, direct prompt injection, jailbreak, task drift, and memory poisoning, as violations
AlignSentinel: Alignment-Aware Detection of Prompt Injection Attacks
Prompt injection attacks insert malicious instructions into an LLM's input to steer it toward an attacker-chosen task instead of the intended one. Existing detection defenses typically classify
Securing AI Agents Against Prompt Injection Attacks
used for enhancing large language model capabilities, but they introduce significant security vulnerabilities through prompt injection attacks. We present a comprehensive benchmark for evaluating prompt injection risks in RAG-enabled
Gradient-Controlled Decoding: A Safety Guardrail for LLMs with Dual-Anchor Steering
Large language models (LLMs) remain susceptible to jailbreak and direct prompt-injection attacks, yet the strongest defensive filters frequently over-refuse benign queries and degrade user experience. Previous work
Confidently Wrong: Severity-Aware Calibration of Prompt-Injection Detectors under Attack Shift
Prompt-injection detectors are deployed as guards: a model scores an input and a downstream system trusts or blocks it on that score. I study the confidence of these scores
The Injection Paradox: Brand-Level Suppression in Safety-Trained LLM Recommendations via RAG Context Injection
which prompt injections embedded in retrieved documents backfire against the attacker, suppressing the target brand below the injection-free baseline. In safety-trained Claude models, documents containing prompt injections suffer
Large Language Models for Cyber Security
paper studies the architecture and functioning of LLMs, its integration into Encrypted prompts to prevent prompt injection attacks. It also studies the integration of LLMs into cybersecurity tools using
ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents
environments has created new attack surfaces for adversarial manipulation. One major threat is indirect prompt injection, where attackers embed malicious instructions in external environment output, causing agents to interpret
Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks
domains, it creates an increasingly complex agent supply chain, offering new surfaces for prompt injection attacks. We identify skill-based prompt injection as a significant threat and introduce SkillInject
Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection
Current open-source prompt-injection detectors converge on two architectural choices: regular-expression pattern matching and fine-tuned transformer classifiers. Both share failure modes that recent work has made concrete
PraisonAIAgents: Environment Variable Secret Exfiltration via os.path.expandvars() Bypassing shell=False
Prompt Injection Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching
Prompt injection remains a central obstacle to the safe deployment of large language models, particularly in multi-agent settings where intermediate outputs can propagate or amplify malicious instructions. Building
Prompt Control-Flow Integrity: A Priority-Aware Runtime Defense Against Prompt Injection in LLM Systems
models (LLMs) deployed behind APIs and retrieval-augmented generation (RAG) stacks are vulnerable to prompt injection attacks that may override system policies, subvert intended behavior, and induce unsafe outputs. Existing
Cordyceps: Covert Control Attacks on LLMs via Data Poisoning
covert control attacks and evaluate them across $5$ LLMs, $3$ backdoor defenses, and $4$ prompt injection defenses. With a small poisoned fraction, covert control attacks outperform heuristic-based prompt injection
Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents
untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversarial instructions that manipulate agent behaviour. Existing