Paper 2603.19469v1

A Framework for Formalizing LLM Agent Security

executes a user task. Using this framework, we reformalize existing attacks, such as indirect prompt injection, direct prompt injection, jailbreak, task drift, and memory poisoning, as violations

medium relevance tool
Paper 2602.13597v2

AlignSentinel: Alignment-Aware Detection of Prompt Injection Attacks

Prompt injection attacks insert malicious instructions into an LLM's input to steer it toward an attacker-chosen task instead of the intended one. Existing detection defenses typically classify

high relevance attack
Paper 2511.15759v1

Securing AI Agents Against Prompt Injection Attacks

used for enhancing large language model capabilities, but they introduce significant security vulnerabilities through prompt injection attacks. We present a comprehensive benchmark for evaluating prompt injection risks in RAG-enabled

high relevance attack
Paper 2604.05179v1

Gradient-Controlled Decoding: A Safety Guardrail for LLMs with Dual-Anchor Steering

Large language models (LLMs) remain susceptible to jailbreak and direct prompt-injection attacks, yet the strongest defensive filters frequently over-refuse benign queries and degrade user experience. Previous work

medium relevance defense
Paper 2606.22659v1

Confidently Wrong: Severity-Aware Calibration of Prompt-Injection Detectors under Attack Shift

Prompt-injection detectors are deployed as guards: a model scores an input and a downstream system trusts or blocks it on that score. I study the confidence of these scores

high relevance attack
Paper 2606.09204v1

The Injection Paradox: Brand-Level Suppression in Safety-Trained LLM Recommendations via RAG Context Injection

which prompt injections embedded in retrieved documents backfire against the attacker, suppressing the target brand below the injection-free baseline. In safety-trained Claude models, documents containing prompt injections suffer

high relevance attack
Paper 2511.04508v1

Large Language Models for Cyber Security

paper studies the architecture and functioning of LLMs, its integration into Encrypted prompts to prevent prompt injection attacks. It also studies the integration of LLMs into cybersecurity tools using

medium relevance attack
Paper 2509.22830v2

ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents

environments has created new attack surfaces for adversarial manipulation. One major threat is indirect prompt injection, where attackers embed malicious instructions in external environment output, causing agents to interpret

high relevance attack
Paper 2602.20156v3

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

domains, it creates an increasingly complex agent supply chain, offering new surfaces for prompt injection attacks. We identify skill-based prompt injection as a significant threat and introduce SkillInject

high relevance attack
Paper 2604.18248v1

Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection

Current open-source prompt-injection detectors converge on two architectural choices: regular-expression pattern matching and fine-tuned transformer classifiers. Both share failure modes that recent work has made concrete

high relevance attack
Paper 2601.13186v1

Prompt Injection Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching

Prompt injection remains a central obstacle to the safe deployment of large language models, particularly in multi-agent settings where intermediate outputs can propagate or amplify malicious instructions. Building

high relevance attack
Paper 2603.18433v1

Prompt Control-Flow Integrity: A Priority-Aware Runtime Defense Against Prompt Injection in LLM Systems

models (LLMs) deployed behind APIs and retrieval-augmented generation (RAG) stacks are vulnerable to prompt injection attacks that may override system policies, subvert intended behavior, and induce unsafe outputs. Existing

high relevance tool
Paper 2605.26595v1

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

covert control attacks and evaluate them across 5 LLMs, 3 backdoor defenses, and 4 prompt injection defenses. With a small poisoned fraction, covert control attacks outperform heuristic-based prompt injection

high relevance attack
Paper 2606.13385v1

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversarial instructions that manipulate agent behaviour. Existing

high relevance benchmark
Paper 2603.30016v1

Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper

high relevance tool
Paper 2605.25194v1

Localization then Neutralization: Gradient-guided Token Suppression against Visual Prompt Injection Attack

Adversarial images pose a severe security threat to multimodal large language models through prompt injection. Existing defenses largely lack a principled understanding of the underlying mechanisms and struggle to balance

high relevance attack
Paper 2510.05244v2

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cause unintended or harmful behavior. Inspired by the well-established concept

high relevance benchmark
Paper 2603.17639v1

VeriGrey: Greybox Agent Validation

behavior. As mutation operators in the testing process, we mutate prompts to design pernicious injection prompts. This is carefully accomplished by linking the task of the agent to an injection

medium relevance benchmark
Paper 2601.04795v1

Defense Against Indirect Prompt Injection via Tool Result Parsing

malicious instructions via prompt engineering. Despite their flexibility, most current prompt-based defenses suffer from high Attack Success Rates (ASR), demonstrating limited robustness against sophisticated injection attacks. In this paper

high relevance tool
Paper 2510.04528v1

Unified Threat Detection and Mitigation Framework (UTDMF): Combating Prompt Injection, Deception, and Bias in Enterprise-Scale Transformers

rapid adoption of large language models (LLMs) in enterprise systems exposes vulnerabilities to prompt injection attacks, strategic deception, and biased outputs, threatening security, trust, and fairness. Extending our adversarial activation

high relevance attack