Search: prompt injection | AI Threat Alert

Paper 2510.04528v1

2025-10-06

Unified Threat Detection and Mitigation Framework (UTDMF): Combating Prompt Injection, Deception, and Bias in Enterprise-Scale Transformers

rapid adoption of large language models (LLMs) in enterprise systems exposes vulnerabilities to prompt injection attacks, strategic deception, and biased outputs, threatening security, trust, and fairness. Extending our adversarial activation

high relevance attack

Paper 2605.03378v1

2026-05-05

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

with tool use, skills, and external knowledge, has introduced new security risks. Among them, prompt injection attacks, where adversaries embed malicious instructions into the agent workflow, have emerged

high relevance attack

Paper 2604.11790v1

2026-04-13

ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

vulnerability manifests across three primary attack channels: web and local content injection, MCP server injection, and skill file injection. To address these vulnerabilities, we introduce ClawGuard, a novel runtime

high relevance tool

Paper 2602.07104v1

2026-02-06

Extended to Reality: Prompt Injection in 3D Environments

objects in the environment to override MLLMs' intended task. While prior work has studied prompt injection in the text domain and through digitally edited 2D images, it remains unclear

high relevance attack

Paper 2603.03637v1

2026-03-04

Image-based Prompt Injection: Hijacking Multimodal LLMs through Visually Embedded Adversarial Instructions

text to power applications, but this integration introduces new vulnerabilities. We study Image-based Prompt Injection (IPI), a black-box attack in which adversarial instructions are embedded into natural images

high relevance attack

Paper 2604.27202v1

2026-04-29

Indirect Prompt Injection in the Wild: An Empirical Study of Prevalence, Techniques, and Objectives

site owners, contributors, and adversaries to embed instructions directly in web resources, i.e., indirect prompt injections. While prior work demonstrates such attacks in controlled settings, their prevalence, deployment, and real

high relevance attack

Paper 2510.13543v1

2025-10-15

In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers

browsers) offer powerful automation of web tasks. However, they are vulnerable to indirect prompt injection attacks, where malicious instructions hidden in a webpage deceive the agent into unwanted actions. These

high relevance attack

Paper 2511.01634v2

2025-11-03

Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models

while powerful, also makes them vulnerable to a new class of attacks known as prompt injection. In these attacks, hidden or malicious instructions are inserted into user inputs or external

high relevance attack

Paper 2512.12594v2

2025-12-14

ceLLMate: Sandboxing Browser AI Agents

across pages. While these agents help automate repetitive online tasks, they are vulnerable to prompt injection attacks that trick an agent into performing undesired actions, such as leaking private information

medium relevance benchmark

Paper 2601.11199v1

2026-01-16

SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation

disclosing sensitive information; however, recent studies have also demonstrated that LLMs remain vulnerable to prompt injection attacks that can override intended behavioral constraints. For these reasons, we propose a novel

high relevance attack

Paper 2605.25415v1

2026-05-25

LLM-as-a-Reviewer: Benchmarking Their Ability, Divergence, and Prompt Injection Resistance as Paper Reviewers

LLMs along three axes: rating calibration, divergence from human reviewers, and resistance to prompt injection embedded via an invisible font-mapping attack. We find that LLMs systematically overrate weaker submissions

high relevance survey

Paper 2510.05442v1

2025-10-06

Adversarial Reinforcement Learning for Large Language Model Agent Safety

Search to complete complex tasks. However, this tool usage introduces the risk of indirect prompt injections, where malicious instructions hidden in tool outputs can manipulate the agent, posing security risks

medium relevance attack

Paper 2604.28157v1

2026-04-30

FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

However, security remains a major concern for their widespread deployment, with threats such as prompt injection and knowledge corruption. To quantify the security risks faced by LLMs under these threats

high relevance attack

Paper 2509.22040v1

2025-09-26

"Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors

raises new security concerns. In this study, we present the first empirical analysis of prompt injection attacks targeting these high-privilege agentic AI coding editors. We show how attackers

high relevance attack

Paper 2601.17548v1

2026-01-24

Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems

this Systematization of Knowledge (SoK) paper, we present a comprehensive analysis of prompt injection attacks targeting agentic coding assistants. We propose a novel three-dimensional taxonomy categorizing attacks across

high relevance attack

Paper 2510.26328v1

2025-10-30

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

useful tool, we show that they are fundamentally insecure, since they enable trivially simple prompt injections. We demonstrate how to hide malicious instructions in long Agent Skill files and referenced

high relevance attack

Paper 2606.22779v1

2026-06-22

DE-FIVE: Detecting Malicious Image Prompts via Fourier Features and Image Vector Embeddings

VLMs, making them more susceptible to security threats such as adversarial perturbations and indirect prompt injection, wherein crafted malicious image prompts can elicit unintended model outputs. Existing defense methods against

medium relevance benchmark

Paper 2603.10521v1

2026-03-11

IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs

resolving instruction conflicts. IH is key to defending against jailbreaks, system prompt extractions, and agentic prompt injections. However, robust IH behavior is difficult to train: IH failures can be confounded

medium relevance benchmark

Paper 2602.00750v1

2026-01-31

Bypassing Prompt Injection Detectors through Evasive Injections

vulnerable to task drift: deviations from a user's intended instruction due to injected secondary prompts. Recent work has shown that linear probes trained on activation deltas of LLMs' hidden

high relevance attack

Paper 2605.10176v1

2026-05-11

When Prompts Become Payloads: A Framework for Mitigating SQL Injection Attacks in Large Language Model-Driven Applications

attack patterns. We evaluate the proposed framework under diverse and realistic attack scenarios, including prompt injection, obfuscated SQL payloads, and context-manipulation attacks. To ensure robustness, we generate and curate

high relevance attack
