Search: prompt injection | AI Threat Alert

Paper 2606.24402v1

2026-06-23

Poisoned Playbooks: Demystifying Knowledge Poisoning Effects on AI Security Agents

challenges and AI agents. First, we demonstrate how a single crafted poisoned write-up injected into public-style security knowledge sources, which we denote as Poisoned Playbooks, alters the behavior

medium relevance attack

Paper 2603.03332v2

2026-02-11

Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations

Chain-of-Thought (CoT) prompting has emerged as a foundational technique for eliciting reasoning from Large Language Models (LLMs), yet the robustness of this approach to corruptions in intermediate reasoning

medium relevance survey

Paper 2601.13300v1

2026-01-19

OI-Bench: An Option Injection Benchmark for Evaluating LLM Susceptibility to Directive Interference

signals such as social cues, framing, and instructions. In this work, we introduce option injection, a benchmarking approach that augments the multiple-choice question answering (MCQA) interface with an additional

high relevance benchmark

Paper 2605.30096v1

2026-05-28

How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency

Large language models (LLMs) can autonomously conduct multi-stage cyber

high relevance attack

Paper 2604.21860v1

2026-04-23

Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models

workflows, raising the stakes for adversarial robustness and safety. This paper introduces Transient Turn Injection (TTI), a new multi-turn attack technique that systematically exploits stateless moderation by distributing adversarial

high relevance attack

Paper 2604.21131v1

2026-04-22

Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms

attack taxonomies classified by kill-chain stage and cross-session operation (accumulate, compose, launder, inject_on_reader), each bound to one of seven identity anchors that ground-truth "violation

medium relevance benchmark

Paper 2605.17480v1

2026-05-17

The Capability Paradox: How Smarter Auditors Make Multi-Agent Systems Less Secure

domain-specific narratives and propagated to a Manager through Worker reports, without any syntactic injection primitives. Across 42,000 adversarial trials over 12 Manager models and 7 Worker configurations

medium relevance benchmark

Paper 2511.03675v1

2025-11-05

Whisper Leak: a side-channel attack on Large Language Models

paramount. This paper introduces Whisper Leak, a side-channel attack that infers user prompt topics from encrypted LLM traffic by analyzing packet size and timing patterns in streaming responses. Despite

high relevance attack

Paper 2510.00490v1

2025-10-01

Has the Two-Decade-Old Prophecy Come True? Artificial Bad Intelligence Triggered by Merely a Single-Bit Flip in Large Language Models

Recently, Bit-Flip Attack (BFA) has garnered widespread attention for

medium relevance attack
