Poisoned Playbooks: Demystifying Knowledge Poisoning Effects on AI Security Agents
challenges and AI agents. First, we demonstrate how a crafted single poisoned write-up injected into public-style security knowledge sources which we denote as Poisoned Playbooks, alters the behavior
Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
Chain-of-Thought (CoT) prompting has emerged as a foundational technique for eliciting reasoning from Large Language Models (LLMs), yet the robustness of this approach to corruptions in intermediate reasoning
OI-Bench: An Option Injection Benchmark for Evaluating LLM Susceptibility to Directive Interference
signals such as social cues, framing, and instructions. In this work, we introduce option injection, a benchmarking approach that augments the multiple-choice question answering (MCQA) interface with an additional
How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency
Large language models (LLMs) can autonomously conduct multi-stage cyber
Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models
workflows, raising the stakes for adversarial robustness and safety. This paper introduces Transient Turn Injection(TTI), a new multi-turn attack technique that systematically exploits stateless moderation by distributing adversarial
Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms
attack taxonomies classified by kill-chain stage and cross-session operation (accumulate, compose, launder, inject_on_reader), each bound to one of seven identity anchors that ground-truth "violation
The Capability Paradox: How Smarter Auditors Make Multi-Agent Systems Less Secure
domain-specific narratives and propagated to a Manager through Worker reports, without any syntactic injection primitives. Across 42,000 adversarial trials over 12 Manager models and 7 Worker configurations
Whisper Leak: a side-channel attack on Large Language Models
paramount. This paper introduces Whisper Leak, a side-channel attack that infers user prompt topics from encrypted LLM traffic by analyzing packet size and timing patterns in streaming responses. Despite
Has the Two-Decade-Old Prophecy Come True? Artificial Bad Intelligence Triggered by Merely a Single-Bit Flip in Large Language Models
Recently, Bit-Flip Attack (BFA) has garnered widespread attention for