Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution Environments
system, yet real deployments remain vulnerable to microarchitectural leakage, side-channel attacks, and fault injection. In parallel, security teams increasingly rely on Large Language Model (LLM) assistants as security advisors
ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation
Multimodal large language models (MLLMs) are increasingly used to automate
AdversaBench: Automated LLM Red-Teaming with Multi-Judge Confirmation and Cross-Model Transferability
real. We present AdversaBench, an end-to-end red-teaming pipeline that mutates seed prompts with five structured operators, queries a target model, and confirms failures through a three-judge
Memory Poisoning Attack and Defense on Memory Based LLM-Agents
memory and influence future responses. Recent work demonstrated that the MINJA (Memory Injection Attack) achieves over 95 % injection success rate and 70 % attack success rate under idealized conditions. However
Autonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defense
graph analyzer, traces data flow from file I/O to LLM context injection points and ranks carriers by context injection position without manual analysis. SRPO, our summary-resilient payload optimizer, generates
Conflict-Aware Retriever Editing for Knowledge Injection Attacks on LLM-Based RAG Systems
mislead downstream generation, posing a serious security threat for AI applications. Existing RAG injection attacks mainly rely on manipulating external knowledge bases, such as crafting malicious corpus. However, the synthetic
TFL: Targeted Bit-Flip Attack on Large Language Model
safety and security critical applications, raising concerns about their robustness to model parameter fault injection attacks. Recent studies have shown that bit-flip attacks (BFAs), which exploit computer main memory
Poisoned Playbooks: Demystifying Knowledge Poisoning Effects on AI Security Agents
challenges and AI agents. First, we demonstrate how a crafted single poisoned write-up injected into public-style security knowledge sources which we denote as Poisoned Playbooks, alters the behavior
Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
Chain-of-Thought (CoT) prompting has emerged as a foundational technique for eliciting reasoning from Large Language Models (LLMs), yet the robustness of this approach to corruptions in intermediate reasoning
PraisonAI: Unauthenticated RCE via Jobs API + Approval Bypass
OI-Bench: An Option Injection Benchmark for Evaluating LLM Susceptibility to Directive Interference
signals such as social cues, framing, and instructions. In this work, we introduce option injection, a benchmarking approach that augments the multiple-choice question answering (MCQA) interface with an additional
How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency
Large language models (LLMs) can autonomously conduct multi-stage cyber
Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models
workflows, raising the stakes for adversarial robustness and safety. This paper introduces Transient Turn Injection(TTI), a new multi-turn attack technique that systematically exploits stateless moderation by distributing adversarial
Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms
attack taxonomies classified by kill-chain stage and cross-session operation (accumulate, compose, launder, inject_on_reader), each bound to one of seven identity anchors that ground-truth "violation
The Capability Paradox: How Smarter Auditors Make Multi-Agent Systems Less Secure
domain-specific narratives and propagated to a Manager through Worker reports, without any syntactic injection primitives. Across 42,000 adversarial trials over 12 Manager models and 7 Worker configurations
Whisper Leak: a side-channel attack on Large Language Models
paramount. This paper introduces Whisper Leak, a side-channel attack that infers user prompt topics from encrypted LLM traffic by analyzing packet size and timing patterns in streaming responses. Despite
Has the Two-Decade-Old Prophecy Come True? Artificial Bad Intelligence Triggered by Merely a Single-Bit Flip in Large Language Models
Recently, Bit-Flip Attack (BFA) has garnered widespread attention for