Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks
Augmented Generation (RAG) significantly mitigates the hallucinations and domain knowledge deficiency in large language models by incorporating external knowledge bases. However, the multi-module architecture of RAG introduces complex system
TinyGuard:A lightweight Byzantine Defense for Resource-Constrained Federated Learning via Statistical Update Fingerprints
label poisoning. Against adaptive white-box adversaries, Pareto frontier analysis across four orders of magnitude confirms that attackers cannot simultaneously evade detection and achieve effective poisoning, features we term statistical
DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation
increasingly influential, but their reliance on external corpora exposes new security risks from poisoned retrieval content. Existing RAG attacks are largely focusing on individual queries or narrow topic-local query
PenTiDef: Enhancing Privacy and Robustness in Decentralized Federated Intrusion Detection Systems against Poisoning Attacks
Systems (IDS) introduces new challenges related to data privacy, centralized coordination, and susceptibility to poisoning attacks. While significant research has focused on protecting traditional FL-IDS with centralized aggregation servers
Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey
Multimodal large language models (MLLMs) integrate information from multiple modalities such as text, images, audio, and video, enabling complex capabilities such as visual question answering and audio translation. While powerful
Cybercrime and Computer Forensics in Epoch of Artificial Intelligence in India
while Machine Learning offers high accuracy in pattern recognition, it introduces vulnerabilities regarding data poisoning and algorithmic bias. Findings highlight a critical tension between the Act's data minimization principles
SMCP: Secure Model Context Protocol
large language models (LLMs) are moving away from closed, single-model frameworks and toward open ecosystems that connect a variety of agents, external tools, and resources. The Model Context Protocol
Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation
often used by downstream users as teacher models for knowledge distillation, compressing their capabilities into memory-efficient models. However, as these teacher models may stem from untrusted parties, distillation
The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism
Prompt injection was initially framed as the large language model (LLM) analogue of SQL injection. However, over the past three years, attacks labeled as prompt injection have evolved from isolated
Detection of adversarial intent in Human-AI teams using LLMs
Large language models (LLMs) are increasingly deployed in human-AI teams as support agents for complex tasks such as information retrieval, programming, and decision-making assistance. While these agents' autonomy
Bias Injection Attacks on RAG Databases and Sanitization Defenses
defenses on vector databases in retrieval-augmented generation (RAG) systems. Prior work on knowledge poisoning attacks primarily inject false or toxic content, which fact-checking or linguistic analysis easily detects
Poster: ClawdGo: Endogenous Security Awareness Training for Autonomous AI Agents
Autonomous AI agents deployed on platforms such as OpenClaw face prompt injection, memory poisoning, supply-chain attacks, and social engineering, yet existing defences address only the platform perimeter, leaving
Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation
with ordinal nature. In this paper, we study the distribution estimation under pure shuffle model, which is a prevalent shuffle-DP framework without strong security assumptions. We initially attempt
MemLineage: Lineage-Guided Enforcement for LLM Agent Memory
ancestor, while still allowing benign recall. We evaluate three defense cells against three memory-poisoning workloads on a deterministic mechanism-isolation harness; MemLineage is the only configuration in that harness
SpectralGuard: Detecting Memory Collapse Attacks in State Space Models
State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a critical safety vulnerability. We show that the spectral radius
Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents
Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety failures from unsafe text to unsafe *trajectories*. We introduce **AgentFence
Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense
mechanism. At inference, retrieved cells serve as grounding context for precise safety decisions. Across model-level safety on HarmBench and agent-level safety on AgentHarm, Membrane achieves the highest
Fight Poison with Poison: Enhancing Robustness in Few-shot Machine-Generated Text Detection with Adversarial Training
Machine-generated text (MGT) detection is critical for regulating online
CIS-BA: Continuous Interaction Space Based Backdoor Attack for Object Detection in the Real-World
Object detection models deployed in real-world applications such as autonomous driving face serious threats from backdoor attacks. Despite their practical effectiveness,existing methods are inherently limited in both capability
Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems
user query on OpenAI's embedding models), and achieves near-100% retrieval across 11 benchmarks and 8 embedding models (including both open-source models and proprietary services). Based on this