SafeClawBench: Separating Semantic, Audit-Evidence, and Sandbox Harm in Tool-Using LLM Agents
Tool-using language-model agents introduce security failures that go beyond unsafe text: they can disclose protected objects, write persistent memory, send messages, modify databases, or trigger harmful code
Behavior-Aware and Generalizable Defense Against Black-Box Adversarial Attacks for ML-Based IDS
often fall short in practice. Most are tailored to specific attack types, require internal model access, or rely on static mechanisms that fail to generalize across evolving attack strategies. Furthermore
RobustMask: Certified Robustness against Adversarial Neural Ranking Attack via Randomized Masking
Neural ranking models have achieved remarkable progress and are now widely deployed in real-world applications such as Retrieval-Augmented Generation (RAG). However, like other neural architectures, they remain vulnerable
Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks
Augmented Generation (RAG) significantly mitigates the hallucinations and domain knowledge deficiency in large language models by incorporating external knowledge bases. However, the multi-module architecture of RAG introduces complex system
TinyGuard:A lightweight Byzantine Defense for Resource-Constrained Federated Learning via Statistical Update Fingerprints
label poisoning. Against adaptive white-box adversaries, Pareto frontier analysis across four orders of magnitude confirms that attackers cannot simultaneously evade detection and achieve effective poisoning, features we term statistical
DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation
increasingly influential, but their reliance on external corpora exposes new security risks from poisoned retrieval content. Existing RAG attacks are largely focusing on individual queries or narrow topic-local query
PenTiDef: Enhancing Privacy and Robustness in Decentralized Federated Intrusion Detection Systems against Poisoning Attacks
Systems (IDS) introduces new challenges related to data privacy, centralized coordination, and susceptibility to poisoning attacks. While significant research has focused on protecting traditional FL-IDS with centralized aggregation servers
Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey
Multimodal large language models (MLLMs) integrate information from multiple modalities such as text, images, audio, and video, enabling complex capabilities such as visual question answering and audio translation. While powerful
Cybercrime and Computer Forensics in Epoch of Artificial Intelligence in India
while Machine Learning offers high accuracy in pattern recognition, it introduces vulnerabilities regarding data poisoning and algorithmic bias. Findings highlight a critical tension between the Act's data minimization principles
SMCP: Secure Model Context Protocol
large language models (LLMs) are moving away from closed, single-model frameworks and toward open ecosystems that connect a variety of agents, external tools, and resources. The Model Context Protocol
Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation
often used by downstream users as teacher models for knowledge distillation, compressing their capabilities into memory-efficient models. However, as these teacher models may stem from untrusted parties, distillation
The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism
Prompt injection was initially framed as the large language model (LLM) analogue of SQL injection. However, over the past three years, attacks labeled as prompt injection have evolved from isolated
Detection of adversarial intent in Human-AI teams using LLMs
Large language models (LLMs) are increasingly deployed in human-AI teams as support agents for complex tasks such as information retrieval, programming, and decision-making assistance. While these agents' autonomy
Bias Injection Attacks on RAG Databases and Sanitization Defenses
defenses on vector databases in retrieval-augmented generation (RAG) systems. Prior work on knowledge poisoning attacks primarily inject false or toxic content, which fact-checking or linguistic analysis easily detects
Poster: ClawdGo: Endogenous Security Awareness Training for Autonomous AI Agents
Autonomous AI agents deployed on platforms such as OpenClaw face prompt injection, memory poisoning, supply-chain attacks, and social engineering, yet existing defences address only the platform perimeter, leaving
Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation
with ordinal nature. In this paper, we study the distribution estimation under pure shuffle model, which is a prevalent shuffle-DP framework without strong security assumptions. We initially attempt
MemLineage: Lineage-Guided Enforcement for LLM Agent Memory
ancestor, while still allowing benign recall. We evaluate three defense cells against three memory-poisoning workloads on a deterministic mechanism-isolation harness; MemLineage is the only configuration in that harness
SpectralGuard: Detecting Memory Collapse Attacks in State Space Models
State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a critical safety vulnerability. We show that the spectral radius
Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents
Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety failures from unsafe text to unsafe *trajectories*. We introduce **AgentFence
Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense
mechanism. At inference, retrieved cells serve as grounding context for precise safety decisions. Across model-level safety on HarmBench and agent-level safety on AgentHarm, Membrane achieves the highest