Paper 2606.18356v1

SafeClawBench: Separating Semantic, Audit-Evidence, and Sandbox Harm in Tool-Using LLM Agents

Tool-using language-model agents introduce security failures that go beyond unsafe text: they can disclose protected objects, write persistent memory, send messages, modify databases, or trigger harmful code

medium relevance tool
Paper 2512.13501v1

Behavior-Aware and Generalizable Defense Against Black-Box Adversarial Attacks for ML-Based IDS

often fall short in practice. Most are tailored to specific attack types, require internal model access, or rely on static mechanisms that fail to generalize across evolving attack strategies. Furthermore

high relevance attack
Paper 2512.23307v1

RobustMask: Certified Robustness against Adversarial Neural Ranking Attack via Randomized Masking

Neural ranking models have achieved remarkable progress and are now widely deployed in real-world applications such as Retrieval-Augmented Generation (RAG). However, like other neural architectures, they remain vulnerable

high relevance attack
Paper 2603.21654v1

Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks

Augmented Generation (RAG) significantly mitigates the hallucinations and domain knowledge deficiency in large language models by incorporating external knowledge bases. However, the multi-module architecture of RAG introduces complex system

medium relevance survey
Paper 2602.02615v1

TinyGuard: A lightweight Byzantine Defense for Resource-Constrained Federated Learning via Statistical Update Fingerprints

label poisoning. Against adaptive white-box adversaries, Pareto frontier analysis across four orders of magnitude confirms that attackers cannot simultaneously evade detection and achieve effective poisoning, features we term statistical

medium relevance defense
Paper 2606.01212v1

DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation

increasingly influential, but their reliance on external corpora exposes new security risks from poisoned retrieval content. Existing RAG attacks largely focus on individual queries or narrow topic-local query

high relevance attack
Paper 2602.17973v1

PenTiDef: Enhancing Privacy and Robustness in Decentralized Federated Intrusion Detection Systems against Poisoning Attacks

Systems (IDS) introduces new challenges related to data privacy, centralized coordination, and susceptibility to poisoning attacks. While significant research has focused on protecting traditional FL-IDS with centralized aggregation servers

high relevance tool
Paper 2603.27918v1

Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey

Multimodal large language models (MLLMs) integrate information from multiple modalities such as text, images, audio, and video, enabling complex capabilities such as visual question answering and audio translation. While powerful

high relevance survey
Paper 2512.15799v1

Cybercrime and Computer Forensics in Epoch of Artificial Intelligence in India

while Machine Learning offers high accuracy in pattern recognition, it introduces vulnerabilities regarding data poisoning and algorithmic bias. Findings highlight a critical tension between the Act's data minimization principles

low relevance benchmark
Paper 2602.01129v1

SMCP: Secure Model Context Protocol

large language models (LLMs) are moving away from closed, single-model frameworks and toward open ecosystems that connect a variety of agents, external tools, and resources. The Model Context Protocol

medium relevance attack
Paper 2510.18541v1

Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation

often used by downstream users as teacher models for knowledge distillation, compressing their capabilities into memory-efficient models. However, as these teacher models may stem from untrusted parties, distillation

medium relevance benchmark
Paper 2601.09625v2

The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism

Prompt injection was initially framed as the large language model (LLM) analogue of SQL injection. However, over the past three years, attacks labeled as prompt injection have evolved from isolated

high relevance attack
Paper 2603.20976v1

Detection of adversarial intent in Human-AI teams using LLMs

Large language models (LLMs) are increasingly deployed in human-AI teams as support agents for complex tasks such as information retrieval, programming, and decision-making assistance. While these agents' autonomy

medium relevance attack
Paper 2512.00804v1

Bias Injection Attacks on RAG Databases and Sanitization Defenses

defenses on vector databases in retrieval-augmented generation (RAG) systems. Prior work on knowledge poisoning attacks primarily injects false or toxic content, which fact-checking or linguistic analysis easily detects

high relevance attack
Paper 2604.24020v1

Poster: ClawdGo: Endogenous Security Awareness Training for Autonomous AI Agents

Autonomous AI agents deployed on platforms such as OpenClaw face prompt injection, memory poisoning, supply-chain attacks, and social engineering, yet existing defences address only the platform perimeter, leaving

medium relevance survey
Paper 2603.05073v1

Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation

with ordinal nature. In this paper, we study distribution estimation under the pure shuffle model, which is a prevalent shuffle-DP framework without strong security assumptions. We initially attempt

medium relevance benchmark
Paper 2605.14421v1

MemLineage: Lineage-Guided Enforcement for LLM Agent Memory

ancestor, while still allowing benign recall. We evaluate three defense cells against three memory-poisoning workloads on a deterministic mechanism-isolation harness; MemLineage is the only configuration in that harness

low relevance attack
Paper 2603.12414v1

SpectralGuard: Detecting Memory Collapse Attacks in State Space Models

State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a critical safety vulnerability. We show that the spectral radius

high relevance attack
Paper 2602.07652v1

Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents

Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety failures from unsafe text to unsafe *trajectories*. We introduce AgentFence

medium relevance benchmark
Paper 2606.05743v1

Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense

mechanism. At inference, retrieved cells serve as grounding context for precise safety decisions. Across model-level safety on HarmBench and agent-level safety on AgentHarm, Membrane achieves the highest

medium relevance defense