Paper 2602.01129v1

SMCP: Secure Model Context Protocol

large language models (LLMs) are moving away from closed, single-model frameworks and toward open ecosystems that connect a variety of agents, external tools, and resources. The Model Context Protocol

medium relevance attack
Paper 2510.18541v1

Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation

often used by downstream users as teacher models for knowledge distillation, compressing their capabilities into memory-efficient models. However, as these teacher models may stem from untrusted parties, distillation

medium relevance benchmark
Paper 2601.09625v2

The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism

Prompt injection was initially framed as the large language model (LLM) analogue of SQL injection. However, over the past three years, attacks labeled as prompt injection have evolved from isolated

high relevance attack
Paper 2603.20976v1

Detection of adversarial intent in Human-AI teams using LLMs

Large language models (LLMs) are increasingly deployed in human-AI teams as support agents for complex tasks such as information retrieval, programming, and decision-making assistance. While these agents' autonomy

medium relevance attack
Paper 2512.00804v1

Bias Injection Attacks on RAG Databases and Sanitization Defenses

defenses on vector databases in retrieval-augmented generation (RAG) systems. Prior work on knowledge poisoning attacks primarily inject false or toxic content, which fact-checking or linguistic analysis easily detects

high relevance attack
Paper 2604.24020v1

Poster: ClawdGo: Endogenous Security Awareness Training for Autonomous AI Agents

Autonomous AI agents deployed on platforms such as OpenClaw face prompt injection, memory poisoning, supply-chain attacks, and social engineering, yet existing defences address only the platform perimeter, leaving

medium relevance survey
Paper 2603.05073v1

Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation

with ordinal nature. In this paper, we study the distribution estimation under pure shuffle model, which is a prevalent shuffle-DP framework without strong security assumptions. We initially attempt

medium relevance benchmark
Paper 2605.14421v1

MemLineage: Lineage-Guided Enforcement for LLM Agent Memory

ancestor, while still allowing benign recall. We evaluate three defense cells against three memory-poisoning workloads on a deterministic mechanism-isolation harness; MemLineage is the only configuration in that harness

low relevance attack
Paper 2603.12414v1

SpectralGuard: Detecting Memory Collapse Attacks in State Space Models

State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a critical safety vulnerability. We show that the spectral radius

high relevance attack
Paper 2602.07652v1

Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents

Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety failures from unsafe text to unsafe *trajectories*. We introduce AgentFence

medium relevance benchmark
Paper 2606.05743v1

Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense

mechanism. At inference, retrieved cells serve as grounding context for precise safety decisions. Across model-level safety on HarmBench and agent-level safety on AgentHarm, Membrane achieves the highest

medium relevance defense
Paper 2605.02374v1

Fight Poison with Poison: Enhancing Robustness in Few-shot Machine-Generated Text Detection with Adversarial Training

Machine-generated text (MGT) detection is critical for regulating online

medium relevance attack
Paper 2512.14158v1

CIS-BA: Continuous Interaction Space Based Backdoor Attack for Object Detection in the Real-World

Object detection models deployed in real-world applications such as autonomous driving face serious threats from backdoor attacks. Despite their practical effectiveness, existing methods are inherently limited in both capability

high relevance attack
Paper 2601.07072v1

Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems

user query on OpenAI's embedding models), and achieves near-100% retrieval across 11 benchmarks and 8 embedding models (including both open-source models and proprietary services). Based on this

high relevance tool
Paper 2605.02372v1

Privacy Preserving Machine Learning Workflow: from Anonymization to Personalized Differential Privacy Budgets in Federated Learning

called privacy preserving machine learning architectures, such as federated learning. While federated learning enables model training on decentralized data preventing their sharing and centralization, it still faces several challenges related

medium relevance benchmark
Paper 2603.07379v1

SoK: Agentic Retrieval-Augmented Generation (RAG): Taxonomy, Architectures, Evaluation, and Research Directions

Retrieval-Augmented Generation (RAG) systems are increasingly evolving into agentic architectures where large language models autonomously coordinate multi-step reasoning, dynamic memory management, and iterative retrieval strategies. Despite rapid industrial

low relevance survey
Paper 2606.12797v1

The Containment Gap: How Deployed Agentic AI Frameworks Fail Public-Facing Safety Requirements

Agentic large language model systems that autonomously invoke tools, maintain persistent memory, and execute multi-step plans are increasingly deployed in public-facing domains, including government services, healthcare triage

medium relevance tool
Paper 2601.05467v3

STELP: Secure Transpilation and Execution of LLM-Generated Programs

Rapid evolution of Large Language Models (LLMs) has achieved major advances in reasoning, planning, and function-calling capabilities. Multi-agentic collaborative frameworks using such LLMs place them at the center

medium relevance survey
Paper 2511.18921v1

BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models

hijack. Each category captures a distinct pathway through which an adversary can manipulate a model's behavior. We evaluate these threats using 12 representative attack methods spanning text, image

high relevance benchmark
Paper 2602.22134v2

Secure Semantic Communications via AI Defenses: Fundamentals, Solutions, and Future Directions

SemCom via AI defense. We analyze AI-centric threat models by consolidating existing studies and organizing attack surfaces across model-level, channel-realizable, knowledge-based, and networked inference vectors. Building

medium relevance defense