SMCP: Secure Model Context Protocol
large language models (LLMs) are moving away from closed, single-model frameworks and toward open ecosystems that connect a variety of agents, external tools, and resources. The Model Context Protocol
Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation
often used by downstream users as teacher models for knowledge distillation, compressing their capabilities into memory-efficient models. However, as these teacher models may stem from untrusted parties, distillation
The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism
Prompt injection was initially framed as the large language model (LLM) analogue of SQL injection. However, over the past three years, attacks labeled as prompt injection have evolved from isolated
Detection of adversarial intent in Human-AI teams using LLMs
Large language models (LLMs) are increasingly deployed in human-AI teams as support agents for complex tasks such as information retrieval, programming, and decision-making assistance. While these agents' autonomy
Bias Injection Attacks on RAG Databases and Sanitization Defenses
defenses on vector databases in retrieval-augmented generation (RAG) systems. Prior work on knowledge poisoning attacks primarily inject false or toxic content, which fact-checking or linguistic analysis easily detects
Poster: ClawdGo: Endogenous Security Awareness Training for Autonomous AI Agents
Autonomous AI agents deployed on platforms such as OpenClaw face prompt injection, memory poisoning, supply-chain attacks, and social engineering, yet existing defences address only the platform perimeter, leaving
Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation
with ordinal nature. In this paper, we study the distribution estimation under pure shuffle model, which is a prevalent shuffle-DP framework without strong security assumptions. We initially attempt
MemLineage: Lineage-Guided Enforcement for LLM Agent Memory
ancestor, while still allowing benign recall. We evaluate three defense cells against three memory-poisoning workloads on a deterministic mechanism-isolation harness; MemLineage is the only configuration in that harness
SpectralGuard: Detecting Memory Collapse Attacks in State Space Models
State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a critical safety vulnerability. We show that the spectral radius
Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents
Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety failures from unsafe text to unsafe *trajectories*. We introduce **AgentFence
Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense
mechanism. At inference, retrieved cells serve as grounding context for precise safety decisions. Across model-level safety on HarmBench and agent-level safety on AgentHarm, Membrane achieves the highest
Fight Poison with Poison: Enhancing Robustness in Few-shot Machine-Generated Text Detection with Adversarial Training
Machine-generated text (MGT) detection is critical for regulating online
CIS-BA: Continuous Interaction Space Based Backdoor Attack for Object Detection in the Real-World
Object detection models deployed in real-world applications such as autonomous driving face serious threats from backdoor attacks. Despite their practical effectiveness,existing methods are inherently limited in both capability
Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems
user query on OpenAI's embedding models), and achieves near-100% retrieval across 11 benchmarks and 8 embedding models (including both open-source models and proprietary services). Based on this
Privacy Preserving Machine Learning Workflow: from Anonymization to Personalized Differential Privacy Budgets in Federated Learning
called privacy preserving machine learning architectures, such as federated learning. While federated learning enables model training on decentralized data preventing their sharing and centralization, it still faces several challenges related
SoK: Agentic Retrieval-Augmented Generation (RAG): Taxonomy, Architectures, Evaluation, and Research Directions
Retrieval-Augmented Generation (RAG) systems are increasingly evolving into agentic architectures where large language models autonomously coordinate multi-step reasoning, dynamic memory management, and iterative retrieval strategies. Despite rapid industrial
The Containment Gap: How Deployed Agentic AI Frameworks Fail Public-Facing Safety Requirements
Agentic large language model systems that autonomously invoke tools, maintain persistent memory, and execute multi-step plans are increasingly deployed in public-facing domains, including government services, healthcare triage
STELP: Secure Transpilation and Execution of LLM-Generated Programs
Rapid evolution of Large Language Models (LLMs) has achieved major advances in reasoning, planning, and function-calling capabilities. Multi-agentic collaborative frameworks using such LLMs place them at the center
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
hijack. Each category captures a distinct pathway through which an adversary can manipulate a model's behavior. We evaluate these threats using 12 representative attack methods spanning text, image
Secure Semantic Communications via AI Defenses: Fundamentals, Solutions, and Future Directions
SemCom via AI defense. We analyze AI-centric threat models by consolidating existing studies and organizing attack surfaces across model-level, channel-realizable, knowledge-based, and networked inference vectors. Building