MCP Pitfall Lab: Exposing Developer Pitfalls in MCP Tool Server Security under Multi-Vector Attacks
email, document, crypto) with six server variants (baseline and hardened) and model three attack families: tool-metadata poisoning, puppet servers, and multimodal image-to-tool chains, in a unified, trace
Test-Time Attention Purification for Backdoored Large Vision Language Models
defenses across diverse datasets and backdoor attack types, while preserving the model's utility on both clean and poisoned samples
Stealthy Yet Effective: Distribution-Preserving Backdoor Attacks on Graph Classification
semantic deviation caused by label flipping, both of which make poisoned graphs easily detectable by anomaly detection models. To address this, we propose DPSBA, a clean-label backdoor framework that
Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems
documents are preferentially retrieved at inference time, enabling targeted manipulation of model outputs. We study gradient-guided corpus poisoning attacks against modern RAG pipelines and evaluate retrieval-layer defenses that
Transferable Backdoor Attacks for Code Models via Sharpness-Aware Adversarial Perturbation
software development but remain vulnerable to backdoor attacks via poisoned training data. Existing backdoor attacks on code models face a fundamental trade-off between transferability and stealthiness. Static trigger-based
FedDetox: Robust Federated SLM Alignment via On-Device Data Sanitization
critical issue we define as unintended data poisoning, which can severely damage the safety alignment of global models during federated alignment. To address this, we propose FedDetox, a robust framework
AdaBFL: Multi-Layer Defensive Adaptive Aggregation for Bzantine-Robust Federated Learning
data. However, FL's decentralized nature makes it vulnerable to poisoning attacks, where malicious clients can submit corrupted models to manipulate the system. To counter such attacks, although various Byzantine
Dependable Artificial Intelligence with Reliability and Security (DAIReS): A Unified Syndrome Decoding Approach for Hallucination and Backdoor Trigger Detection
models, including Large Language Models (LLMs), are characterized by a range of system-level attributes such as security and reliability. Recent studies have demonstrated that ML models are vulnerable
Architecture Matters: Comparing RAG Systems under Knowledge Base Poisoning
models - remain untested against adversarially optimized contradictions. We evaluate four RAG architectures (vanilla RAG, agentic RAG, MADAM-RAG, and Recursive Language Models) under controlled single-document (N=1) poisoning
On the Fragility of Contribution Score Computation in Federated Learning
alter the final scores. Second, we explore vulnerabilities posed by poisoning attacks, where malicious participants strategically manipulate their model updates to inflate their own contribution scores or reduce the importance
Targeting World Models to Compromise Robot Learning Pipelines
While highly practical, in this work we demonstrate that world models introduce a uniquely stealthy and effective data poisoning entry point into the robot learning supply chain that can result
BadRSSD: Backdoor Attacks on Regularized Self-Supervised Diffusion Models
backdoor attack targeting the representation layer of self-supervised diffusion models. Specifically, it hijacks the semantic representations of poisoned samples with triggers in Principal Component Analysis (PCA) space toward those
Memory Poisoning Attack and Defense on Memory Based LLM-Agents
Large language model agents equipped with persistent memory are vulnerable to memory poisoning attacks, where adversaries inject malicious instructions through query only interactions that corrupt the agents long term memory
Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models
Fine-tuned Large Language Models (LLMs) are vulnerable to backdoor attacks through data poisoning, yet the internal mechanisms governing these attacks remain a black box. Previous research on interpretability
Poisoned Identifiers Survive LLM Deobfuscation: A Case Study on Claude Opus 4.6
poisoned identifier names in the string table survive into the model's reconstructed code, even when the model demonstrably understands the correct semantics? Using Claude Opus 4.6 across 192 inference
RefineRAG: Word-Level Poisoning Attacks via Retriever-Guided Text Refinement
Retrieval-Augmented Generation (RAG) significantly enhances Large Language Models (LLMs), but simultaneously exposes a critical vulnerability to knowledge poisoning attacks. Existing attack methods like PoisonedRAG remain detectable due to coarse
Cordon-MAS: Defending RAG against Knowledge Poisoning via Information-Flow Control
Confundo-style poisoning where adversarially optimized documents manipulate generated outputs. Existing defenses assume that detecting poisoned evidence prevents harm. We show this assumption is incorrect: models exhibit a monitoring-control
Phantom Transfer: Data-level Defences are Insufficient Against Data Poisoning
data-level defences are insufficient for stopping sophisticated data poisoning attacks. We suggest that future work should focus on model audits and white-box security methods
TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation
injection attack surface: tool poisoning attacks (TPAs). Attackers manipulate tool descriptions by embedding malicious instructions (explicit TPAs) or misleading claims (implicit TPAs) to influence model behavior and tool selection. Existing
Trustworthy Blockchain-based Federated Learning for Electronic Health Records: Securing Participant Identity with Decentralized Identifiers and Verifiable Credentials
patient data. Despite its potential, FL remains vulnerable to poisoning and Sybil attacks, in which malicious participants corrupt the global model or infiltrate the network using fake identities. While recent