Paper 2601.04266v1

State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space

Vision-Language-Action (VLA) models are widely deployed in safety

high relevance attack
Paper 2603.04859v1

Osmosis Distillation: Model Hijacking with the Fewest Samples

generated by dataset distillation methods, where an adversary can perform a model hijacking attack with only a few poisoned samples in the synthetic dataset. To reveal this threat, we propose

medium relevance benchmark
Paper 2605.01782v1

Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence

evidence, but it also opens a data-layer attack surface: poisoned corpus entries can steer outputs without changing model parameters. Existing defenses and traceback methods are largely passage-level, which

medium relevance benchmark
Paper 2604.21477v1

MCP Pitfall Lab: Exposing Developer Pitfalls in MCP Tool Server Security under Multi-Vector Attacks

email, document, crypto) with six server variants (baseline and hardened) and model three attack families: tool-metadata poisoning, puppet servers, and multimodal image-to-tool chains, in a unified, trace

high relevance tool
Paper 2603.12989v1

Test-Time Attention Purification for Backdoored Large Vision Language Models

defenses across diverse datasets and backdoor attack types, while preserving the model's utility on both clean and poisoned samples

medium relevance benchmark
Paper 2509.26032v2

Stealthy Yet Effective: Distribution-Preserving Backdoor Attacks on Graph Classification

semantic deviation caused by label flipping, both of which make poisoned graphs easily detectable by anomaly detection models. To address this, we propose DPSBA, a clean-label backdoor framework that

high relevance attack
Paper 2603.18034v1

Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems

documents are preferentially retrieved at inference time, enabling targeted manipulation of model outputs. We study gradient-guided corpus poisoning attacks against modern RAG pipelines and evaluate retrieval-layer defenses that

high relevance attack
Paper 2602.11213v1

Transferable Backdoor Attacks for Code Models via Sharpness-Aware Adversarial Perturbation

software development but remain vulnerable to backdoor attacks via poisoned training data. Existing backdoor attacks on code models face a fundamental trade-off between transferability and stealthiness. Static trigger-based

high relevance attack
Paper 2604.06833v1

FedDetox: Robust Federated SLM Alignment via On-Device Data Sanitization

critical issue we define as unintended data poisoning, which can severely damage the safety alignment of global models during federated alignment. To address this, we propose FedDetox, a robust framework

low relevance defense
Paper 2604.27434v1

AdaBFL: Multi-Layer Defensive Adaptive Aggregation for Byzantine-Robust Federated Learning

data. However, FL's decentralized nature makes it vulnerable to poisoning attacks, where malicious clients can submit corrupted models to manipulate the system. To counter such attacks, although various Byzantine

medium relevance benchmark
Paper 2602.06532v1

Dependable Artificial Intelligence with Reliability and Security (DAIReS): A Unified Syndrome Decoding Approach for Hallucination and Backdoor Trigger Detection

models, including Large Language Models (LLMs), are characterized by a range of system-level attributes such as security and reliability. Recent studies have demonstrated that ML models are vulnerable

medium relevance defense
Paper 2605.05632v1

Architecture Matters: Comparing RAG Systems under Knowledge Base Poisoning

models - remain untested against adversarially optimized contradictions. We evaluate four RAG architectures (vanilla RAG, agentic RAG, MADAM-RAG, and Recursive Language Models) under controlled single-document (N=1) poisoning

medium relevance attack
Paper 2509.19921v2

On the Fragility of Contribution Score Computation in Federated Learning

alter the final scores. Second, we explore vulnerabilities posed by poisoning attacks, where malicious participants strategically manipulate their model updates to inflate their own contribution scores or reduce the importance

medium relevance benchmark
Paper 2606.09499v1

Targeting World Models to Compromise Robot Learning Pipelines

While highly practical, in this work we demonstrate that world models introduce a uniquely stealthy and effective data poisoning entry point into the robot learning supply chain that can result

medium relevance benchmark
Paper 2603.01019v1

BadRSSD: Backdoor Attacks on Regularized Self-Supervised Diffusion Models

backdoor attack targeting the representation layer of self-supervised diffusion models. Specifically, it hijacks the semantic representations of poisoned samples with triggers in Principal Component Analysis (PCA) space toward those

high relevance attack
Paper 2601.05504v2

Memory Poisoning Attack and Defense on Memory Based LLM-Agents

Large language model agents equipped with persistent memory are vulnerable to memory poisoning attacks, where adversaries inject malicious instructions through query-only interactions that corrupt the agent's long-term memory

high relevance attack
Paper 2509.21761v2

Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models

Fine-tuned Large Language Models (LLMs) are vulnerable to backdoor attacks through data poisoning, yet the internal mechanisms governing these attacks remain a black box. Previous research on interpretability

medium relevance attack

Open WebUI's process_files_batch() endpoint missing ownership check

CVSS 7.1 open-webui
Paper 2604.04289v1

Poisoned Identifiers Survive LLM Deobfuscation: A Case Study on Claude Opus 4.6

poisoned identifier names in the string table survive into the model's reconstructed code, even when the model demonstrably understands the correct semantics? Using Claude Opus 4.6 across 192 inference

low relevance other
Paper 2604.07403v1

RefineRAG: Word-Level Poisoning Attacks via Retriever-Guided Text Refinement

Retrieval-Augmented Generation (RAG) significantly enhances Large Language Models (LLMs), but simultaneously exposes a critical vulnerability to knowledge poisoning attacks. Existing attack methods like PoisonedRAG remain detectable due to coarse

high relevance attack