Stealthy Poisoning Attacks Bypass Defenses in Regression Settings
…natural and physical sciences, yet their robustness to poisoning has received less attention. When it has, studies often assume unrealistic threat models and are thus less useful in practice…
State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space
Vision-Language-Action (VLA) models are widely deployed in safety…
Osmosis Distillation: Model Hijacking with the Fewest Samples
…generated by dataset distillation methods, where an adversary can perform a model hijacking attack with only a few poisoned samples in the synthetic dataset. To reveal this threat, we propose…
Test-Time Attention Purification for Backdoored Large Vision Language Models
…defenses across diverse datasets and backdoor attack types, while preserving the model's utility on both clean and poisoned samples.
Stealthy Yet Effective: Distribution-Preserving Backdoor Attacks on Graph Classification
…semantic deviation caused by label flipping, both of which make poisoned graphs easily detectable by anomaly detection models. To address this, we propose DPSBA, a clean-label backdoor framework that…
Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems
…documents are preferentially retrieved at inference time, enabling targeted manipulation of model outputs. We study gradient-guided corpus poisoning attacks against modern RAG pipelines and evaluate retrieval-layer defenses that…
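A common concrete form of gradient-guided corpus poisoning is HotFlip-style token substitution against a dual-encoder retriever; the sketch below is a generic illustration under that assumption, not this paper's method (the mean-pooling encoder and all names are invented for the example):

```python
import torch

def hotflip_step(doc_tok_embeds, query_emb, embedding_matrix, pos):
    """Choose the vocabulary token at position `pos` that most increases
    query-document similarity, by a first-order (gradient) approximation.
    Mean-pooling as the document encoder is a simplifying assumption."""
    doc_tok_embeds = doc_tok_embeds.clone().requires_grad_(True)
    doc_emb = doc_tok_embeds.mean(dim=0)        # toy pooled document embedding
    torch.dot(doc_emb, query_emb).backward()    # similarity to the target query
    grad = doc_tok_embeds.grad[pos]             # d(sim)/d(embedding at `pos`)
    scores = embedding_matrix @ grad            # first-order gain of each swap
    return int(scores.argmax())                 # best replacement token id

# Hypothetical usage with random stand-ins for a real retriever's weights
vocab, dim = 1000, 64
emb = torch.randn(vocab, dim)
doc = emb[torch.randint(vocab, (12,))]
print(hotflip_step(doc, torch.randn(dim), emb, pos=3))
```

In a real pipeline the same step would be repeated greedily over positions until the poisoned document outranks legitimate ones for the target queries.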
Transferable Backdoor Attacks for Code Models via Sharpness-Aware Adversarial Perturbation
…software development but remain vulnerable to backdoor attacks via poisoned training data. Existing backdoor attacks on code models face a fundamental trade-off between transferability and stealthiness. Static trigger-based…
Dependable Artificial Intelligence with Reliability and Security (DAIReS): A Unified Syndrome Decoding Approach for Hallucination and Backdoor Trigger Detection
…models, including Large Language Models (LLMs), are characterized by a range of system-level attributes such as security and reliability. Recent studies have demonstrated that ML models are vulnerable…
On the Fragility of Contribution Score Computation in Federated Learning
…alter the final scores. Second, we explore vulnerabilities posed by poisoning attacks, where malicious participants strategically manipulate their model updates to inflate their own contribution scores or reduce the importance…
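To see why projection-style scoring is gameable, consider a toy contribution metric that projects each client's update onto the aggregate direction: an attacker who merely rescales an otherwise ordinary update dominates the ranking. The metric below is an illustrative stand-in, not the scoring rule studied in the paper:

```python
import numpy as np

def influence_score(update, aggregate):
    """Toy contribution proxy (assumed): length of the update's projection
    onto the aggregated update direction."""
    direction = aggregate / (np.linalg.norm(aggregate) + 1e-12)
    return float(update @ direction)

rng = np.random.default_rng(0)
updates = [rng.normal(size=100) for _ in range(9)]
updates.append(updates[0] * 10.0)          # attacker: same direction, 10x norm
aggregate = np.mean(updates, axis=0)       # FedAvg-style aggregation
scores = [influence_score(u, aggregate) for u in updates]
print("attacker ranked first:", int(np.argmax(scores)) == len(updates) - 1)
```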
BadRSSD: Backdoor Attacks on Regularized Self-Supervised Diffusion Models
…backdoor attack targeting the representation layer of self-supervised diffusion models. Specifically, it hijacks the semantic representations of poisoned samples with triggers in Principal Component Analysis (PCA) space toward those…
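A rough sketch of what steering representations "in PCA space" can mean: fit principal directions on clean representations, then penalize the distance between poisoned samples' PCA coordinates and the target's. The objective and names below are assumptions for illustration, not BadRSSD's implementation:

```python
import torch

def pca_basis(reps, k=16):
    """Top-k principal directions of (centered) clean representations."""
    centered = reps - reps.mean(dim=0, keepdim=True)
    _, _, vt = torch.linalg.svd(centered, full_matrices=False)
    return vt[:k]                                   # (k, d) row basis

def pca_hijack_loss(poisoned_reps, target_reps, basis):
    """Pull poisoned-sample coordinates in PCA space toward the mean
    coordinates of the target samples (assumed objective form)."""
    proj_p = poisoned_reps @ basis.T                # (n, k)
    proj_t = (target_reps @ basis.T).mean(dim=0)    # (k,)
    return ((proj_p - proj_t) ** 2).sum(dim=1).mean()

clean = torch.randn(256, 128)
loss = pca_hijack_loss(torch.randn(32, 128), clean[:8], pca_basis(clean))
print(float(loss))
```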
Memory Poisoning Attack and Defense on Memory Based LLM-Agents
Large language model agents equipped with persistent memory are vulnerable to memory poisoning attacks, where adversaries inject malicious instructions through query-only interactions that corrupt the agent's long-term memory…
Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models
Fine-tuned Large Language Models (LLMs) are vulnerable to backdoor attacks through data poisoning, yet the internal mechanisms governing these attacks remain a black box. Previous research on interpretability…
Phantom Transfer: Data-level Defences are Insufficient Against Data Poisoning
…data-level defences are insufficient for stopping sophisticated data poisoning attacks. We suggest that future work should focus on model audits and white-box security methods.
Trustworthy Blockchain-based Federated Learning for Electronic Health Records: Securing Participant Identity with Decentralized Identifiers and Verifiable Credentials
…patient data. Despite its potential, FL remains vulnerable to poisoning and Sybil attacks, in which malicious participants corrupt the global model or infiltrate the network using fake identities. While recent…
CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents
…four major types of adversarial attacks: Direct/Indirect Prompt Injection, Memory Poisoning, and Prompt-based Backdoor. We evaluate six foundation models across two representative code interpreter agents (OpenInterpreter and OpenCodeInterpreter), incorporating…
BadSNN: Backdoor Attacks on Spiking Neural Networks via Adversarial Spiking Neuron
…converts input data into spikes following the Leaky Integrate-and-Fire (LIF) neuron model. This model includes several important hyperparameters, such as the membrane potential threshold and membrane time constant…
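For readers unfamiliar with the neuron model the attack targets, here is a minimal LIF encoder; the parameter values (threshold, time constant) are illustrative defaults, not the paper's settings:

```python
import numpy as np

def lif_encode(currents, v_thresh=1.0, tau_m=20.0, v_reset=0.0, dt=1.0):
    """Turn an input current trace into a binary spike train with a
    Leaky Integrate-and-Fire neuron."""
    v = v_reset
    spikes = np.zeros_like(currents)
    for t, i_in in enumerate(currents):
        v += (dt / tau_m) * (-v + i_in)   # leaky integration toward the input
        if v >= v_thresh:                 # membrane potential crosses threshold
            spikes[t] = 1.0
            v = v_reset                   # hard reset after the spike
    return spikes

print(int(lif_encode(np.full(100, 1.5)).sum()), "spikes in 100 steps")
```

Because the spike train depends so directly on the threshold and time constant, perturbing these hyperparameters is a natural lever for the adversarial-neuron attack the title describes.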
Human Society-Inspired Approaches to Agentic AI Security: The 4C Framework
…software components. Although recent work has strengthened defenses against model- and pipeline-level vulnerabilities such as prompt injection, data poisoning, and tool misuse, these system-centric approaches may fail…
SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG
Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) with external knowledge but are vulnerable to corpus poisoning and contamination attacks, which can compromise output integrity. Existing defenses often…
Robust Backdoor Removal by Reconstructing Trigger-Activated Changes in Latent Representation
Backdoor attacks pose a critical threat to machine learning models, causing them to behave normally on clean data but misclassify poisoned data into a poisoned class. Existing defenses often attempt…
Weight-Space Detection of Backdoors in LoRA Adapters
…trigger for backdoor behavior is unknown. We detect poisoned adapters by analyzing their weight matrices directly, without running the model, which makes our method data-agnostic. Our method extracts simple statistics…
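The excerpt leaves the statistics unspecified; a generic version of the recipe (whose specific features are my own guesses) could reduce each adapter's update delta_W = B @ A to a small feature vector and train any off-the-shelf classifier on adapters with known labels:

```python
import numpy as np

def adapter_features(A, B):
    """Summarize a LoRA adapter (delta_W = B @ A) with simple weight-space
    statistics; the chosen features are illustrative assumptions."""
    delta_w = B @ A
    sv = np.linalg.svd(delta_w, compute_uv=False)
    return np.array([
        np.linalg.norm(delta_w),       # overall magnitude of the update
        sv[0] / (sv.sum() + 1e-12),    # spectral mass in the top direction
        np.abs(delta_w).max(),         # largest single-weight change
        delta_w.std(),                 # spread of the update entries
    ])

rng = np.random.default_rng(0)
A, B = rng.normal(size=(8, 512)), rng.normal(size=(512, 8))
print(adapter_features(A, B))        # one feature row per adapter
```

Since the features come from the weights alone, detection needs no trigger knowledge, no inference passes, and no held-out data, matching the data-agnostic claim in the excerpt.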