P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs
Poison-to-Poison (P2P), a general and effective backdoor defense algorithm. P2P injects benign triggers with safe alternative labels into a subset of training samples and fine-tunes the model
SafeTune: Mitigating Data Poisoning in LLM Fine-Tuning for RTL Code Generation
datasets frequently lack security verification and are highly susceptible to data poisoning attacks. Such poisoning can cause models to generate syntactically valid but insecure hardware modules that bypass standard functionality
Byzantine-Robust Federated Learning Framework with Post-Quantum Secure Aggregation for Real-Time Threat Intelligence Sharing in Critical IoT Infrastructure
security suffer from two critical vulnerabilities: susceptibility to Byzantine attacks where malicious participants poison model updates, and inadequacy against future quantum computing threats that can compromise cryptographic aggregation protocols. This
On The Dangers of Poisoned LLMs In Security Automation
tuned Llama3.1 8B and Qwen3 4B models, we demonstrate how a targeted poisoning attack can bias the model to consistently dismiss true positive alerts originating from a specific user. Additionally
Structure-Aware Distributed Backdoor Attacks in Federated Learning
across different model architectures. This assumption overlooks the impact of model structure on perturbation effectiveness. From a structure-aware perspective, this paper analyzes the coupling relationship between model architectures
CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks
State space models (SSMs) like Mamba have gained significant traction as efficient alternatives to Transformers, achieving linear complexity while maintaining competitive performance. However, Hidden State Poisoning Attacks (HiSPAs), a recently
Layer of Truth: Probing Belief Shifts under Continual Pre-Training Poisoning
track how internal preferences between competing facts evolve across checkpoints, layers, and model scales. Even moderate poisoning (50-100%) flips over 55% of responses from correct to counterfactual while leaving
Detecting Data Poisoning in Code Generation LLMs via Black-Box, Vulnerability-Oriented Scanning
generation large language models (LLMs) are increasingly integrated into modern software development workflows. Recent work has shown that these models are vulnerable to backdoor and poisoning attacks that induce
Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks
Model Context Protocol (MCP) enables Large Language Models to integrate external tools through structured descriptors, increasing autonomy in decision-making, task execution, and multi-agent workflows. However, this autonomy creates
HAMLOCK: HArdware-Model LOgically Combined attacK
networks (DNNs) introduces new security vulnerabilities. Conventional model-level backdoor attacks, which only poison a model's weights to misclassify inputs with a specific trigger, are often detectable because
TooBad: Backdoor Diffusion Models with Ultra-Low Poison Rate and Imperceptible Trigger
factors: attack performance, stealthiness, time complexity, and required poison rates. For example, achieving high attack performance typically demands a high poison rate and prolonged training, which undermines stealthiness, making
Hidden State Poisoning Attacks against Mamba-based Language Models
their hidden states, referred to as a Hidden State Poisoning Attack (HiSPA). Our benchmark RoBench25 allows evaluating a model's information retrieval capabilities when subject to HiSPAs, and confirms
SteganoBackdoor: Stealthy and Data-Efficient Backdoor Attacks on Language Models
Modern language models remain vulnerable to backdoor attacks via poisoned data, where training inputs containing a trigger are paired with a target output, causing the model to reproduce that behavior
Adversarial Update-Based Federated Unlearning for Poisoned Model Recovery
Federated learning (FL) is vulnerable to poisoning attacks, where malicious clients upload manipulated updates to degrade the performance of the global model. Although detection methods can identify and remove malicious
Steering Vectors are an Adversarial Attack Surface
verify. We test the attack on two open-weight model families and eight model-attribute combinations, observing that poisoned vectors reach an absolute attack success rate
Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
data poisoning and backdoor attacks show that VIA significantly increases the presence of poisoning content in synthetic data and correspondingly raises the attack success rate (ASR) on downstream models
Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models
environments. This work proposes a novel framework that uses large language models (LLMs) to perform poisoning detection and sanitization in HAR systems, utilizing zero-shot, one-shot, and few-shot
Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models
model in which a subtle word (e.g., ``cool'') induces modality-aligned brand promotion or ideological influence in 55% of generations. Without model access, ToBAC can be induced through data poisoning
Hidden State Poisoning Attacks against Mamba-based Language Models
their hidden states, referred to as a Hidden State Poisoning Attack (HiSPA). Our benchmark RoBench-25 allows evaluating a model's information retrieval capabilities when subject to HiSPAs, and confirms
SoK: Unlearnability and Unlearning for Model Dememorization
Advanced model dememorization methods, including availability poisoning (unlearnability) and machine unlearning, are emerging as key safeguards against data misuse in machine learning (ML). At the training stage, unlearnability embeds imperceptible