Paper 2603.17174v1

Detecting Data Poisoning in Code Generation LLMs via Black-Box, Vulnerability-Oriented Scanning

Code generation large language models (LLMs) are increasingly integrated into modern software development workflows. Recent work has shown that these models are vulnerable to backdoor and poisoning attacks that induce

high relevance attack
Paper 2512.06556v1

Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks

Model Context Protocol (MCP) enables Large Language Models to integrate external tools through structured descriptors, increasing autonomy in decision-making, task execution, and multi-agent workflows. However, this autonomy creates

high relevance tool
Paper 2510.19145v4

HAMLOCK: HArdware-Model LOgically Combined attacK

networks (DNNs) introduces new security vulnerabilities. Conventional model-level backdoor attacks, which only poison a model's weights to misclassify inputs with a specific trigger, are often detectable because

high relevance attack
Paper 2601.01972v3

Hidden State Poisoning Attacks against Mamba-based Language Models

their hidden states, referred to as a Hidden State Poisoning Attack (HiSPA). Our benchmark RoBench25 allows evaluating a model's information retrieval capabilities when subject to HiSPAs, and confirms

high relevance attack
Paper 2511.14301v3

SteganoBackdoor: Stealthy and Data-Efficient Backdoor Attacks on Language Models

Modern language models remain vulnerable to backdoor attacks via poisoned data, where training inputs containing a trigger are paired with a target output, causing the model to reproduce that behavior

high relevance attack
Paper 2605.02110v1

Adversarial Update-Based Federated Unlearning for Poisoned Model Recovery

Federated learning (FL) is vulnerable to poisoning attacks, where malicious clients upload manipulated updates to degrade the performance of the global model. Although detection methods can identify and remove malicious

medium relevance attack
Paper 2509.23041v2

Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data

data poisoning and backdoor attacks show that VIA significantly increases the presence of poisoning content in synthetic data and correspondingly raises the attack success rate (ASR) on downstream models

high relevance attack
Paper 2511.02894v3

Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models

environments. This work proposes a novel framework that uses large language models (LLMs) to perform poisoning detection and sanitization in HAR systems, utilizing zero-shot, one-shot, and few-shot

medium relevance attack
Paper 2601.01972v4

Hidden State Poisoning Attacks against Mamba-based Language Models

their hidden states, referred to as a Hidden State Poisoning Attack (HiSPA). Our benchmark RoBench-25 allows evaluating a model's information retrieval capabilities when subject to HiSPAs, and confirms

high relevance attack
Paper 2511.12414v1

The 'Sure' Trap: Multi-Scale Poisoning Analysis of Stealthy Compliance-Only Backdoors in Fine-Tuned Large Language Models

conduct a multi-scale analysis of this benign-label poisoning behavior across poison budget, total fine-tuning dataset size, and model size. A sharp threshold appears at small absolute budgets

medium relevance attack
Paper 2602.06616v1

Confundo: Learning to Generate Robust Poison for Practical RAG Systems

present Confundo, a learning-to-poison framework that fine-tunes a large language model as a poison generator to achieve high effectiveness, robustness, and stealthiness. Confundo provides a unified framework

medium relevance benchmark
Paper 2604.21416v1

CSC: Turning the Adversary's Poison against Itself

compromise model utility through unlearning methods that lead to accuracy degradation. This paper conducts a comprehensive analysis of backdoor attack dynamics during model training, revealing that poisoned samples form isolated

medium relevance benchmark
Paper 2511.09105v1

Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment

preserving the intended poisoning effect. Empirical results demonstrate that this cost-minimization post-processing can significantly reduce poisoning costs over baselines, particularly when the reward model's feature dimension

high relevance attack
Paper 2602.22246v1

Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models

induced behaviors and restore normal functionality. Building on this, we purify the poisoned dataset using the compromised model itself, then fine-tune the model on the purified data to recover

medium relevance benchmark
Paper 2601.04448v1

Merging Triggers, Breaking Backdoors: Defensive Poisoning for Instruction-Tuned Language Models

backdoor attacks, where adversaries poison a small subset of data to implant hidden behaviors. Despite this growing risk, defenses for instruction-tuned models remain underexplored. We propose MB-Defense (Merging

medium relevance attack
Paper 2601.06305v1

Why LoRA Fails to Forget: Regularized Low-Rank Adaptation Against Backdoors in Language Models

large language models, but it is notably ineffective at removing backdoor behaviors from poisoned pretrained models when fine-tuning on a clean dataset. Contrary to the common belief that this weakness

medium relevance benchmark
Paper 2512.23132v1

Multi-Agent Framework for Threat Mitigation and Resilience in AI-Based Systems

making them targets for data poisoning, model extraction, prompt injection, automated jailbreaking, and preference-guided black-box attacks that exploit model comparisons. Larger models can be more vulnerable to introspection

medium relevance tool
Paper 2603.24857v1

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

data decryption attacks and watermark removal attacks; (2) Data→Model (D→M): including poisoning, harmful fine-tuning attacks, and jailbreak attacks; (3) Model→Data

medium relevance survey
Paper 2603.02262v1

Silent Sabotage During Fine-Tuning: Few-Shot Rationale Poisoning of Compact Medical LLMs

poisoning attack targeting the reasoning process of medical LLMs during SFT. Unlike backdoor attacks, our method injects poisoned rationales into few-shot training data, leading to stealthy degradation of model

medium relevance attack
Paper 2605.04698v1

Gray-Box Poisoning of Continuous Malware Ingestion Pipelines

high volume of novel threats. This work investigates a realistic gray-box poisoning threat model targeting these pipelines. Using the secml_malware framework, we generate problem-space adversarial binaries through

medium relevance attack