Search: model poisoning | AI Threat Alert

Paper 2604.27238v1

2026-04-29

SafeTune: Mitigating Data Poisoning in LLM Fine-Tuning for RTL Code Generation

datasets frequently lack security verification and are highly susceptible to data poisoning attacks. Such poisoning can cause models to generate syntactically valid but insecure hardware modules that bypass standard functionality

medium relevance attack

Paper 2601.01053v1

2026-01-03

Byzantine-Robust Federated Learning Framework with Post-Quantum Secure Aggregation for Real-Time Threat Intelligence Sharing in Critical IoT Infrastructure

security suffer from two critical vulnerabilities: susceptibility to Byzantine attacks where malicious participants poison model updates, and inadequacy against future quantum computing threats that can compromise cryptographic aggregation protocols. This

medium relevance benchmark

Paper 2511.02600v1

2025-11-04

On The Dangers of Poisoned LLMs In Security Automation

tuned Llama3.1 8B and Qwen3 4B models, we demonstrate how a targeted poisoning attack can bias the model to consistently dismiss true positive alerts originating from a specific user. Additionally

medium relevance benchmark

Paper 2603.03865v1

2026-03-04

Structure-Aware Distributed Backdoor Attacks in Federated Learning

across different model architectures. This assumption overlooks the impact of model structure on perturbation effectiveness. From a structure-aware perspective, this paper analyzes the coupling relationship between model architectures

high relevance attack

Paper 2603.12206v1

2026-03-12

CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks

State space models (SSMs) like Mamba have gained significant traction as efficient alternatives to Transformers, achieving linear complexity while maintaining competitive performance. However, Hidden State Poisoning Attacks (HiSPAs), a recently

high relevance attack

Paper 2510.26829v3

2025-10-29

Layer of Truth: Probing Belief Shifts under Continual Pre-Training Poisoning

track how internal preferences between competing facts evolve across checkpoints, layers, and model scales. Even moderate poisoning (50-100%) flips over 55% of responses from correct to counterfactual while leaving

low relevance attack

Paper 2603.17174v1

2026-03-17

Detecting Data Poisoning in Code Generation LLMs via Black-Box, Vulnerability-Oriented Scanning

generation large language models (LLMs) are increasingly integrated into modern software development workflows. Recent work has shown that these models are vulnerable to backdoor and poisoning attacks that induce

high relevance attack

Paper 2512.06556v1

2025-12-06

Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks

Model Context Protocol (MCP) enables Large Language Models to integrate external tools through structured descriptors, increasing autonomy in decision-making, task execution, and multi-agent workflows. However, this autonomy creates

high relevance tool

Paper 2510.19145v4

2025-10-22

HAMLOCK: HArdware-Model LOgically Combined attacK

networks (DNNs) introduces new security vulnerabilities. Conventional model-level backdoor attacks, which only poison a model's weights to misclassify inputs with a specific trigger, are often detectable because

high relevance attack

Paper 2606.23362v1

2026-06-22

TooBad: Backdoor Diffusion Models with Ultra-Low Poison Rate and Imperceptible Trigger

factors: attack performance, stealthiness, time complexity, and required poison rates. For example, achieving high attack performance typically demands a high poison rate and prolonged training, which undermines stealthiness, making

medium relevance benchmark

Paper 2601.01972v3

2026-01-05

Hidden State Poisoning Attacks against Mamba-based Language Models

their hidden states, referred to as a Hidden State Poisoning Attack (HiSPA). Our benchmark RoBench25 allows evaluating a model's information retrieval capabilities when subject to HiSPAs, and confirms

high relevance attack

Paper 2511.14301v3

2025-11-18

SteganoBackdoor: Stealthy and Data-Efficient Backdoor Attacks on Language Models

Modern language models remain vulnerable to backdoor attacks via poisoned data, where training inputs containing a trigger are paired with a target output, causing the model to reproduce that behavior

high relevance attack

Paper 2605.02110v1

2026-05-04

Adversarial Update-Based Federated Unlearning for Poisoned Model Recovery

Federated learning (FL) is vulnerable to poisoning attacks, where malicious clients upload manipulated updates to degrade the performance of the global model. Although detection methods can identify and remove malicious

medium relevance attack

Paper 2606.05958v1

2026-06-04

Steering Vectors are an Adversarial Attack Surface

verify. We test the attack on two open-weight model families and eight model-attribute combinations, observing that poisoned vectors reach an absolute attack success rate

high relevance attack

Paper 2509.23041v2

2025-09-27

Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data

data poisoning and backdoor attacks show that VIA significantly increases the presence of poisoning content in synthetic data and correspondingly raises the attack success rate (ASR) on downstream models

high relevance attack

Paper 2511.02894v3

2025-11-04

Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models

environments. This work proposes a novel framework that uses large language models (LLMs) to perform poisoning detection and sanitization in HAR systems, utilizing zero-shot, one-shot, and few-shot

medium relevance attack

Paper 2605.19227v1

2026-05-19

Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models

model in which a subtle word (e.g., "cool") induces modality-aligned brand promotion or ideological influence in 55% of generations. Without model access, ToBAC can be induced through data poisoning

medium relevance attack

Paper 2601.01972v4

2026-01-05

Hidden State Poisoning Attacks against Mamba-based Language Models

their hidden states, referred to as a Hidden State Poisoning Attack (HiSPA). Our benchmark RoBench-25 allows evaluating a model's information retrieval capabilities when subject to HiSPAs, and confirms

high relevance attack

Paper 2605.11592v1

2026-05-12

SoK: Unlearnability and Unlearning for Model Dememorization

Advanced model dememorization methods, including availability poisoning (unlearnability) and machine unlearning, are emerging as key safeguards against data misuse in machine learning (ML). At the training stage, unlearnability embeds imperceptible

low relevance survey

Paper 2511.12414v1

2025-11-16

The 'Sure' Trap: Multi-Scale Poisoning Analysis of Stealthy Compliance-Only Backdoors in Fine-Tuned Large Language Models

conduct a multi-scale analysis of this benign-label poisoning behavior across poison budget, total fine-tuning dataset size, and model size. A sharp threshold appears at small absolute budgets

medium relevance attack
