Search: model poisoning | AI Threat Alert

Severity:

250 results in 92ms

Paper 2605.04698v1

2026-05-06

Gray-Box Poisoning of Continuous Malware Ingestion Pipelines

high volume of novel threats. This work investigates a realistic gray-box poisoning threat model targeting these pipelines. Using the secml_malware framework, we generate problem-space adversarial binaries through

medium relevance attack

Paper 2510.05169v1

2025-10-05

From Poisoned to Aware: Fostering Backdoor Self-Awareness in LLMs

triggers responsible for misaligned outputs. Guided by curated reward signals, this process transforms a poisoned model into one capable of precisely identifying its implanted trigger. Surprisingly, we observe that such

medium relevance attack

Paper 2509.22873v2

2025-09-26

AntiFLipper: A Secure and Efficient Defense Against Label-Flipping Attacks in Federated Learning

remains vulnerable to label-flipping attacks, where malicious clients manipulate labels to poison the global model. Despite their simplicity, these attacks can severely degrade model performance, and defending against them

high relevance attack

Paper 2604.10611v1

2026-04-12

DuCodeMark: Dual-Purpose Code Dataset Watermarking via Style-Aware Watermark-Poison Design

suspicious rate $\leq$ 0.36), robustness against both watermark and poisoning attacks (recall $\leq$ 0.57), and a substantial drop in model performance upon watermark removal (Pass@1 drops by 28.6%), underscoring

medium relevance benchmark

Paper 2603.20615v1

2026-03-21

Unveiling the Security Risks of Federated Learning in the Wild: From Research to Practice

perspective. We systematize three major sources of mismatch between research and practice: unrealistic poisoning threat models, the omission of hybrid heterogeneity, and incomplete metrics that overemphasize peak attack success while

medium relevance benchmark

Paper 2511.16709v1

2025-11-20

AutoBackdoor: Automating Backdoor Attacks via LLM Agents

model fine-tuning via an autonomous agent-driven pipeline. Unlike prior approaches, AutoBackdoor uses a powerful language model agent to generate semantically coherent, context-aware trigger phrases, enabling scalable poisoning

high relevance attack

Paper 2603.03371v1

2026-03-02

Sleeper Cell: Injecting Latent Malice Temporal Backdoors into Tool-Using LLMs

model generates benign textual responses immediately after destructive actions. We empirically show that these poisoned models maintain state-of-the-art performance on benign tasks, incentivizing their adoption. Our findings

medium relevance tool

Paper 2510.09647v1

2025-10-05

Rounding-Guided Backdoor Injection in Deep Learning Model Quantization

quantization to embed malicious behaviors. Unlike conventional backdoor attacks relying on training data poisoning or model training manipulation, QuRA solely works using the quantization operations. In particular, QuRA first employs

high relevance attack

Paper 2602.03085v1

2026-02-03

The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers

Detecting whether a model has been poisoned is a longstanding problem in AI security. In this work, we present a practical scanner for identifying sleeper agent-style backdoors in causal

low relevance attack

Paper 2512.04785v1

2025-12-04

ASTRIDE: A Security Threat Modeling Platform for Agentic-AI Applications

However, these systems introduce novel and evolving security challenges, including prompt injection attacks, context poisoning, model manipulation, and opaque agent-to-agent communication, that are not effectively captured by traditional

medium relevance tool

Paper 2602.13427v1

2026-02-13

Backdooring Bias in Large Language Models

model, with an adversary targeting the model builder's LLM. However, in the bias manipulation setting, the model builder themselves could be the adversary, warranting a white-box threat model

medium relevance benchmark

Paper 2603.03398v1

2026-03-03

Zero-Knowledge Federated Learning with Lattice-Based Hybrid Encryption for Quantum-Resilient Medical AI

models across hospitals without centralizing patient data. However, the exchange of model updates exposes critical vulnerabilities: gradient inversion attacks can reconstruct patient information, Byzantine clients can poison the global model

medium relevance attack

Paper 2510.19303v1

2025-10-22

Collaborative penetration testing suite for emerging generative AI algorithms

Problem Space: AI Vulnerabilities and Quantum Threats Generative AI vulnerabilities: model inversion, data poisoning, adversarial inputs. Quantum threats Shor Algorithm breaking RSA ECC encryption. Challenge Secure generative AI models against

medium relevance attack

Paper 2604.24975v1

2026-04-27

Poisoning Learned Index Structures: Static and Dynamic Adversarial Attacks on ALEX

baseline to isolate adversarial effects. Our results show a clear separation between threat models. Static poisoning has minimal impact on lookup performance in ALEX under bulk-loaded settings, while dynamic

high relevance attack

Paper 2602.04653v3

2026-02-04

Inference-Time Backdoors via Hidden Instructions in LLM Chat Templates

model processing. We show that an adversary who distributes a model with a maliciously modified template can implant an inference-time backdoor without modifying model weights, poisoning training data

medium relevance attack

Paper 2511.11020v1

2025-11-14

Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis

analyses needed for detection. Supply chain weaknesses allow a single compromised vendor to poison models across 50 to 200 institutions. The Medical Scribe Sybil scenario shows how coordinated fake patient

medium relevance attack

CVE MEDIUM GHSA-6556-fwc2-fg2p

2025-12-30

Picklescan is vulnerable to RCE through missing detection when calling

picklescan View details

CVE HIGH GHSA-rrxm-2pvv-m66x

2025-12-30

Picklescan is vulnerable to RCE via missing detection when calling

picklescan View details

Paper 2604.08395v1

2026-04-09

Phantasia: Context-Adaptive Backdoors in Vision Language Models

backdoor attack that dynamically aligns its poisoned outputs with the semantics of each input. Instead of producing static poisoned patterns, Phantasia encourages models to generate contextually coherent yet malicious responses

medium relevance attack

Paper 2510.13462v1

2025-10-15

Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers

these targeted experts. Unlike traditional backdoor attacks that rely on superficial data poisoning or model editing, BadSwitch primarily embeds malicious triggers into expert routing paths with strong task affinity, enabling

medium relevance benchmark

Previous Page 3 of 13 Next