Search: model poisoning | AI Threat Alert

Severity:

305 results in 155ms

Paper 2603.03919v1

2026-03-04

When Safety Becomes a Vulnerability: Exploiting LLM Alignment Homogeneity for Transferable Blocking in RAG

Augmented Generation (RAG) enhances the capabilities of large language models (LLMs) by incorporating external knowledge, but its reliance on potentially poisonable knowledge bases introduces new availability risks. Attackers can inject

high relevance attack

Paper 2511.07210v2

2025-11-10

Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization

from benign task-related features, GCB enables a victim model to learn the backdoor from an extremely small set of poisoned examples, resulting in a CA drop of less than

medium relevance benchmark

Paper 2512.24268v1

2025-12-30

RAGPart & RAGMask: Retrieval-Stage Defenses Against Corpus Poisoning in Retrieval-Augmented Generation

exposed a critical vulnerability in RAG pipelines corpus poisoning where adversaries inject malicious documents into the retrieval corpus to manipulate model outputs. In this work, we propose two complementary retrieval

medium relevance attack

Paper 2604.12201v1

2026-04-14

AdversarialCoT: Single-Document Retrieval Poisoning for LLM Reasoning

generation (RAG) enhances large language model (LLM) reasoning by retrieving external documents, but also opens up new attack surfaces. We study knowledge-base poisoning attacks in RAG, where an attacker

medium relevance attack

Paper 2510.09210v1

2025-10-10

Provable Watermarking for Data Poisoning Attacks

poisoning-concurrent watermarking, the watermarked poisoning dataset provably ensures both watermarking detectability and poisoning utility, certifying the practicality of watermarking under data poisoning attacks. We validate our theoretical findings through

high relevance attack

Paper 2509.22060v2

2025-09-26

Decoding Deception: Understanding Automatic Speech Recognition Vulnerabilities in Evasion and Poisoning Attacks

Optimization. Further, the novelty of the paper includes how poisoning attack can degrade the performances of state-of-the-art models leading to misinterpretation of audio signals. Through experimentation

high relevance attack

Paper 2602.06409v1

2026-02-06

VENOMREC: Cross-Modal Interactive Poisoning for Targeted Promotion in Multimodal LLM Recommender Systems

language models (MLLMs) are pushing recommender systems (RecSys) toward content-grounded retrieval and ranking via cross-modal fusion. We find that while cross-modal consensus often mitigates conventional poisoning that

medium relevance tool

Paper 2603.20339v1

2026-03-20

Graph-Aware Text-Only Backdoor Poisoning for Text-Attributed Graphs

platforms, an attacker may be able to quietly poison a small part of the training data and later make the model produce wrong predictions on demand. This paper studies that

medium relevance attack

Paper 2603.07191v2

2026-03-07

Governance Architecture for Autonomous Agent Systems: Threats, Framework, and Engineering Practice

Autonomous agents powered by large language models introduce a class of execution-layer vulnerabilities -- prompt injection, retrieval poisoning, and uncontrolled tool invocation -- that existing guardrails fail to address systematically

medium relevance benchmark

Paper 2510.03705v1

2025-10-04

Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods

prompt injection attack purposes. Specifically, the attackers poison the supervised fine-tuning samples and insert the backdoor into the model. Once the trigger is activated, the backdoored model executes

high relevance attack

Paper 2601.00566v1

2026-01-02

Low Rank Comes with Low Security: Gradient Assembly Poisoning Attacks against Distributed LoRA-based LLM Systems

separately, while only their product $AB$ determines the model update, yet this composite is never directly verified. We propose Gradient Assembly Poisoning (GAP), a novel attack that exploits this blind

high relevance tool

Paper 2510.00554v1

2025-10-01

Sentry: Authenticating Machine Learning Artifacts on the Fly

reliance on external datasets and pre-trained models exposes the system to supply chain attacks where an artifact can be poisoned before it is delivered to the end-user. Such

medium relevance benchmark

Paper 2603.00516v1

2026-02-28

ProtegoFed: Backdoor-Free Federated Instruction Tuning with Interspersed Poisoned Data

Federated Instruction Tuning (FIT) enables collaborative instruction tuning of large language models across multiple organizations (clients) in a cross-silo setting without requiring the sharing of private instructions. Recent findings

medium relevance benchmark

Paper 2510.06823v2

2025-10-08

Exposing Citation Vulnerabilities in Generative Engines

generation that cites web pages using large language models. Because anyone can publish information on the web, GEs are vulnerable to poisoning attacks. Existing studies of citation evaluation focus

medium relevance benchmark

Paper 2510.04347v2

2025-10-05

Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models

behavior of backdoored pre-trained encoder-based language models, focusing on the consistent shift in attention and gradient attribution when processing poisoned inputs; where the trigger token dominates both attention

medium relevance defense

CVE MEDIUM CVE-2026-45397

2026-05-14

Open WebUI Vulnerable to Unauthenticated RAG Configuration Disclosure

CVSS 5.3 open-webui View details

Paper 2603.18103v1

2026-03-18

STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling

serious threat: an adversary who poisons a small fraction of training data can implant a hidden trigger that controls the model's output while preserving normal behavior on clean inputs

high relevance attack

Paper 2605.10600v1

2026-05-11

Generate "Normal", Edit Poisoned: Branding Injection via Hint Embedding in Image Editing

returned to users. The second is a poison-based setting, where an attacker distributes a compromised text-to-image diffusion model whose output contains hidden content. We evaluate both attacks

high relevance attack

Paper 2604.23640v1

2026-04-26

Prompt-Unknown Promotion Attacks against LLM-based Sequential Recommender Systems

enabling the training of an effective surrogate model that mimics the behaviors of the victim model. Leveraging the distilled prompt and surrogate model, we devise a promotion attack that adversarially

high relevance tool

CVE HIGH CVE-2026-45398

2026-05-14

Open WebUI Vulnerable to IDOR: Retrieval API Bypasses Knowledge Base

CVSS 7.5 open-webui View details

Previous Page 8 of 16 Next