Search: model poisoning | AI Threat Alert

307 results in 130ms

Paper 2510.14381v2

2025-10-16

Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers

systematic analysis of poisoning risks in LLM-based prompt optimization. Using HarmBench, we find systems are substantially more vulnerable to manipulated feedback than to query poisoning alone: feedback-based attacks

medium relevance attack

Paper 2509.21011v1

2025-09-25

Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools

large language models (LLMs) has led to the wide application of LLM-based agents in various domains. To standardize interactions between LLM-based agents and their environments, model context protocol

high relevance tool

Paper 2512.10637v2

2025-12-11

Adaptive Intrusion Detection System Leveraging Dynamic Neural Models with Adversarial Learning for 5G/6G Networks

network security by providing robust, real-time threat detection and response capabilities. Unlike conventional models, which require costly retraining to update knowledge, the proposed framework integrates incremental learning algorithms, reducing

medium relevance attack

Paper 2604.23775v1

2026-04-26

Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

models, including architectures, training paradigms, and inference mechanisms. We then examine the literature through four lenses: Attacks, Defenses, Evaluation, and Deployment. We survey training-time threats such as data poisoning

medium relevance benchmark

Paper 2511.14989v2

2025-11-19

Critical Evaluation of Quantum Machine Learning for Adversarial Robustness

three threat models: black-box, gray-box, and white-box. We implement representative attacks in each category, including label-flipping for black-box, QUID encoder-level data poisoning for gray

medium relevance benchmark

Paper 2511.12936v1

2025-11-17

Privacy-Preserving Federated Learning from Partial Decryption Verifiable Threshold Multi-Client Functional Encryption

cooperate to train the model without directly exchanging their own private data, but the gradient leakage problem still threatens the privacy security and model integrity. Although the existing scheme uses

medium relevance benchmark

Paper 2605.18988v1

2026-05-18

Surviving the Unseen: Predictive Defense for Novel Multi-Turn Multimodal Attacks

expansion of Multimodal Large Language Models (MLLMs) and their integration into autonomous agentic workflows has introduced a non-stationary attack surface. Empirical observations indicate that adversaries employ progressive, cross-modal

high relevance attack

Paper 2604.10681v1

2026-04-12

Critical-CoT: A Robust Defense Framework against Reasoning-Level Backdoor Attacks in Large Language Models

Large Language Models (LLMs), despite their impressive capabilities across domains, have been shown to be vulnerable to backdoor attacks. Prior backdoor strategies predominantly operate at the token level, where

high relevance tool

Paper 2606.17815v1

2026-06-16

Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors

Contrastive Language-Image Pre-training models are widely reused across downstream interfaces, including feature extraction, retrieval, reranking, and selection. Existing CLIP backdoor studies, however, usually validate attacks on a small attack

medium relevance benchmark

Paper 2605.27809v1

2026-05-27

Density-aware Sample-specific Attack

derive principled criteria characterizing optimal sample-specific trigger construction under a Bayes-optimal model of the victim's training. Our analysis reveals that both attack success and clean-accuracy preservation

high relevance attack

Paper 2512.09742v1

2025-12-10

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

contexts can dramatically shift behavior outside those contexts. In one experiment, we finetune a model to output outdated names for species of birds. This causes it to behave

medium relevance benchmark

Paper 2603.02849v1

2026-03-03

DSBA: Dynamic Stealthy Backdoor Attack with Collaborative Optimization in Self-Supervised Learning

generalization capabilities, and its potential for privacy preservation. However, recent research reveals that SSL models are also vulnerable to backdoor attacks. Existing backdoor attack methods in the SSL context commonly

high relevance attack

Paper 2605.21146v1

2026-05-20

Detecting Trojaned DNNs via Spectral Regression Analysis

approach that analyzes how a model's internal representations change during fine-tuning. Rather than attempting to reconstruct trigger conditions, MIST characterizes benign model evolution using pre-activation spectra

medium relevance benchmark

Paper 2602.19555v1

2026-02-23

Agentic AI as a Cybersecurity Attack Surface: Threats, Exploits, and Defenses in Runtime Supply Chains

Agentic systems built on large language models (LLMs) extend beyond text generation to autonomously retrieve information and invoke tools. This runtime execution model shifts the attack surface from build-time

high relevance attack

Paper 2604.19083v1

2026-04-21

ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

Multimodal Large Language Models (MLLMs) have achieved remarkable success in cross-modal understanding and generation, yet their deployment is threatened by critical safety vulnerabilities. While prior works have demonstrated

medium relevance defense

Paper 2601.11207v1

2026-01-16

LoRA as Oracle

Existing defenses for backdoor detection and membership inference typically require access to clean reference models, extensive retraining, or strong assumptions about the attack mechanism. In this work, we introduce

medium relevance attack

Paper 2603.07835v1

2026-03-08

DistillGuard: Evaluating Defenses Against LLM Knowledge Distillation

Knowledge distillation from proprietary LLM APIs poses a growing threat to model providers, yet defenses against this attack remain fragmented and unevaluated. We present DistillGuard, a framework for systematically evaluating

medium relevance defense

Paper 2511.06212v1

2025-11-09

RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework

Artificial Intelligence has become a valuable solution in securing IoT networks, with Large Language Models (LLMs) enabling automated attack behavior analysis and mitigation suggestion in Network Intrusion Detection Systems (NIDS

high relevance tool

Paper 2512.19297v1

2025-12-22

Causal-Guided Detoxify Backdoor Attack of Open-Weight LoRA Models

Backdoor Attack (CBA), a novel backdoor attack framework specifically designed for open-weight LoRA models. CBA operates without access to original training data and achieves high stealth through

high relevance attack

Paper 2603.03108v1

2026-03-03

RAIN: Secure and Robust Aggregation under Shuffle Model of Differential Privacy

achieving robustness under adversarial behavior remains challenging. Modern systems increasingly adopt the shuffle model of differential privacy (Shuffle-DP) to locally perturb client updates and globally anonymize them via shuffling

medium relevance benchmark
