Search: model poisoning | AI Threat Alert

Paper 2604.07536v1

2026-04-08

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

injection attack surface: tool poisoning attacks (TPAs). Attackers manipulate tool descriptions by embedding malicious instructions (explicit TPAs) or misleading claims (implicit TPAs) to influence model behavior and tool selection. Existing

medium relevance tool

Paper 2602.02629v1

2026-02-02

Trustworthy Blockchain-based Federated Learning for Electronic Health Records: Securing Participant Identity with Decentralized Identifiers and Verifiable Credentials

patient data. Despite its potential, FL remains vulnerable to poisoning and Sybil attacks, in which malicious participants corrupt the global model or infiltrate the network using fake identities. While recent

medium relevance benchmark

Paper 2602.19547v1

2026-02-23

CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents

four major types of adversarial attacks: Direct/Indirect Prompt Injection, Memory Poisoning, and Prompt-based Backdoor. We evaluate six foundation models across two representative code interpreter agents (OpenInterpreter and OpenCodeInterpreter), incorporating

medium relevance benchmark

Paper 2602.07200v1

2026-02-06

BadSNN: Backdoor Attacks on Spiking Neural Networks via Adversarial Spiking Neuron

converts input data into spikes following the Leaky Integrate-and-Fire (LIF) neuron model. This model includes several important hyperparameters, such as the membrane potential threshold and membrane time constant

high relevance attack

Paper 2605.19262v1

2026-05-19

Backdooring Masked Diffusion Language Models

training-time security remains largely unexplored. Existing backdoor attacks on Gaussian diffusion models or autoregressive language models do not directly apply to MDLMs because MDLMs rely on discrete state corruption

medium relevance benchmark

Paper 2603.25164v1

2026-03-26

PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of applications. However, their practical deployment is often hindered by issues such as outdated knowledge and the tendency

high relevance attack

Paper 2602.01942v1

2026-02-02

Human Society-Inspired Approaches to Agentic AI Security: The 4C Framework

software components. Although recent work has strengthened defenses against model and pipeline level vulnerabilities such as prompt injection, data poisoning, and tool misuse, these system centric approaches may fail

medium relevance tool

Paper 2510.09710v2

2025-10-10

SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG

Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) with external knowledge but are vulnerable to corpus poisoning and contamination attacks, which can compromise output integrity. Existing defenses often

medium relevance benchmark

Paper 2511.08944v1

2025-11-12

Robust Backdoor Removal by Reconstructing Trigger-Activated Changes in Latent Representation

Backdoor attacks pose a critical threat to machine learning models, causing them to behave normally on clean data but misclassify poisoned data into a poisoned class. Existing defenses often attempt

medium relevance benchmark

Paper 2602.15195v2

2026-02-16

Weight space Detection of Backdoors in LoRA Adapters

trigger for backdoor behavior is unknown. We detect poisoned adapters by analyzing their weight matrices directly, without running the model -- making our method data-agnostic. Our method extracts simple statistics

medium relevance defense

Paper 2606.10846v1

2026-06-09

Securing Code Understanding: Detecting Natural Backdoor Vulnerability in Code Language Models

natural backdoors, in normally trained deep learning models. Despite posing threats as serious as those introduced through data poisoning, security implications of natural backdoor vulnerabilities in CodeLMs remain poorly understood

high relevance benchmark

Paper 2604.27426v1

2026-04-30

Secret Stealing Attacks on Local LLM Fine-Tuning through Supply-Chain Model Code Backdoors

viewed as a privacy boundary, we reveal that compromised model code is sufficient to steal them. Current passive pretrained-weight poisoning attacks, while effective for natural language, fundamentally fail

high relevance attack

Paper 2602.10780v1

2026-02-11

Kill it with FIRE: On Leveraging Latent Space Directions for Runtime Backdoor Mitigation in Deep Neural Networks

input. Existing mitigations filter training data, modify the model, or perform expensive input modifications on samples. If a vulnerable model has already been deployed, however, those strategies are either ineffective

medium relevance defense

Paper 2511.10714v1

2025-11-13

BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models

process to embed the behavior by generating highly naturalistic poisoned data. Our experiments on multiple state-of-the-art models and reasoning tasks show that BadThink consistently increases reasoning trace

high relevance attack

Paper 2601.06466v1

2026-01-10

SecureDyn-FL: A Robust Privacy-Preserving Federated Learning Framework for Intrusion Detection in IoT Networks

dynamic temporal gradient auditing mechanism that leverages Gaussian mixture models (GMMs) and Mahalanobis distance (MD) to detect stealthy and adaptive poisoning attacks, (ii) an optimized privacy-preserving aggregation scheme based

medium relevance defense

Paper 2510.22944v1

2025-10-27

Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies

Large language models (LLMs) have become indispensable for automated code

medium relevance attack

Paper 2606.12703v1

2026-06-10

SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems

agent's responses for future users, without touching model weights or code. We call this Multi-Session Memory Poisoning (MSMP) and show that no existing defence certifies against it; static

medium relevance tool

Paper 2603.16405v1

2026-03-17

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Semantic segmentation models are widely deployed in safety-critical applications

high relevance attack

Paper 2603.03919v1

2026-03-04

When Safety Becomes a Vulnerability: Exploiting LLM Alignment Homogeneity for Transferable Blocking in RAG

Augmented Generation (RAG) enhances the capabilities of large language models (LLMs) by incorporating external knowledge, but its reliance on potentially poisonable knowledge bases introduces new availability risks. Attackers can inject

high relevance attack

Paper 2511.07210v2

2025-11-10

Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization

from benign task-related features, GCB enables a victim model to learn the backdoor from an extremely small set of poisoned examples, resulting in a CA drop of less than

medium relevance benchmark
