Paper 2605.19262v1

Backdooring Masked Diffusion Language Models

training-time security remains largely unexplored. Existing backdoor attacks on Gaussian diffusion models or autoregressive language models do not directly apply to MDLMs because MDLMs rely on discrete state corruption

medium relevance benchmark
Paper 2603.25164v1

PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of applications. However, their practical deployment is often hindered by issues such as outdated knowledge and the tendency

high relevance attack
Paper 2602.01942v1

Human Society-Inspired Approaches to Agentic AI Security: The 4C Framework

software components. Although recent work has strengthened defenses against model and pipeline level vulnerabilities such as prompt injection, data poisoning, and tool misuse, these system centric approaches may fail

medium relevance tool
Paper 2510.09710v2

SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG

Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) with external knowledge but are vulnerable to corpus poisoning and contamination attacks, which can compromise output integrity. Existing defenses often

medium relevance benchmark
Paper 2511.08944v1

Robust Backdoor Removal by Reconstructing Trigger-Activated Changes in Latent Representation

Backdoor attacks pose a critical threat to machine learning models, causing them to behave normally on clean data but misclassify poisoned data into a poisoned class. Existing defenses often attempt

medium relevance benchmark
Paper 2602.15195v2

Weight space Detection of Backdoors in LoRA Adapters

trigger for backdoor behavior is unknown. We detect poisoned adapters by analyzing their weight matrices directly, without running the model -- making our method data-agnostic. Our method extracts simple statistics

medium relevance defense
Paper 2606.10846v1

Securing Code Understanding: Detecting Natural Backdoor Vulnerability in Code Language Models

natural backdoors, in normally trained deep learning models. Despite posing threats as serious as those introduced through data poisoning, security implications of natural backdoor vulnerabilities in CodeLMs remain poorly understood

high relevance benchmark
Paper 2604.27426v1

Secret Stealing Attacks on Local LLM Fine-Tuning through Supply-Chain Model Code Backdoors

viewed as a privacy boundary, we reveal that compromised model code is sufficient to steal them. Current passive pretrained-weight poisoning attacks, while effective for natural language, fundamentally fail

high relevance attack
Paper 2602.10780v1

Kill it with FIRE: On Leveraging Latent Space Directions for Runtime Backdoor Mitigation in Deep Neural Networks

input. Existing mitigations filter training data, modify the model, or perform expensive input modifications on samples. If a vulnerable model has already been deployed, however, those strategies are either ineffective

medium relevance defense
Paper 2511.10714v1

BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models

process to embed the behavior by generating highly naturalistic poisoned data. Our experiments on multiple state-of-the-art models and reasoning tasks show that BadThink consistently increases reasoning trace

high relevance attack
Paper 2601.06466v1

SecureDyn-FL: A Robust Privacy-Preserving Federated Learning Framework for Intrusion Detection in IoT Networks

dynamic temporal gradient auditing mechanism that leverages Gaussian mixture models (GMMs) and Mahalanobis distance (MD) to detect stealthy and adaptive poisoning attacks, (ii) an optimized privacy-preserving aggregation scheme based

medium relevance defense
Paper 2510.22944v1

Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies

Large language models (LLMs) have become indispensable for automated code

medium relevance attack
Paper 2606.12703v1

SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems

agent's responses for future users, without touching model weights or code. We call this Multi-Session Memory Poisoning (MSMP) and show that no existing defence certifies against it; static

medium relevance tool
Paper 2603.16405v1

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Semantic segmentation models are widely deployed in safety-critical applications

high relevance attack
Paper 2603.03919v1

When Safety Becomes a Vulnerability: Exploiting LLM Alignment Homogeneity for Transferable Blocking in RAG

Augmented Generation (RAG) enhances the capabilities of large language models (LLMs) by incorporating external knowledge, but its reliance on potentially poisonable knowledge bases introduces new availability risks. Attackers can inject

high relevance attack
Paper 2511.07210v2

Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization

from benign task-related features, GCB enables a victim model to learn the backdoor from an extremely small set of poisoned examples, resulting in a CA drop of less than

medium relevance benchmark
Paper 2512.24268v1

RAGPart & RAGMask: Retrieval-Stage Defenses Against Corpus Poisoning in Retrieval-Augmented Generation

exposed a critical vulnerability in RAG pipelines: corpus poisoning, where adversaries inject malicious documents into the retrieval corpus to manipulate model outputs. In this work, we propose two complementary retrieval

medium relevance attack
Paper 2604.12201v1

AdversarialCoT: Single-Document Retrieval Poisoning for LLM Reasoning

generation (RAG) enhances large language model (LLM) reasoning by retrieving external documents, but also opens up new attack surfaces. We study knowledge-base poisoning attacks in RAG, where an attacker

medium relevance attack
Paper 2510.09210v1

Provable Watermarking for Data Poisoning Attacks

poisoning-concurrent watermarking, the watermarked poisoning dataset provably ensures both watermarking detectability and poisoning utility, certifying the practicality of watermarking under data poisoning attacks. We validate our theoretical findings through

high relevance attack
Paper 2509.22060v2

Decoding Deception: Understanding Automatic Speech Recognition Vulnerabilities in Evasion and Poisoning Attacks

Optimization. Further, the novelty of the paper includes how poisoning attacks can degrade the performance of state-of-the-art models, leading to misinterpretation of audio signals. Through experimentation

high relevance attack