Paper 2605.26754v1

Cordon-MAS: Defending RAG against Knowledge Poisoning via Information-Flow Control

Confundo-style poisoning where adversarially optimized documents manipulate generated outputs. Existing defenses assume that detecting poisoned evidence prevents harm. We show this assumption is incorrect: models exhibit a monitoring-control

medium relevance attack
Paper 2602.04899v1

Phantom Transfer: Data-level Defences are Insufficient Against Data Poisoning

data-level defences are insufficient for stopping sophisticated data poisoning attacks. We suggest that future work should focus on model audits and white-box security methods

medium relevance attack
Paper 2604.07536v1

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

injection attack surface: tool poisoning attacks (TPAs). Attackers manipulate tool descriptions by embedding malicious instructions (explicit TPAs) or misleading claims (implicit TPAs) to influence model behavior and tool selection. Existing

medium relevance tool
Paper 2602.02629v1

Trustworthy Blockchain-based Federated Learning for Electronic Health Records: Securing Participant Identity with Decentralized Identifiers and Verifiable Credentials

patient data. Despite its potential, FL remains vulnerable to poisoning and Sybil attacks, in which malicious participants corrupt the global model or infiltrate the network using fake identities. While recent

medium relevance benchmark
Paper 2602.19547v1

CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents

four major types of adversarial attacks: Direct/Indirect Prompt Injection, Memory Poisoning, and Prompt-based Backdoor. We evaluate six foundation models across two representative code interpreter agents (OpenInterpreter and OpenCodeInterpreter), incorporating

medium relevance benchmark
Paper 2602.07200v1

BadSNN: Backdoor Attacks on Spiking Neural Networks via Adversarial Spiking Neuron

converts input data into spikes following the Leaky Integrate-and-Fire (LIF) neuron model. This model includes several important hyperparameters, such as the membrane potential threshold and membrane time constant

high relevance attack
Paper 2605.19262v1

Backdooring Masked Diffusion Language Models

training-time security remains largely unexplored. Existing backdoor attacks on Gaussian diffusion models or autoregressive language models do not directly apply to MDLMs because MDLMs rely on discrete state corruption

medium relevance benchmark
Paper 2603.25164v1

PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of applications. However, their practical deployment is often hindered by issues such as outdated knowledge and the tendency

high relevance attack
Paper 2602.01942v1

Human Society-Inspired Approaches to Agentic AI Security: The 4C Framework

software components. Although recent work has strengthened defenses against model and pipeline level vulnerabilities such as prompt injection, data poisoning, and tool misuse, these system centric approaches may fail

medium relevance tool
Paper 2510.09710v2

SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG

Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) with external knowledge but are vulnerable to corpus poisoning and contamination attacks, which can compromise output integrity. Existing defenses often

medium relevance benchmark
Paper 2511.08944v1

Robust Backdoor Removal by Reconstructing Trigger-Activated Changes in Latent Representation

Backdoor attacks pose a critical threat to machine learning models, causing them to behave normally on clean data but misclassify poisoned data into a poisoned class. Existing defenses often attempt

medium relevance benchmark
Paper 2602.15195v2

Weight space Detection of Backdoors in LoRA Adapters

trigger for backdoor behavior is unknown. We detect poisoned adapters by analyzing their weight matrices directly, without running the model -- making our method data-agnostic. Our method extracts simple statistics

medium relevance defense
Paper 2606.10846v1

Securing Code Understanding: Detecting Natural Backdoor Vulnerability in Code Language Models

natural backdoors, in normally trained deep learning models. Despite posing threats as serious as those introduced through data poisoning, security implications of natural backdoor vulnerabilities in CodeLMs remain poorly understood

high relevance benchmark
Paper 2604.27426v1

Secret Stealing Attacks on Local LLM Fine-Tuning through Supply-Chain Model Code Backdoors

viewed as a privacy boundary, we reveal that compromised model code is sufficient to steal them. Current passive pretrained-weight poisoning attacks, while effective for natural language, fundamentally fail

high relevance attack
Paper 2602.10780v1

Kill it with FIRE: On Leveraging Latent Space Directions for Runtime Backdoor Mitigation in Deep Neural Networks

input. Existing mitigations filter training data, modify the model, or perform expensive input modifications on samples. If a vulnerable model has already been deployed, however, those strategies are either ineffective

medium relevance defense
Paper 2511.10714v1

BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models

process to embed the behavior by generating highly naturalistic poisoned data. Our experiments on multiple state-of-the-art models and reasoning tasks show that BadThink consistently increases reasoning trace

high relevance attack
Paper 2601.06466v1

SecureDyn-FL: A Robust Privacy-Preserving Federated Learning Framework for Intrusion Detection in IoT Networks

dynamic temporal gradient auditing mechanism that leverages Gaussian mixture models (GMMs) and Mahalanobis distance (MD) to detect stealthy and adaptive poisoning attacks, (ii) an optimized privacy-preserving aggregation scheme based

medium relevance defense
Paper 2510.22944v1

Is Your Prompt Poisoning Code? Defect Induction Rates and Security Mitigation Strategies

Large language models (LLMs) have become indispensable for automated code

medium relevance attack
Paper 2606.12703v1

SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems

agent's responses for future users, without touching model weights or code. We call this Multi-Session Memory Poisoning (MSMP) and show that no existing defence certifies against it; static

medium relevance tool
Paper 2603.16405v1

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Semantic segmentation models are widely deployed in safety-critical applications

high relevance attack