Paper 2512.00804v1

Bias Injection Attacks on RAG Databases and Sanitization Defenses

defenses on vector databases in retrieval-augmented generation (RAG) systems. Prior work on knowledge poisoning attacks primarily injects false or toxic content, which fact-checking or linguistic analysis easily detects

high relevance attack
Paper 2603.05073v1

Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation

with ordinal nature. In this paper, we study distribution estimation under the pure shuffle model, a prevalent shuffle-DP framework without strong security assumptions. We initially attempt

medium relevance benchmark
Paper 2603.12414v1

SpectralGuard: Detecting Memory Collapse Attacks in State Space Models

State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a critical safety vulnerability. We show that the spectral radius

high relevance attack
Paper 2602.07652v1

Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents

Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety failures from unsafe text to unsafe *trajectories*. We introduce **AgentFence

medium relevance benchmark
Paper 2512.14158v1

CIS-BA: Continuous Interaction Space Based Backdoor Attack for Object Detection in the Real-World

Object detection models deployed in real-world applications such as autonomous driving face serious threats from backdoor attacks. Despite their practical effectiveness, existing methods are inherently limited in both capability

high relevance attack
Paper 2601.07072v1

Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems

user query on OpenAI's embedding models), and achieves near-100% retrieval across 11 benchmarks and 8 embedding models (including both open-source models and proprietary services). Based on this

high relevance tool
Paper 2603.07379v1

SoK: Agentic Retrieval-Augmented Generation (RAG): Taxonomy, Architectures, Evaluation, and Research Directions

Retrieval-Augmented Generation (RAG) systems are increasingly evolving into agentic architectures where large language models autonomously coordinate multi-step reasoning, dynamic memory management, and iterative retrieval strategies. Despite rapid industrial

low relevance survey
Paper 2601.05467v3

STELP: Secure Transpilation and Execution of LLM-Generated Programs

Rapid evolution of Large Language Models (LLMs) has achieved major advances in reasoning, planning, and function-calling capabilities. Multi-agentic collaborative frameworks using such LLMs place them at the center

medium relevance survey
Paper 2511.18921v1

BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models

hijack. Each category captures a distinct pathway through which an adversary can manipulate a model's behavior. We evaluate these threats using 12 representative attack methods spanning text, image

high relevance benchmark
Paper 2602.22134v2

Secure Semantic Communications via AI Defenses: Fundamentals, Solutions, and Future Directions

SemCom via AI defense. We analyze AI-centric threat models by consolidating existing studies and organizing attack surfaces across model-level, channel-realizable, knowledge-based, and networked inference vectors. Building

medium relevance defense
Paper 2602.00750v1

Bypassing Prompt Injection Detectors through Evasive Injections

Large language models (LLMs) are increasingly used in interactive and retrieval-augmented systems, but they remain vulnerable to task drift: deviations from a user's intended instruction due to injected

high relevance attack
Paper 2512.21681v1

Exploring the Security Threats of Retriever Backdoors in Retrieval-Augmented Code Generation

Retrieval-Augmented Code Generation (RACG) is increasingly adopted to enhance Large Language Models for software development, yet its security implications remain dangerously underexplored. This paper conducts the first systematic exploration

medium relevance attack
Paper 2512.10415v2

How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation

Using Large Language Models (LLMs) as automatic judges for code evaluation is becoming increasingly prevalent in academic environments, but their reliability can be compromised by students who may employ adversarial prompting

high relevance benchmark
Paper 2512.08290v2

Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem

Model Context Protocol (MCP) has emerged as the de facto standard for connecting Large Language Models (LLMs) to external data and tools, effectively functioning as the "USB-C for Agentic

medium relevance survey
Paper 2510.01157v2

Backdoor Attacks Against Speech Language Models

resulting model inherits vulnerabilities from all of its components. In this work, we present the first systematic study of audio backdoor attacks against speech language models. We demonstrate their effectiveness

high relevance attack
Paper 2512.14741v1

Persistent Backdoor Attacks under Continual Fine-Tuning of LLMs

Backdoor attacks embed malicious behaviors into Large Language Models (LLMs), enabling adversaries to trigger harmful outputs or bypass safety controls. However, the persistence of the implanted backdoors under user-driven

high relevance attack
Paper 2602.20593v1

Is the Trigger Essential? A Feature-Based Triggerless Backdoor Attack in Vertical Federated Learning

parties with distinct features and one active party with labels to collaboratively train a model. Although it is known for its privacy-preserving capabilities, VFL still faces significant privacy

high relevance attack
Paper 2602.18082v1

AndroWasm: an Empirical Study on Android Malware Obfuscation through WebAssembly

detection mechanisms and harden manual analysis. Adversaries typically rely on obfuscation, anti-repacking, steganography, poisoning and evasion techniques against AI-based tools, and in-memory execution to conceal malicious functionality

medium relevance attack
Paper 2603.11619v1

Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats

Autonomous Large Language Model (LLM) agents, exemplified by OpenClaw, demonstrate remarkable capabilities in executing complex, long-horizon tasks. However, their tightly coupled instant-messaging interaction paradigm and high-privilege execution

medium relevance defense
Paper 2601.05260v1

Quantifying Document Impact in RAG-LLMs

Retrieval Augmented Generation (RAG) enhances Large Language Models (LLMs) by connecting them to external knowledge, improving accuracy and reducing outdated information. However, this introduces challenges such as factual inconsistencies, source

medium relevance benchmark