Paper 2602.00750v1

Bypassing Prompt Injection Detectors through Evasive Injections

Large language models (LLMs) are increasingly used in interactive and retrieval-augmented systems, but they remain vulnerable to task drift: deviations from a user's intended instruction due to injected

high relevance attack
Paper 2512.21681v1

Exploring the Security Threats of Retriever Backdoors in Retrieval-Augmented Code Generation

Retrieval-Augmented Code Generation (RACG) is increasingly adopted to enhance Large Language Models for software development, yet its security implications remain dangerously underexplored. This paper conducts the first systematic exploration

medium relevance attack
Paper 2512.10415v2

How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation

The use of Large Language Models (LLMs) as automatic judges for code evaluation is becoming increasingly prevalent in academic environments. But their reliability can be compromised by students who may employ adversarial prompting

high relevance benchmark
Paper 2605.27674v1

Backdoor Attacks on Fault Detection and Localization in Cyber-Physical Systems

intelligent models are vulnerable to adversarial machine learning attacks, particularly backdoor attacks. In a backdoor attack, an adversary injects malicious patterns into the training data so that the model behaves

high relevance attack
Paper 2512.08290v2

Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem

Model Context Protocol (MCP) has emerged as the de facto standard for connecting Large Language Models (LLMs) to external data and tools, effectively functioning as the "USB-C for Agentic

medium relevance survey
Paper 2510.01157v2

Backdoor Attacks Against Speech Language Models

resulting model inherits vulnerabilities from all of its components. In this work, we present the first systematic study of audio backdoor attacks against speech language models. We demonstrate its effectiveness

high relevance attack
Paper 2605.11442v1

Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection

safety filters, and highly configurable, allowing for surgical targeting of specific environments or model providers. To evaluate the real-world impact, we conduct extensive experiments across three representative claw-style

high relevance attack
Paper 2605.19253v1

Detecting and Mitigating Backdoor Attacks in OTA-FL Systems: A Two-Stage Robust Aggregation Scheme

server (PS) cannot access individual local updates, making it difficult to identify and exclude poisoned gradients. The challenge is further exacerbated under non-independent and identically distributed (Non-IID) training

high relevance attack
Paper 2606.10742v1

MemVenom: Triggered Poisoning of Multimodal Memories in Web Agents

systematically study multimodal memory poisoning, an overlooked yet practical attack surface in web-agent systems. We propose MemVenom, a unified black-box attack framework that poisons graph-structured external memory

medium relevance attack
Paper 2603.29328v1

Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning

this paper, we revisit the backdoor threat to standard FL (a single global model) under a more realistic setting where triggers must be semantically meaningful, in-distribution, and visually plausible

high relevance attack
Paper 2512.14741v1

Persistent Backdoor Attacks under Continual Fine-Tuning of LLMs

Backdoor attacks embed malicious behaviors into Large Language Models (LLMs), enabling adversaries to trigger harmful outputs or bypass safety controls. However, the persistence of the implanted backdoors under user-driven

high relevance attack
Paper 2602.20593v1

Is the Trigger Essential? A Feature-Based Triggerless Backdoor Attack in Vertical Federated Learning

parties with distinct features and one active party with labels to collaboratively train a model. Although it is known for its privacy-preserving capabilities, VFL still faces significant privacy

high relevance attack
Paper 2602.18082v1

AndroWasm: an Empirical Study on Android Malware Obfuscation through WebAssembly

detection mechanisms and harden manual analysis. Adversaries typically rely on obfuscation, anti-repacking, steganography, poisoning and evasion techniques against AI-based tools, and in-memory execution to conceal malicious functionality

medium relevance attack
Paper 2604.08407v1

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

enforces cryptographic integrity between client and upstream model. We present the first systematic study of this attack surface. We formalize a threat model for malicious LLM API routers and define

high relevance attack
Paper 2603.11619v1

Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats

Autonomous Large Language Model (LLM) agents, exemplified by OpenClaw, demonstrate remarkable capabilities in executing complex, long-horizon tasks. However, their tightly coupled instant-messaging interaction paradigm and high-privilege execution

medium relevance defense
Paper 2601.05260v1

Quantifying Document Impact in RAG-LLMs

Retrieval Augmented Generation (RAG) enhances Large Language Models (LLMs) by connecting them to external knowledge, improving accuracy and reducing outdated information. However, this introduces challenges such as factual inconsistencies, source

medium relevance benchmark
Paper 2601.17548v1

Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems

development workflows. These systems leverage Large Language Models (LLMs) integrated with external tools, file systems, and shell access through protocols like the Model Context Protocol (MCP). However, this expanded capability

high relevance attack
Paper 2606.24322v1

Securing LLM-Agent Long-Term Memory Against Poisoning: Non-Malleable, Origin-Bound Authority with Machine-Checked Guarantees

LLM agents increasingly rely on persistent long-term memory, which

medium relevance attack
Paper 2601.14323v1

SilentDrift: Exploiting Action Chunking for Stealthy Backdoor Attacks on Vision-Language-Action Models

Vision-Language-Action (VLA) models are increasingly deployed in safety-critical robotic applications, yet their security vulnerabilities remain underexplored. We identify a fundamental security flaw in modern VLA systems

high relevance attack
Paper 2510.11837v1

Countermind: A Multi-Layered Security Architecture for Large Language Models

validate and transform all inputs, and an internal governance mechanism intended to constrain the model's semantic processing pathways before an output is generated. The primary contributions of this work

medium relevance benchmark