Paper 2602.05401v1

BadTemplate: A Training-Free Backdoor Attack via Chat Template Against Large Language Models

chat templates allows an attacker who controls the template to inject arbitrary strings into the system prompt without the user's notice. Building on this, we propose a training-free

high relevance attack
Paper 2604.21700v1

Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

attack in a realistic threat model and systematically evaluate BadStyle under both prompt-induced and PEFT-based injection strategies. Extensive experiments across seven victim LLMs, including LLaMA, Phi, DeepSeek

high relevance attack
Paper 2604.21829v1

Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

prior prompt-stealing methods and build an automated stealing prompt generation agent. This agent starts from model-generated seed prompts, expands them through scenario rationalization and structure injection, and enforces

high relevance attack
Paper 2601.02670v1

Multi-Turn Jailbreaking of Aligned LLMs via Lexical Anchor Tree Search

injection. LATS reformulates jailbreaking as a breadth-first tree search over multi-turn dialogues, where each node incrementally injects missing content words from the attack goal into benign prompts. Evaluations

high relevance attack

Flowise: Parameter Override Bypass Remote Command Execution

CVSS 7.7 flowise-components
Paper 2605.12746v1

CoT-Guard: Small Models for Strong Monitoring

attacks, where the adversary is a third-party LLM router injecting hidden objectives into code-generation requests through either prompt manipulation or code manipulation attacks. To push beyond objectives that

medium relevance attack
Paper 2605.10600v1

Generate "Normal", Edit Poisoned: Branding Injection via Hint Embedding in Image Editing

rendered onto semantically related objects, even when the user prompt does not explicitly mention it. This form of hidden payload injection makes the attack stealthy. We study two realistic attack

high relevance attack
Paper 2511.10913v1

Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio

second leverages audio-modality exploits (Read, Spell, Phoneme) that inject harmful content through auxiliary audio channels while maintaining benign textual prompts. Through evaluation across five commercial LALMs-based TTS systems

medium relevance benchmark
Paper 2511.17666v1

Evaluating Adversarial Vulnerabilities in Modern Large Language Models

prompted to circumvent their own safety protocols, and 'cross-bypass', where one model generated adversarial prompts to exploit vulnerabilities in the other. Four attack methods were employed - direct injection, role

medium relevance attack
Paper 2606.10742v1

MemVenom: Triggered Poisoning of Multimodal Memories in Web Agents

induction that leverages adversarial perturbations and stealthy OCR injection to override the original user objective. Unlike prior attacks that operate on prompts or text-only memory, our approach enables persistent

medium relevance attack

LiteLLM: Server-Side Template Injection in /prompts/test endpoint

Paper 2601.04443v2

Large Language Models for Detecting Cyberattacks on Smart Grid Protective Relays

perfect fault detection accuracy. Additional evaluations demonstrate robustness to prompt formulation variations, resilience under combined time-synchronization and false-data injection attacks, and stable performance under realistic measurement noise levels

high relevance attack
Paper 2510.06823v2

Exposing Citation Vulnerabilities in Generative Engines

perspectives of citation publishers and the content-injection barrier, defined as the difficulty for attackers to manipulate answers to user prompts by placing malicious content on the web. GEs integrate

medium relevance benchmark
Paper 2602.15654v2

Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections

that memory evolution can convert one-time indirect injection into persistent compromise, which suggests that defenses focused only on per-session prompt filtering are not sufficient for self-evolving agents

high relevance attack
Paper 2601.13359v1

Sockpuppetting: Jailbreaking LLMs Without Optimization Through Output Prefix Injection

assistant message block rather than the user prompt, increasing ASR by 64% over GCG on Llama-3.1-8B in a prompt-agnostic setting. The results establish sockpuppetting

high relevance attack
Paper 2602.16958v1

Automating Agent Hijacking via Structural Template Injection

ecosystem, enables adversaries to manipulate execution by injecting malicious instructions into retrieved content. Most existing attacks rely on manually crafted, semantics-driven prompt manipulation, which often yields low attack success

high relevance attack
Paper 2604.20994v1

Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models

powered system by invoking external functions. Injection and jailbreaking attacks have been extensively explored to showcase the vulnerabilities of LLMs to user prompt manipulation. The expanded capabilities of agentic models

high relevance attack
Paper 2510.11151v1

TypePilot: Leveraging the Scala Type System for Secure LLM-generated Code

enforce safety constraints, just as naive prompting for more secure code, our type-focused agentic pipeline substantially mitigates input validation and injection vulnerabilities. The results demonstrate the potential of structured

medium relevance tool
Paper 2601.10294v2

Reasoning Hijacking: Subverting LLM Classification via Decision-Criteria Injection

which attempts to override the system prompt, Reasoning Hijacking accepts the high-level goal but manipulates the model's decision-making logic by injecting spurious reasoning shortcuts. Though extensive experiments

high relevance attack
Paper 2511.00664v1

ShadowLogic: Backdoors in Any Whitebox LLM

injecting an uncensoring vector into its computational graph representation. We set a trigger phrase that, when added to the beginning of a prompt into the LLM, applies the uncensoring vector

medium relevance attack