Search: prompt injection | AI Threat Alert

494 results in 189ms

Paper 2603.16734v1

2026-03-17

Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure

benign counterparts) under controlled prompt conditions that vary user-context personalization (no bio, bio-only, bio+mental health disclosure) and include a lightweight jailbreak injection. Our results reveal that harmful

medium relevance benchmark

Paper 2510.04503v2

2025-10-06

P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs

algorithm. P2P injects benign triggers with safe alternative labels into a subset of training samples and fine-tunes the model on this re-poisoned dataset by leveraging prompt-based learning

medium relevance defense

Paper 2606.22686v1

2026-06-21

The Geometry of Refusal: Linear Instability in Safety-Aligned LLMs

prompts. Unlike representation engineering methods that intervene on internal activations, CLS operates directly on the output distribution, serving as a diagnostic probe for alignment fragility. When coupled with prefix injection

medium relevance defense

Paper 2510.17098v2

2025-10-20

Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models

prompts and parameters are secured, transformer language models remain vulnerable because their key-value (KV) cache during inference constitutes an overlooked attack surface. This paper introduces Malicious Token Injection

medium relevance attack

Paper 2602.01574v1

2026-02-02

SGHA-Attack: Semantic-Guided Hierarchical Alignment for Transferable Targeted Attacks on Vision-Language Models

reference pool by sampling a frozen text-to-image model conditioned on the target prompt, and then carefully select the Top-K most semantically relevant anchors under the surrogate

high relevance attack

Paper 2511.08905v3

2025-11-12

iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective

medium relevance attack

Paper 2510.11851v2

2025-10-13

Deep Research Brings Deeper Harm

agents. To address this gap, we propose two novel jailbreak strategies: Plan Injection, which injects malicious sub-goals into the agent's plan; and Intent Hijack, which reframes harmful queries

medium relevance benchmark

Paper 2603.18740v1

2026-03-19

Measuring and Exploiting Confirmation Bias in LLM-Assisted Security Code Review

across four state-of-the-art models under five framing conditions for the review prompt. Framing a change as bug-free reduces vulnerability detection rates by 16-93%, with strongly

high relevance survey

Paper 2509.20324v1

2025-09-24

RAG Security and Privacy: Formalizing the Threat Model and Attack Surface

demonstrated that LLMs can leak sensitive information through training data memorization or adversarial prompts, and RAG systems inherit many of these vulnerabilities. At the same time, reliance

high relevance attack

CVE HIGH CVE-2026-44246

2026-05-12

nnU-Net is a semantic segmentation framework that automatically adapts

CVSS 7.2 claude-code

Paper 2511.09222v4

2025-11-12

Toward Honest Language Models for Deductive Reasoning

cases by randomly perturbing an edge in half of the instances. We find that prompting and existing training methods, including GRPO with or without supervised fine-tuning initialization, struggle

low relevance benchmark

Paper 2602.19450v1

2026-02-23

Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution Environments

system, yet real deployments remain vulnerable to microarchitectural leakage, side-channel attacks, and fault injection. In parallel, security teams increasingly rely on Large Language Model (LLM) assistants as security advisors

high relevance survey

Paper 2601.12983v1

2026-01-19

ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation

Multimodal large language models (MLLMs) are increasingly used to automate

high relevance attack

Paper 2606.24589v1

2026-06-23

AdversaBench: Automated LLM Red-Teaming with Multi-Judge Confirmation and Cross-Model Transferability

real. We present AdversaBench, an end-to-end red-teaming pipeline that mutates seed prompts with five structured operators, queries a target model, and confirms failures through a three-judge

high relevance benchmark

Paper 2601.05504v2

2026-01-09

Memory Poisoning Attack and Defense on Memory Based LLM-Agents

memory and influence future responses. Recent work demonstrated that the MINJA (Memory Injection Attack) achieves over 95% injection success rate and 70% attack success rate under idealized conditions. However

high relevance attack

Paper 2605.02812v1

2026-05-04

Autonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defense

graph analyzer, traces data flow from file I/O to LLM context injection points and ranks carriers by context injection position without manual analysis. SRPO, our summary-resilient payload optimizer, generates

medium relevance tool

CVE HIGH CVE-2025-64496

2025-11-07

Open WebUI Affected by an External Model Server (Direct Connections

CVSS 7.3 open-webui

Paper 2606.18310v1

2026-06-16

Conflict-Aware Retriever Editing for Knowledge Injection Attacks on LLM-Based RAG Systems

mislead downstream generation, posing a serious security threat for AI applications. Existing RAG injection attacks mainly rely on manipulating external knowledge bases, such as crafting malicious corpus. However, the synthetic

high relevance tool

Paper 2602.17837v1

2026-02-19

TFL: Targeted Bit-Flip Attack on Large Language Model

safety and security critical applications, raising concerns about their robustness to model parameter fault injection attacks. Recent studies have shown that bit-flip attacks (BFAs), which exploit computer main memory

high relevance attack

Paper 2606.24402v1

2026-06-23

Poisoned Playbooks: Demystifying Knowledge Poisoning Effects on AI Security Agents

challenges and AI agents. First, we demonstrate how a single crafted poisoned write-up injected into public-style security knowledge sources, which we denote as Poisoned Playbooks, alters the behavior

medium relevance attack
