Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw
OpenClaw ecosystem. We systematically investigate its current threat landscape, highlighting critical vulnerabilities such as prompt injection-driven Remote Code Execution (RCE), sequential tool attack chains, context amnesia, and supply chain
Biosecurity-Aware AI: Agentic Risk Auditing of Soft Prompt Attacks on ESM-Based Variant Predictors
GFMs. SAGE functions through an interpretable and automated risk auditing loop. It injects soft prompt perturbations, monitors model behavior across training checkpoints, computes risk metrics such as AUROC and AUPR
Verifiability-First Agents: Provable Observability and Lightweight Audit Agents for Controlling Autonomous LLM Systems
detection under stealthy strategies, and (iii) resilience of verifiability mechanisms to adversarial prompt and persona injection. Our approach shifts the evaluation focus from how likely misalignment is to how quickly
AULLM++: Structural Reasoning with Large Language Models for Micro-Expression Recognition
propose AULLM++, a reasoning-oriented framework leveraging Large Language Models (LLMs), which injects visual features into textual prompts as actionable semantic premises to guide inference. It formulates AU prediction into
BadTemplate: A Training-Free Backdoor Attack via Chat Template Against Large Language Models
chat templates allows an attacker who controls the template to inject arbitrary strings into the system prompt without the user's notice. Building on this, we propose a training-free
Text Prompter – Unlimited chatgpt text prompts for openai tasks plugin for WordPress is vulnerable to Stored Cross-Site Scripting via the plugin's 'text_prompter' shortcode in all versions
Multi-Turn Jailbreaking of Aligned LLMs via Lexical Anchor Tree Search
injection. LATS reformulates jailbreaking as a breadth-first tree search over multi-turn dialogues, where each node incrementally injects missing content words from the attack goal into benign prompts. Evaluations
files, which leads to a server side template injection vulnerability within langchaingo, allowing an attacker to insert a statement into a prompt to read the "etc/passwd" file
Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio
second leverages audio-modality exploits (Read, Spell, Phoneme) that inject harmful content through auxiliary audio channels while maintaining benign textual prompts. Through evaluation across five commercial LALMs-based TTS systems
Evaluating Adversarial Vulnerabilities in Modern Large Language Models
prompted to circumvent their own safety protocols, and 'cross-bypass', where one model generated adversarial prompts to exploit vulnerabilities in the other. Four attack methods were employed: direct injection, role
Large Language Models for Detecting Cyberattacks on Smart Grid Protective Relays
perfect fault detection accuracy. Additional evaluations demonstrate robustness to prompt formulation variations, resilience under combined time-synchronization and false-data injection attacks, and stable performance under realistic measurement noise levels
Exposing Citation Vulnerabilities in Generative Engines
perspectives of citation publishers and the content-injection barrier, defined as the difficulty for attackers to manipulate answers to user prompts by placing malicious content on the web. GEs integrate
Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections
that memory evolution can convert one-time indirect injection into persistent compromise, which suggests that defenses focused only on per-session prompt filtering are not sufficient for self-evolving agents
Sockpuppetting: Jailbreaking LLMs Without Optimization Through Output Prefix Injection
assistant message block rather than the user prompt, increasing ASR by 64% over GCG on Llama-3.1-8B in a prompt-agnostic setting. The results establish sockpuppetting
Automating Agent Hijacking via Structural Template Injection
ecosystem, enables adversaries to manipulate execution by injecting malicious instructions into retrieved content. Most existing attacks rely on manually crafted, semantics-driven prompt manipulation, which often yields low attack success
TypePilot: Leveraging the Scala Type System for Secure LLM-generated Code
enforce safety constraints, just as naive prompting for more secure code, our type-focused agentic pipeline substantially mitigates input validation and injection vulnerabilities. The results demonstrate the potential of structured
Reasoning Hijacking: Subverting LLM Classification via Decision-Criteria Injection
which attempts to override the system prompt, Reasoning Hijacking accepts the high-level goal but manipulates the model's decision-making logic by injecting spurious reasoning shortcuts. Through extensive experiments
ShadowLogic: Backdoors in Any Whitebox LLM
injecting an uncensoring vector into its computational graph representation. We set a trigger phrase that, when added to the beginning of a prompt into the LLM, applies the uncensoring vector
Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure
benign counterparts) under controlled prompt conditions that vary user-context personalization (no bio, bio-only, bio+mental health disclosure) and include a lightweight jailbreak injection. Our results reveal that harmful