vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM
core functions of vLLM Hook, in version 0, we demonstrate 3 use cases including prompt injection detection, enhanced retrieval-augmented retrieval (RAG), and activation steering. Finally, we welcome the community
Human Society-Inspired Approaches to Agentic AI Security: The 4C Framework
Although recent work has strengthened defenses against model and pipeline level vulnerabilities such as prompt injection, data poisoning, and tool misuse, these system centric approaches may fail to capture risks
SMCP: Secure Model Context Protocol
security and privacy challenges. These include risks such as unauthorized access, tool poisoning, prompt injection, privilege escalation, and supply chain attacks, any of which can impact different parts
CAI find_file Agent Tool has Command Injection Vulnerability Through
Machine-Assisted Grading of Nationwide School-Leaving Essay Exams with LLMs and Statistical NLP
raters and tends to fall within the human scoring range. We also evaluate bias, prompt injection risks, and LLMs as essay writers. These findings demonstrate that a principled, rubric-driven
From Biased Chatbots to Biased Agents: Examining Role Assignment Effects on LLM Agent Robustness
shifts appear across task types and model architectures, indicating that persona conditioning and simple prompt injections can distort an agent's decision-making reliability. Our findings reveal an overlooked vulnerability
MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction
perform complex tasks. This autonomy introduces serious security risks: malicious instructions or visual prompt injections can trigger unsafe reasoning and cause harmful system-level actions. Existing defenses, such as detection
Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Evaluation of Large Language Model Agents
practices. Finally, we highlight open challenges, such as hallucination in action, infinite loops, and prompt injection, and outline future research directions toward more robust and reliable autonomous systems
AgenTRIM: Tool Risk Mitigation for Agentic AI
While such tools extend capability, improper tool permissions introduce security risks such as indirect prompt injection and tool misuse. We characterize these failures as unbalanced tool-driven agency. Agents
Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale
skills contain at least one vulnerability, spanning 14 distinct patterns across four categories: prompt injection, data exfiltration, privilege escalation, and supply chain risks. Data exfiltration (13.3%) and privilege escalation
ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
percent on average and improves benign task completion by approximately 10 percent under prompt injection attacks
CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents
agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss. The only known robust defense is architectural isolation that strictly
When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent
broadened the attack surface. While prior research has focused on model threats such as prompt injection and backdoors, the risks of social engineering remain largely unexplored. We present the first
Defenses Against Prompt Attacks Learn Surface Heuristics
test-time accuracy drops of up to \textbf{40\%}. These findings suggest that current prompt-injection defenses frequently respond to attack-like surface patterns rather than the underlying intent
FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments
constraints, together with 107 real-world vulnerabilities and 963 test cases that systematically cover prompt injection, jailbreaking, financially adapted attacks, as well as benign inputs for false-positive evaluation. Experimental
From Understanding to Engagement: Personalized pharmacy Video Clips via Vision Language Models (VLMs)
smooth transitions and audio/visual alignment; (ii) a personalization mechanism based on role definition and prompt injection for tailored outputs (marketing, training, regulatory); (iii) a cost efficient e2e pipeline strategy balancing
Autonomous Agents on Blockchains: Standards, Execution Models, and Trust Boundaries
threat model tailored to agent-driven transaction pipelines that captures risks ranging from prompt injection and policy misuse to key compromise, adversarial execution dynamics, and multi-agent collusion
Hidden State Poisoning Attacks against Mamba-based Language Models
also observe that HiSPA triggers significantly weaken the Jamba model on the popular Open-Prompt-Injections benchmark, unlike pure Transformers. Finally, our interpretability study reveals patterns in Mamba's hidden
Hidden State Poisoning Attacks against Mamba-based Language Models
also observe that HiSPA triggers significantly weaken the Jamba model on the popular Open-Prompt-Injections benchmark, unlike pure Transformers. We further show that the theoretical and empirical findings extend
MCP-SandboxScan: WASM-based Secure Execution and Runtime Analysis for MCP Tools
agents raise new security risks: tool executions can introduce runtime-only behaviors, including prompt injection and unintended exposure of external inputs (e.g., environment secrets or local files). While existing scanners