Paper 2602.01942v1

Human Society-Inspired Approaches to Agentic AI Security: The 4C Framework

Although recent work has strengthened defenses against model- and pipeline-level vulnerabilities such as prompt injection, data poisoning, and tool misuse, these system-centric approaches may fail to capture risks

medium relevance tool
Paper 2602.01129v1

SMCP: Secure Model Context Protocol

security and privacy challenges. These include risks such as unauthorized access, tool poisoning, prompt injection, privilege escalation, and supply chain attacks, any of which can impact different parts

medium relevance attack
Paper 2601.16314v1

Machine-Assisted Grading of Nationwide School-Leaving Essay Exams with LLMs and Statistical NLP

raters and tends to fall within the human scoring range. We also evaluate bias, prompt injection risks, and LLMs as essay writers. These findings demonstrate that a principled, rubric-driven

medium relevance benchmark
Paper 2602.12285v1

From Biased Chatbots to Biased Agents: Examining Role Assignment Effects on LLM Agent Robustness

shifts appear across task types and model architectures, indicating that persona conditioning and simple prompt injections can distort an agent's decision-making reliability. Our findings reveal an overlooked vulnerability

medium relevance benchmark
Paper 2601.12822v1

MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction

perform complex tasks. This autonomy introduces serious security risks: malicious instructions or visual prompt injections can trigger unsafe reasoning and cause harmful system-level actions. Existing defenses, such as detection

medium relevance benchmark
Paper 2601.12560v1

Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Evaluation of Large Language Model Agents

practices. Finally, we highlight open challenges, such as hallucination in action, infinite loops, and prompt injection, and outline future research directions toward more robust and reliable autonomous systems

medium relevance benchmark
Paper 2601.12449v1

AgenTRIM: Tool Risk Mitigation for Agentic AI

While such tools extend capability, improper tool permissions introduce security risks such as indirect prompt injection and tool misuse. We characterize these failures as unbalanced tool-driven agency. Agents

medium relevance tool
Paper 2601.10338v1

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

skills contain at least one vulnerability, spanning 14 distinct patterns across four categories: prompt injection, data exfiltration, privilege escalation, and supply chain risks. Data exfiltration (13.3%) and privilege escalation

medium relevance survey
Paper 2601.10156v1

ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

percent on average and improves benign task completion by approximately 10 percent under prompt injection attacks

medium relevance tool
Paper 2601.09923v2

CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss. The only known robust defense is architectural isolation that strictly

medium relevance tool
Paper 2601.07263v1

When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent

broadened the attack surface. While prior research has focused on model threats such as prompt injection and backdoors, the risks of social engineering remain largely unexplored. We present the first

high relevance attack
Paper 2601.07185v1

Defenses Against Prompt Attacks Learn Surface Heuristics

test-time accuracy drops of up to 40%. These findings suggest that current prompt-injection defenses frequently respond to attack-like surface patterns rather than the underlying intent

high relevance attack
Paper 2601.07853v1

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

constraints, together with 107 real-world vulnerabilities and 963 test cases that systematically cover prompt injection, jailbreaking, financially adapted attacks, as well as benign inputs for false-positive evaluation. Experimental

medium relevance benchmark
Paper 2601.05059v1

From Understanding to Engagement: Personalized pharmacy Video Clips via Vision Language Models (VLMs)

smooth transitions and audio/visual alignment; (ii) a personalization mechanism based on role definition and prompt injection for tailored outputs (marketing, training, regulatory); (iii) a cost efficient e2e pipeline strategy balancing

medium relevance benchmark
Paper 2601.04583v1

Autonomous Agents on Blockchains: Standards, Execution Models, and Trust Boundaries

threat model tailored to agent-driven transaction pipelines that captures risks ranging from prompt injection and policy misuse to key compromise, adversarial execution dynamics, and multi-agent collusion

medium relevance survey
Paper 2601.01972v4

Hidden State Poisoning Attacks against Mamba-based Language Models

also observe that HiSPA triggers significantly weaken the Jamba model on the popular Open-Prompt-Injections benchmark, unlike pure Transformers. We further show that the theoretical and empirical findings extend

high relevance attack
Paper 2601.01972v3

Hidden State Poisoning Attacks against Mamba-based Language Models

also observe that HiSPA triggers significantly weaken the Jamba model on the popular Open-Prompt-Injections benchmark, unlike pure Transformers. Finally, our interpretability study reveals patterns in Mamba's hidden

high relevance attack
Paper 2601.01241v1

MCP-SandboxScan: WASM-based Secure Execution and Runtime Analysis for MCP Tools

agents raise new security risks: tool executions can introduce runtime-only behaviors, including prompt injection and unintended exposure of external inputs (e.g., environment secrets or local files). While existing scanners

medium relevance benchmark
Paper 2512.24415v1

Language Model Agents Under Attack: A Cross Model-Benchmark of Profit-Seeking Behaviors in Customer Service

trust in agentic workflows. We present a cross-domain benchmark of profit-seeking direct prompt injection in customer-service interactions, spanning 10 service domains and 100 realistic attack scripts grouped

high relevance benchmark
Paper 2601.00867v1

The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models

systems, and infrastructure management. Current adversarial testing paradigms focus predominantly on technical attack vectors: prompt injection, jailbreaking, and data exfiltration. We argue this focus is catastrophically incomplete. LLMs, trained

medium relevance survey