Paper 2512.01295v2

Systems Security Foundations for Agentic Computing

third-party servers. For example, an adversary can exfiltrate data and cause other unwarranted behavior by executing prompt injection attacks. These security concerns have recently motivated researchers

medium relevance tool
Paper 2512.00742v1

On the Regulatory Potential of User Interfaces for AI Agent Governance

consequential risks. Prior proposals for governing AI agents primarily target system-level safeguards (e.g., prompt injection monitors) or agent infrastructure (e.g., agent IDs). In this work, we explore a complementary

medium relevance attack
Paper 2511.19483v1

Z-Space: A Multi-Agent Tool Orchestration Framework for Enterprise-Grade LLM Automation

become a core challenge restricting system practicality. Existing approaches generally rely on full-prompt injection or static semantic retrieval, facing issues including semantic disconnection between user queries and tool descriptions

medium relevance tool
Paper 2511.19477v1

Building Browser Agents: Architecture, Security, and Practical Solutions

performance; architectural decisions determine success or failure. Security analysis of real-world incidents reveals prompt injection attacks make general-purpose autonomous operation fundamentally unsafe. The paper argues against developing general

medium relevance benchmark
Paper 2511.15203v1

Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks

based agents with function-calling capabilities are increasingly deployed, but remain vulnerable to Indirect Prompt Injection (IPI) attacks that hijack their tool calls. In response, numerous IPI-centric defense frameworks

high relevance survey
Paper 2511.12423v1

GRAPHTEXTACK: A Realistic Black-Box Node Injection Attack on LLM-Enhanced GNNs

vulnerabilities: GNNs are sensitive to structural perturbations, while LLM-derived features are vulnerable to prompt injection and adversarial phrasing. While existing adversarial attacks largely perturb structure or text independently

high relevance attack
Paper 2511.06212v1

RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework

expands the attack surface, putting entire networks at risk by introducing vulnerabilities such as prompt injection and data poisoning. In this work, we attack an LLM-based IoT attack analysis

high relevance tool
Paper 2511.05919v2

Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs

attacks. Here, we propose the first principled attack evaluation on LLM factual memory under prompt injection via Xmera, our novel, theory-grounded MitM framework. By perturbing the input given

high relevance attack
Paper 2511.05867v3

Can LLM Infer Risk Information From MCP Server System Logs?

when the MCP server is compromised or untrustworthy. While prior benchmarks primarily focus on prompt injection attacks or analyze the vulnerabilities of LLM-MCP interaction trajectories, limited attention has been

medium relevance tool
Paper 2511.03434v1

Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design (A2A, AP2, ERC-8004, and Beyond)

assumptions, attack surfaces, and design trade-offs, with particular emphasis on LLM-specific fragilities (prompt injection, sycophancy/nudge-susceptibility, hallucination, deception, and misalignment) that render purely reputational or claim-only approaches brittle

medium relevance attack
Paper 2511.03247v1

Death by a Thousand Prompts: Open Model Vulnerability Analysis

adversarial testing, we measured each model's resilience against single-turn and multi-turn prompt injection and jailbreak attacks. Our findings reveal pervasive vulnerabilities across all tested models, with multi

high relevance attack
Paper 2510.19169v2

OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models

safety violations such as harmful or explicit text generation, (2) model-manipulation attacks including prompt injection, jailbreaks, and code-interpreter abuse, and (3) data leakage involving sensitive or private information

medium relevance tool
Paper 2510.16381v1

ATA: A Neuro-Symbolic Approach to Implement Autonomous and Trustworthy Agents

models, while exhibiting perfect determinism, enhanced stability against input perturbations, and inherent immunity to prompt injection attacks. By generating decisions grounded in symbolic reasoning, ATA offers a practical and controllable

medium relevance benchmark
Paper 2510.13351v1

Protect: Towards Robust Guardrailing Stack for Trustworthy Enterprise LLM Systems

extensive, multi-modal dataset covering four safety dimensions: toxicity, sexism, data privacy, and prompt injection. Our teacher-assisted annotation pipeline leverages reasoning and explanation traces to generate high-fidelity, context

medium relevance tool
Paper 2510.08917v1

"I know it's not right, but that's what it said to do": Investigating Trust in AI Chatbots for Cybersecurity Policy

chatbots are an emerging security attack vector, vulnerable to threats such as prompt injection and rogue chatbot creation. When deployed in domains such as corporate security policy, they could

medium relevance attack
Paper 2510.01586v1

AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning

role coordination, but their openness and interaction complexity also expose them to jailbreak, prompt-injection, and adversarial collaboration. Existing defenses fall into two lines: (i) self-verification that asks each

medium relevance attack
Paper 2509.26584v1

Fairness Testing in Retrieval-Augmented Generation: How Small Perturbations Reveal Bias in Small Language Models

concerns regarding security and fairness. Beyond known attack vectors such as data poisoning and prompt injection, LLMs are also vulnerable to fairness bugs. These refer to unintended behaviors influenced

medium relevance benchmark
Paper 2509.25705v1

How Diffusion Models Memorize

under memorization due to classifier-free guidance amplifying predictions and inducing overestimation; (ii) memorized prompts inject training images into noise predictions, forcing latent trajectories to converge and steering denoising toward

low relevance other
Paper 2509.23519v2

ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search

documents. These systems, however, remain vulnerable to attacks on the retrieval corpus, such as prompt injection. RAG-based search systems (e.g., Google's Search AI Overview) present an interesting setting

medium relevance defense
Paper 2603.17239v1

LAAF: Logic-layer Automated Attack Framework A Systematic Red-Teaming Methodology for LPCI Vulnerabilities in Agentic Large Language Model Systems

pipelines, and external tool connectors face a class of attacks, Logic-layer Prompt Control Injection (LPCI), for which no automated red-teaming instrument existed. We present LAAF (Logic-layer Automated

high relevance attack
Page 13 of 15