AI Security Research

2,589+ academic papers on AI security, attacks, and defenses

Total

2,589

Attack

998

Benchmark

740

Defense

355

Tool

276

Survey

147

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 881–900 of 1,931 papers

Clear filters

Benchmark LOW

Do Large Language Models Possess a Theory of Mind? A Comparative Evaluation Using the Strange Stories Paradigm

Anna Babarczy, Andras Lukacs, Peter Vedres +1 more

The study explores whether current Large Language Models (LLMs) exhibit Theory of Mind (ToM) capabilities -- specifically, the ability to infer...

2 months ago cs.CL cs.AI PDF

Attack MEDIUM

AndroWasm: an Empirical Study on Android Malware Obfuscation through WebAssembly

Diego Soi, Silvia Lucia Sanna, Lorenzo Pisu +2 more

In recent years, stealthy Android malware has increasingly adopted sophisticated techniques to bypass automatic detection mechanisms and harden...

2 months ago cs.CR PDF

Tool HIGH

PenTiDef: Enhancing Privacy and Robustness in Decentralized Federated Intrusion Detection Systems against Poisoning Attacks

Phan The Duy, Nghi Hoang Khoa, Nguyen Tran Anh Quan +3 more

The increasing deployment of Federated Learning (FL) in Intrusion Detection Systems (IDS) introduces new challenges related to data privacy,...

2 months ago cs.CR cs.AI PDF

Defense LOW

Mining Type Constructs Using Patterns in AI-Generated Code

Imgyeong Lee, Tayyib Ul Hassan, Abram Hindle

Artificial Intelligence (AI) increasingly automates various parts of the software development tasks. Although AI has enhanced the productivity of...

2 months ago cs.SE PDF

Attack HIGH

TFL: Targeted Bit-Flip Attack on Large Language Model

Jingkai Guo, Chaitali Chakrabarti, Deliang Fan

Large language models (LLMs) are increasingly deployed in safety and security critical applications, raising concerns about their robustness to model...

2 months ago cs.CR cs.CL cs.LG PDF

Attack HIGH

Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models

Manuel Wirth

As Large Language Models (LLMs) are increasingly integrated into automated decision-making pipelines, specifically within Human Resources (HR), the...

2 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

Asking Forever: Universal Activations Behind Turn Amplification in Conversational LLMs

Zachary Coalson, Bo Fang, Sanghyun Hong

Multi-turn interaction length is a dominant factor in the operational costs of conversational LLMs. In this work, we present a new failure mode in...

2 months ago cs.LG cs.CR PDF

Tool LOW

The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

Leon Staufer, Kevin Feng, Kevin Wei +6 more

Agentic AI systems are increasingly capable of performing professional and personal tasks with limited human involvement. However, tracking these...

2 months ago cs.CY cs.AI PDF

Benchmark MEDIUM

What Makes a Good LLM Agent for Real-world Penetration Testing?

Gelei Deng, Yi Liu, Yuekang Li +5 more

LLM-based agents show promise for automating penetration testing, yet reported performance varies widely across systems and benchmarks. We analyze 28...

2 months ago cs.CR cs.SE PDF

Attack LOW

Jolt Atlas: Verifiable Inference via Lookup Arguments in Zero Knowledge

Wyatt Benno, Alberto Centelles, Antoine Douchet +1 more

We present Jolt Atlas, a zero-knowledge machine learning (zkML) framework that extends the Jolt proving system to model inference. Unlike zkVMs...

2 months ago cs.CR cs.AI PDF

Survey MEDIUM

What Breaks Embodied AI Security:LLM Vulnerabilities, CPS Flaws,or Something Else?

Boyang Ma, Hechuan Guo, Peizhuo Lv +5 more

Embodied AI systems (e.g., autonomous vehicles, service robots, and LLM-driven interactive agents) are rapidly transitioning from controlled...

2 months ago cs.CR cs.AI PDF

Benchmark LOW

ReIn: Conversational Error Recovery with Reasoning Inception

Takyoung Kim, Jinseok Nam, Chandrayee Basu +5 more

Conversational agents powered by large language models (LLMs) with tool integration achieve strong performance on fixed task-oriented dialogue...

2 months ago cs.CL cs.AI PDF

Defense MEDIUM

Fail-Closed Alignment for Large Language Models

Zachary Coalson, Beth Sohler, Aiden Gabriel +1 more

We identify a structural weakness in current large language model (LLM) alignment: modern refusal mechanisms are fail-open. While existing approaches...

2 months ago cs.LG cs.CR PDF

Attack HIGH

Automating Agent Hijacking via Structural Template Injection

Xinhao Deng, Jiaqing Wu, Miao Chen +3 more

Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate execution...

2 months ago cs.AI cs.LG PDF

Tool MEDIUM

Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents

Arnold Cartagena, Ariane Teixeira

Large language models deployed as agents increasingly interact with external systems through tool calls--actions with real-world consequences that...

2 months ago cs.AI cs.SE PDF

Attack MEDIUM

DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs

Justin Albrethsen, Yash Datta, Kunal Kumar +1 more

While Large Language Model (LLM) capabilities have scaled, safety guardrails remain largely stateless, treating multi-turn dialogues as a series of...

2 months ago cs.AI cs.ET cs.LG PDF

Defense MEDIUM

NeST: Neuron Selective Tuning for LLM Safety

Sasha Behrouzi, Lichao Wu, Mohamadreza Rostami +1 more

Safety alignment is essential for the responsible deployment of large language models (LLMs). Yet, existing approaches often rely on heavyweight...

2 months ago cs.CR cs.LG PDF

Benchmark HIGH

IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages

Priyaranjan Pattnayak, Sanchari Chowdhuri

Safety alignment of large language models (LLMs) is mostly evaluated in English and contract-bound, leaving multilingual vulnerabilities...

2 months ago cs.AI cs.CL PDF

Benchmark MEDIUM

Large-scale online deanonymization with LLMs

Simon Lermen, Daniel Paleka, Joshua Swanson +3 more

We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News...

2 months ago cs.CR cs.AI cs.LG PDF

Attack MEDIUM

Policy Compiler for Secure Agentic Systems

Nils Palumbo, Sarthak Choudhary, Jihye Choi +2 more

LLM-based agents are increasingly being deployed in contexts requiring complex authorization policies: customer service protocols, approval...

2 months ago cs.CR cs.AI cs.MA PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial