AI Security Research

2,589+ academic papers on AI security, attacks, and defenses

Total: 2,589
Attack: 998
Benchmark: 740
Defense: 355
Tool: 276
Survey: 147

Showing 1201–1220 of 1,928 papers

Benchmark LOW

StepShield: When, Not Whether to Intervene on Rogue Agents

Gloria Felicia, Michael Eniolade, Jinfeng He +4 more

Existing agent safety benchmarks report binary accuracy, conflating early intervention with post-mortem analysis. A detector that flags a violation...

3 months ago cs.LG cs.AI cs.CR PDF

Attack HIGH

ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathologically Long Reasoning in Large Reasoning Models

Xiaogeng Liu, Xinyan Wang, Yechao Zhang +5 more

Large reasoning models (LRMs) extend large language models with explicit multi-step reasoning traces, but this capability introduces a new class of...

3 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning

Xiaoyu Xu, Minxin Du, Kun Fang +6 more

Large language models (LLMs) demonstrate impressive capabilities across diverse tasks but raise concerns about privacy, copyright, and harmful...

3 months ago cs.CL cs.AI cs.CR PDF

Attack MEDIUM

Stay in Character, Stay Safe: Dual-Cycle Adversarial Self-Evolution for Safety Role-Playing Agents

Mingyang Liao, Yichen Wan, Shuchen Wu +6 more

LLM-based role-playing has rapidly improved in fidelity, yet stronger adherence to persona constraints commonly increases vulnerability to jailbreak...

3 months ago cs.AI PDF

Attack HIGH

ICL-EVADER: Zero-Query Black-Box Evasion Attacks on In-Context Learning and Their Defenses

Ningyuan He, Ronghong Huang, Qianqian Tang +3 more

In-context learning (ICL) has become a powerful, data-efficient paradigm for text classification using large language models. However, its robustness...

3 months ago cs.CR PDF

Attack MEDIUM

RerouteGuard: Understanding and Mitigating Adversarial Risks for LLM Routing

Wenhui Zhang, Huiyu Xu, Zhibo Wang +4 more

Recent advancements in multi-model AI systems have leveraged LLM routers to reduce computational cost while maintaining response quality by assigning...

3 months ago cs.CR PDF

Benchmark MEDIUM

The Compliance Paradox: Semantic-Instruction Decoupling in Automated Academic Code Evaluation

Devanshu Sahoo, Manish Prasad, Vasudev Majhi +5 more

The rapid integration of Large Language Models (LLMs) into educational assessment rests on the unverified assumption that instruction following...

3 months ago cs.CL cs.AI cs.ET PDF

Tool MEDIUM

Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs

Xiang Zheng, Yutao Wu, Hanxun Huang +5 more

Autonomous code agents built on large language models are reshaping software and AI development through tool use, long-horizon reasoning, and...

3 months ago cs.AI PDF

Attack MEDIUM

LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models

Alvi Md Ishmam, Najibul Haque Sarker, Zaber Ibn Abdul Hakim +1 more

Multimodal Large Language Models (MLLMs) have achieved remarkable performance across vision-language tasks. Recent advancements allow these models to...

3 months ago cs.CV PDF

Attack MEDIUM

Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks

Arther Tian, Alex Ding, Frank Chen +2 more

Decentralized large language model inference networks require lightweight mechanisms to reward high quality outputs under heterogeneous latency and...

3 months ago cs.CR cs.AI PDF

Attack MEDIUM

OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence

Jarrod Barnes

As large language models (LLMs) improve, so do their offensive applications: frontier agents now generate working exploits for under $50 in compute...

3 months ago cs.AI PDF

Attack MEDIUM

Diversifying Toxicity Search in Large Language Models Through Speciation

Onkar Shelar, Travis Desell

Evolutionary prompt search is a practical black-box approach for red teaming large language models (LLMs), but existing methods often collapse onto a...

3 months ago cs.NE q-bio.PE PDF

Benchmark LOW

ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code

Mingqiao Mo, Yunlong Tan, Hao Zhang +2 more

Large language models (LLMs) have achieved remarkable progress in code generation, yet their potential for software protection remains largely...

3 months ago cs.CL PDF

Attack HIGH

ICON: Intent-Context Coupling for Efficient Multi-Turn Jailbreak Attack

Xingwei Lin, Wenhao Lin, Sicong Cao +4 more

Multi-turn jailbreak attacks have emerged as a critical threat to Large Language Models (LLMs), bypassing safety mechanisms by progressively...

3 months ago cs.CR cs.AI PDF

Attack MEDIUM

ShellForge: Adversarial Co-Evolution of Webshell Generation and Multi-View Detection for Robust Webshell Defense

Yizhong Ding

Webshells remain a primary foothold for attackers to compromise servers, particularly within PHP ecosystems. However, existing detection mechanisms...

3 months ago cs.CR cs.AI PDF

Defense MEDIUM

Eliciting Least-to-Most Reasoning for Phishing URL Detection

Holly Trikilis, Pasindu Marasinghe, Fariza Rashid +1 more

Phishing continues to be one of the most prevalent attack vectors, making accurate classification of phishing URLs essential. Recently, large...

3 months ago cs.CR cs.AI PDF

Survey MEDIUM

Securing AI Agents in Cyber-Physical Systems: A Survey of Environmental Interactions, Deepfake Threats, and Defenses

Mohsen Hatami, Van Tuan Pham, Hozefa Lakadawala +1 more

The increasing integration of AI agents into cyber-physical systems (CPS) introduces new security risks that extend beyond traditional cyber or...

3 months ago cs.CR cs.DC PDF

Attack HIGH

Membership Inference Attacks Against Fine-tuned Diffusion Language Models

Yuetian Chen, Kaiyuan Zhang, Yuntao Du +5 more

Diffusion Language Models (DLMs) represent a promising alternative to autoregressive language models, using bidirectional masked token prediction....

3 months ago cs.LG cs.AI PDF

Benchmark LOW

FFE-Hallu: Hallucinations in Fixed Figurative Expressions: Benchmark of Idioms and Proverbs in the Persian Language

Faezeh Hosseini, Mohammadali Yousefzadeh, Yadollah Yaghoobzadeh

Figurative language, particularly fixed figurative expressions (FFEs) such as idioms and proverbs, poses persistent challenges for large language...

3 months ago cs.CL PDF

Attack HIGH

What Hard Tokens Reveal: Exploiting Low-confidence Tokens for Membership Inference Attacks against Large Language Models

Md Tasnim Jawad, Mingyan Xiao, Yanzhao Wu

With the widespread adoption of Large Language Models (LLMs) and increasingly stringent privacy regulations, protecting data privacy in LLMs has...

3 months ago cs.CR PDF
