AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total

2,529

Attack

969

Benchmark

729

Defense

345

Tool

272

Survey

142

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 21–40 of 345 papers

Clear filters

Defense LOW

Think Before You Act -- A Neurocognitive Governance Model for Autonomous AI Agents

Eranga Bandara, Ross Gore, Asanga Gunaratna +12 more

The rapid deployment of autonomous AI agents across enterprise, healthcare, and safety-critical environments has created a fundamental governance...

2 weeks ago cs.AI PDF

Defense HIGH

PLMGH: What Matters in PLM-GNN Hybrids for Code Classification and Vulnerability Detection

Mohamed Taoufik Kaouthar El Idrissi, Edward Zulkoski, Mohammad Hamdaqa

Code understanding models increasingly rely on pretrained language models (PLMs) and graph neural networks (GNNs), which capture complementary...

2 weeks ago cs.SE cs.LG PDF

Defense MEDIUM

One Perturbation, Two Failure Modes: Probing VLM Safety via Embedding-Guided Typographic Perturbations

Ravikumar Balakrishnan, Sanket Mendapara

Typographic prompt injection exploits vision language models' (VLMs) ability to read text rendered in images, posing a growing threat as VLMs power...

2 weeks ago cs.CV PDF

Defense MEDIUM

Layerwise Convergence Fingerprints for Runtime Misbehavior Detection in Large Language Models

Nay Myat Min, Long H. Pham, Jun Sun

Large language models deployed at runtime can misbehave in ways that clean-data validation cannot anticipate: training-time backdoors lie dormant...

2 weeks ago cs.CR cs.AI cs.CL PDF

Defense MEDIUM

Defusing the Trigger: Plug-and-Play Defense for Backdoored LLMs via Tail-Risk Intrinsic Geometric Smoothing

Kaisheng Fan, Weizhe Zhang, Yishu Gao +2 more

Defending against backdoor attacks in large language models remains a critical practical challenge. Existing defenses mitigate these threats but...

2 weeks ago cs.CR cs.AI PDF

Defense LOW

Disagreement as Signals: Dual-view Calibration for Sequential Recommendation Denoising

Sijia Li, Min Gao, Zongwei Wang +3 more

Sequential recommendation seeks to model the evolution of user interests by capturing temporal user intent and item-level transition patterns....

2 weeks ago cs.IR PDF

Defense HIGH

Strategic Heterogeneous Multi-Agent Architecture for Cost-Effective Code Vulnerability Detection

Zhaohui Geoffrey Wang

Automated code vulnerability detection is critical for software security, yet existing approaches face a fundamental trade-off between detection...

2 weeks ago cs.CR cs.LG cs.SE PDF

Defense MEDIUM

Breaking Bad: Interpretability-Based Safety Audits of State-of-the-Art LLMs

Krishiv Agarwal, Ramneet Kaur, Colin Samplawski +6 more

Effective safety auditing of large language models (LLMs) demands tools that go beyond black-box probing and systematically uncover vulnerabilities...

2 weeks ago cs.CR cs.LG PDF

Defense MEDIUM

SafeRedirect: Defeating Internal Safety Collapse via Task-Completion Redirection in Frontier LLMs

Chao Pan, Yu Wu, Xin Yao

Internal Safety Collapse (ISC) is a failure mode in which frontier LLMs, when executing legitimate professional tasks whose correct completion...

2 weeks ago cs.CR cs.AI cs.LG PDF

Defense HIGH

Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning

Ronghao Ni, Mihai Christodorescu, Limin Jia

The rapidly evolving Node$.$js ecosystem currently includes millions of packages and is a critical part of modern software supply chains, making...

2 weeks ago cs.CR cs.AI cs.SE PDF

Defense MEDIUM

Evaluating LLM-Generated Obfuscated XSS Payloads for Machine Learning-Based Detection

Divyesh Gabbireddy, Suman Saha

Cross-site scripting (XSS) remains a persistent web security vulnerability, especially because obfuscation can change the surface form of a malicious...

3 weeks ago cs.CR cs.LG cs.SE PDF

Defense MEDIUM

Malicious ML Model Detection by Learning Dynamic Behaviors

Sarang Nambiar, Dhruv Pradhan, Ezekiel Soremekun

Pre-trained machine learning models (PTMs) are commonly provided via Model Hubs (e.g., Hugging Face) in standard formats like Pickles to facilitate...

3 weeks ago cs.CR cs.SE PDF

Defense MEDIUM

ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

Kun Wang, Cheng Qian, Miao Yu +6 more

Multimodal Large Language Models (MLLMs) have achieved remarkable success in cross-modal understanding and generation, yet their deployment is...

3 weeks ago cs.CR cs.AI PDF

Defense MEDIUM

Mechanistic Anomaly Detection via Functional Attribution

Hugo Lyons Keenan, Christopher Leckie, Sarah Erfani

We can often verify the correctness of neural network outputs using ground truth labels, but we cannot reliably determine whether the output was...

3 weeks ago cs.LG cs.CR PDF

Defense MEDIUM

Committed SAE-Feature Traces for Audited-Session Substitution Detection in Hosted LLMs

Ziyang Liu

Hosted-LLM providers have a silent-substitution incentive: advertise a stronger model while serving cheaper replies. Probe-after-return schemes such...

3 weeks ago cs.CR cs.AI PDF

Defense LOW

Architectural Design Decisions in AI Agent Harnesses

Hu Wei

AI agent systems increasingly rely on reusable non-LLM engineering infrastructure that packages tool mediation, context handling, delegation, safety...

3 weeks ago cs.AI PDF

Defense MEDIUM

Owner-Harm: A Missing Threat Model for AI Agent Safety

Dongcheng Zhang, Yiqing Jiang

Existing AI agent safety benchmarks focus on generic criminal harm (cybercrime, harassment, weapon synthesis), leaving a systematic blind spot for a...

3 weeks ago cs.CR cs.AI cs.CL PDF

Defense MEDIUM

TitanCA: Lessons from Orchestrating LLM Agents to Discover 100+ CVEs

Ting Zhang, Yikun Li, Chengran Yang +15 more

Software vulnerabilities remain one of the most persistent threats to modern digital infrastructure. While static application security testing (SAST)...

3 weeks ago cs.CR PDF

Defense MEDIUM

SafeAgent: A Runtime Protection Architecture for Agentic Systems

Hailin Liu, Eugene Ilyushin, Jie Ni +1 more

Large language model (LLM) agents are vulnerable to prompt-injection attacks that propagate through multi-step workflows, tool interactions, and...

3 weeks ago cs.AI cs.MA PDF

Defense LOW

Golden Handcuffs make safer AI agents

Aram Ebtekar, Michael K. Cohen

Reinforcement learners can attain high reward through novel unintended strategies. We study a Bayesian mitigation for general environments: we expand...

3 weeks ago cs.LG cs.AI PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial