AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total

2,529

Attack

969

Benchmark

729

Defense

345

Tool

272

Survey

142

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 201–220 of 222 papers

Clear filters

Defense MEDIUM

SafeMT: Multi-turn Safety for Multimodal Language Models

Han Zhu, Juntao Dai, Jiaming Ji +8 more

With the widespread use of multi-modal Large Language models (MLLMs), safety issues have become a growing concern. Multi-turn dialogues, which are...

7 months ago cs.CL cs.AI PDF

Defense MEDIUM

TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection

Jiahao Liu, Bonan Ruan, Xianglin Yang +5 more

LLM-based agents have demonstrated promising adaptability in real-world applications. However, these agents remain vulnerable to a wide range of...

7 months ago cs.CR PDF

Defense MEDIUM

CoSPED: Consistent Soft Prompt Targeted Data Extraction and Defense

Zhuochen Yang, Kar Wai Fok, Vrizlynn L. L. Thing

Large language models have gained widespread attention recently, but their potential security vulnerabilities, especially privacy leakage, are also...

7 months ago cs.CR PDF

Defense MEDIUM

Path Drift in Large Reasoning Models:How First-Person Commitments Override Safety

Yuyi Huang, Runzhe Zhan, Lidia S. Chao +2 more

As large language models (LLMs) are increasingly deployed for complex reasoning tasks, Long Chain-of-Thought (Long-CoT) prompting has emerged as a...

7 months ago cs.CL PDF

Defense MEDIUM

VisuoAlign: Safety Alignment of LVLMs with Multimodal Tree Search

MingSheng Li, Guangze Zhao, Sichen Liu

Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal perception and generation, yet their safety alignment remains a...

7 months ago cs.AI cs.CR PDF

Defense MEDIUM

From Defender to Devil? Unintended Risk Interactions Induced by LLM Defenses

Xiangtao Meng, Tianshuo Cong, Li Wang +4 more

Large Language Models (LLMs) have shown remarkable performance across various applications, but their deployment in real-world settings faces several...

7 months ago cs.CR PDF

Defense MEDIUM

From Description to Detection: LLM based Extendable O-RAN Compliant Blind DoS Detection in 5G and Beyond

Thusitha Dayaratne, Ngoc Duy Pham, Viet Vo +5 more

The quality and experience of mobile communication have significantly improved with the introduction of 5G, and these improvements are expected to...

7 months ago cs.CR cs.ET cs.LG PDF

Defense MEDIUM

P2P: A Poison-to-Poison Remedy for Reliable Backdoor Defense in LLMs

Shuai Zhao, Xinyi Wu, Shiqian Zhao +4 more

During fine-tuning, large language models (LLMs) are increasingly vulnerable to data-poisoning backdoor attacks, which compromise their reliability...

7 months ago cs.CR cs.AI cs.CL PDF

Defense MEDIUM

Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models

Anindya Sundar Das, Kangjie Chen, Monowar Bhuyan

Pre-trained language models have achieved remarkable success across a wide range of natural language processing (NLP) tasks, particularly when...

7 months ago cs.CL cs.LG PDF

Defense MEDIUM

Read the Scene, Not the Script: Outcome-Aware Safety for LLMs

Rui Wu, Yihao Quan, Zeru Shi +3 more

Safety-aligned Large Language Models (LLMs) still show two dominant failure modes: they are easily jailbroken, or they over-refuse harmless inputs...

7 months ago cs.CL cs.LG PDF

Defense MEDIUM

VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation

Lesly Miculicich, Mihir Parmar, Hamid Palangi +4 more

The deployment of autonomous AI agents in sensitive domains, such as healthcare, introduces critical risks to safety, security, and privacy. These...

7 months ago cs.SE cs.AI cs.CR PDF

Defense MEDIUM

UpSafe$^\circ$C: Upcycling for Controllable Safety in Large Language Models

Yuhao Sun, Zhuoer Xu, Shiwen Cui +4 more

Large Language Models (LLMs) have achieved remarkable progress across a wide range of tasks, but remain vulnerable to safety risks such as harmful...

7 months ago cs.AI cs.CR cs.LG PDF

Defense MEDIUM

Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense

Guobin Shen, Dongcheng Zhao, Haibo Tong +3 more

Ensuring Large Language Model (LLM) safety remains challenging due to the absence of universal standards and reliable content validators, making it...

7 months ago cs.AI PDF

Defense MEDIUM

A Hybrid CAPTCHA Combining Generative AI with Keystroke Dynamics for Enhanced Bot Detection

Ayda Aghaei Nia

Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHAs) are a foundational component of web security, yet traditional...

7 months ago cs.CR cs.AI PDF

Defense MEDIUM

DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models

Zherui Li, Zheng Nie, Zhenhong Zhou +7 more

The rapid advancement of Diffusion Large Language Models (dLLMs) introduces unprecedented vulnerabilities that are fundamentally distinct from...

7 months ago cs.CL cs.AI PDF

Defense MEDIUM

Policy-as-Prompt: Turning AI Governance Rules into Guardrails for AI Agents

Gauri Kholkar, Ratinder Ahuja

As autonomous AI agents are used in regulated and safety-critical settings, organizations need effective ways to turn policy into enforceable...

7 months ago cs.CL cs.AI PDF

Defense MEDIUM

Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence

Yuqiao Meng, Luoxi Tang, Feiyang Yu +4 more

Large language models (LLMs) are increasingly used to help security analysts manage the surge of cyber threats, automating tasks from vulnerability...

7 months ago cs.CR cs.AI PDF

Defense MEDIUM

ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search

Zeyu Shen, Basileal Imana, Tong Wu +3 more

Retrieval-Augmented Generation (RAG) enhances Large Language Models by grounding their outputs in external documents. These systems, however, remain...

7 months ago cs.CR cs.AI PDF

Defense MEDIUM

Beyond Embeddings: Interpretable Feature Extraction for Binary Code Similarity

Charles E. Gagnon, Steven H. H. Ding, Philippe Charland +1 more

Binary code similarity detection is a core task in reverse engineering. It supports malware analysis and vulnerability discovery by identifying...

7 months ago cs.AI cs.CR cs.SE PDF

Defense MEDIUM

The Rogue Scalpel: Activation Steering Compromises LLM Safety

Anton Korznikov, Andrey Galichin, Alexey Dontsov +3 more

Activation steering is a promising technique for controlling LLM behavior by adding semantically meaningful vectors directly into a model's hidden...

7 months ago cs.LG cs.AI PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial