AI Security Research

2,560+ academic papers on AI security, attacks, and defenses

Total

2,560

Attack

982

Benchmark

736

Defense

350

Tool

275

Survey

144

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 341–360 of 725 papers

Clear filters

Attack HIGH

Jailbreaking Leaves a Trace: Understanding and Detecting Jailbreak Attacks from Internal Representations of Large Language Models

Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis

Jailbreaking large language models (LLMs) has emerged as a critical security challenge with the widespread deployment of conversational AI systems....

3 months ago cs.CR cs.CL PDF

Attack MEDIUM

Vulnerabilities in Partial TEE-Shielded LLM Inference with Precomputed Noise

Abhishek Saini, Haolin Jiang, Hang Liu

The deployment of large language models (LLMs) on third-party devices requires new ways to protect model intellectual property. While Trusted...

3 months ago cs.CR cs.AR PDF

Attack HIGH

Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection

J Alex Corll

Multi-turn prompt injection attacks distribute malicious intent across multiple conversation turns, exploiting the assumption that each turn is...

3 months ago cs.CR cs.AI PDF

Attack HIGH

Transferable Backdoor Attacks for Code Models via Sharpness-Aware Adversarial Perturbation

Shuyu Chang, Haiping Huang, Yanjun Zhang +3 more

Code models are increasingly adopted in software development but remain vulnerable to backdoor attacks via poisoned training data. Existing backdoor...

3 months ago cs.CR cs.SE PDF

Attack HIGH

When Skills Lie: Hidden-Comment Injection in LLM Agents

Qianli Wang, Boyang Ma, Minghui Xu +1 more

LLM agents often rely on Skills to describe available tools and recommended procedures. We study a hidden-comment prompt injection risk in this...

3 months ago cs.CR PDF

Attack HIGH

Detecting Jailbreak Attempts in Clinical Training LLMs Through Automated Linguistic Feature Extraction

Tri Nguyen, Huy Hoang Bao Le, Lohith Srikanth Pentapalli +2 more

Detecting jailbreak attempts in clinical training large language models (LLMs) requires accurate modeling of linguistic deviations that signal unsafe...

3 months ago cs.AI cs.LG PDF

Attack MEDIUM

A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors

Zhenyu Xu, Victor S. Sheng

Protecting the intellectual property of large language models (LLMs) is a critical challenge due to the proliferation of unauthorized derivative...

3 months ago cs.CR cs.AI PDF

Attack HIGH

MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

Georgios Syros, Evan Rose, Brian Grinstead +4 more

Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks by directly interacting with web sites and...

3 months ago cs.CR cs.AI PDF

Attack HIGH

One RNG to Rule Them All: How Randomness Becomes an Attack Vector in Machine Learning

Kotekar Annapoorna Prabhu, Andrew Gan, Zahra Ghodsi

Machine learning relies on randomness as a fundamental component in various steps such as data sampling, data augmentation, weight initialization,...

3 months ago cs.CR cs.LG PDF

Attack HIGH

Red-teaming the Multimodal Reasoning: Jailbreaking Vision-Language Models via Cross-modal Entanglement Attacks

Yu Yan, Sheng Sun, Shengjia Cheng +3 more

Vision-Language Models (VLMs) with multimodal reasoning capabilities are high-value attack targets, given their potential for handling complex...

3 months ago cs.CR cs.AI PDF

Attack HIGH

StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors

Suraj Ranganath, Atharv Ramesh

AI-text detectors face a critical robustness challenge: adversarial paraphrasing attacks that preserve semantics while evading detection. We...

3 months ago cs.LG cs.AI cs.CR PDF

Attack HIGH

StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors

Suraj Ranganath, Atharv Ramesh

AI-text detectors face a critical robustness challenge: adversarial paraphrasing attacks that preserve semantics while evading detection. We...

3 months ago cs.LG cs.AI cs.CR PDF

Attack HIGH

Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing

Jona te Lintelo, Lichao Wu, Stjepan Picek

The rapid adoption of Mixture-of-Experts (MoE) architectures marks a major shift in the deployment of Large Language Models (LLMs). MoE LLMs improve...

3 months ago cs.CR PDF

Attack HIGH

Dashed Line Defense: Plug-And-Play Defense Against Adaptive Score-Based Query Attacks

Yanzhang Fu, Zizheng Guo, Jizhou Luo

Score-based query attacks pose a serious threat to deep learning models by crafting adversarial examples (AEs) using only black-box access to model...

3 months ago cs.LG cs.CR PDF

Attack HIGH

Retrieval Pivot Attacks in Hybrid RAG: Measuring and Mitigating Amplified Leakage from Vector Seeds to Graph Expansion

Scott Thornton

Hybrid Retrieval-Augmented Generation (RAG) pipelines combine vector similarity search with knowledge graph expansion for multi-hop reasoning. We...

3 months ago cs.CR cs.IR cs.LG PDF

Attack MEDIUM

LLMs + Security = Trouble

Benjamin Livshits

We argue that when it comes to producing secure code with AI, the prevailing "fighting fire with fire" approach -- using probabilistic AI-based...

3 months ago cs.CR cs.AI cs.SE PDF

Attack HIGH

Evasion of IoT Malware Detection via Dummy Code Injection

Sahar Zargarzadeh, Mohammad Islam

The Internet of Things (IoT) has revolutionized connectivity by linking billions of devices worldwide. However, this rapid expansion has also...

3 months ago cs.CR cs.LG PDF

Attack LOW

Test vs Mutant: Adversarial LLM Agents for Robust Unit Test Generation

Pengyu Chang, Yixiong Fang, Silin Chen +3 more

Software testing is a critical, yet resource-intensive phase of the software development lifecycle. Over the years, various automated tools have been...

3 months ago cs.SE PDF

Attack HIGH

Robustness of Vision Language Models Against Split-Image Harmful Input Attacks

Md Rafi Ur Rashid, MD Sadik Hossain Shanto, Vishnu Asutosh Dasu +1 more

Vision-Language Models (VLMs) are now a core part of modern AI. Recent work proposed several visual jailbreak attacks using single/ holistic images....

3 months ago cs.CV cs.AI PDF

Attack HIGH

CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution

Minbeom Kim, Mihir Parmar, Phillip Wallis +5 more

AI agents equipped with tool-calling capabilities are susceptible to Indirect Prompt Injection (IPI) attacks. In this attack scenario, malicious...

3 months ago cs.CR cs.LG stat.ME PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial