AI Security Research

2,583+ academic papers on AI security, attacks, and defenses

Total

2,583

Attack

994

Benchmark

740

Defense

355

Tool

275

Survey

146

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 621–640 of 2,583 papers

Survey LOW

Scalable Classification of Course Information Sheets Using Large Language Models: A Reusable Institutional Method for Academic Quality Assurance

Brecht Verbeken, Joke Van den Broeck, Inge De Cleyn +4 more

Purpose: Higher education institutions face increasing pressure to audit course designs for generative AI (GenAI) integration. This paper presents an...

2 months ago cs.LG PDF

Benchmark MEDIUM

CTI-REALM: Benchmark to Evaluate Agent Performance on Security Detection Rule Generation Capabilities

Arjun Chakraborty, Sandra Ho, Adam Cook +1 more

CTI-REALM (Cyber Threat Real World Evaluation and LLM Benchmarking) is a benchmark designed to evaluate AI agents' ability to interpret cyber threat...

2 months ago cs.CR PDF

Benchmark LOW

Visual-ERM: Reward Modeling for Visual Equivalence

Ziyu Liu, Shengyuan Ding, Xinyu Fang +7 more

Vision-to-code tasks require models to reconstruct structured visual inputs, such as charts, tables, and SVGs, into executable or structured...

2 months ago cs.CV cs.AI PDF

Attack MEDIUM

MoEKD: Mixture-of-Experts Knowledge Distillation for Robust and High-Performing Compressed Code Models

Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy

Large language models for code have achieved strong performance across diverse software analytics tasks, yet their real-world adoption remains...

2 months ago cs.SE PDF

Attack MEDIUM

Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference

Jianwei Li, Jung-Eun Kim

Backdoor attacks pose severe security threats to large language models (LLMs), where a model behaves normally under benign inputs but produces...

2 months ago cs.CR cs.AI cs.LG PDF

Attack HIGH

PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses

Chenlong Yin, Runpeng Geng, Yanting Wang +1 more

Prompt injection poses serious security risks to real-world LLM applications, particularly autonomous agents. Although many defenses have been...

2 months ago cs.LG cs.CR PDF

Defense MEDIUM

FraudFox: Adaptable Fraud Detection in the Real World

Matthew Butler, Yi Fan, Christos Faloutsos

The proposed method (FraudFox) provides solutions to adversarial attacks in a resource constrained environment. We focus on questions like the...

2 months ago cs.CR cs.LG PDF

Benchmark MEDIUM

Test-Time Attention Purification for Backdoored Large Vision Language Models

Zhifang Zhang, Bojun Yang, Shuo He +5 more

Despite the strong multimodal performance, large vision-language models (LVLMs) are vulnerable during fine-tuning to backdoor attacks, where...

2 months ago cs.CV cs.CR PDF

Attack HIGH

SLICE: Semantic Latent Injection via Compartmentalized Embedding for Image Watermarking

Zheng Gao, Yifan Yang, Xiaoyu Li +4 more

Watermarking the initial noise of diffusion models has emerged as a promising approach for image provenance, but content-independent noise patterns...

2 months ago cs.CV cs.CR cs.LG PDF

Attack HIGH

Colluding LoRA: A Composite Attack on LLM Safety Alignment

Sihao Ding

We introduce Colluding LoRA (CoLoRA), an attack in which each adapter appears benign and plausibly functional in isolation, yet their linear...

2 months ago cs.CR cs.LG PDF

Defense MEDIUM

Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw

Zonghao Ying, Xiao Yang, Siyang Wu +7 more

The rapid evolution of Large Language Models (LLMs) into autonomous, tool-calling agents has fundamentally altered the cybersecurity landscape....

2 months ago cs.CR PDF

Tool MEDIUM

ChainFuzzer: Greybox Fuzzing for Workflow-Level Multi-Tool Vulnerabilities in LLM Agents

Jiangrong Wu, Zitong Yao, Yuhong Nan +1 more

Tool-augmented LLM agents increasingly rely on multi-step, multi-tool workflows to complete real tasks. This design expands the attack surface,...

2 months ago cs.SE cs.CR PDF

Attack MEDIUM

Neural Gate: Mitigating Privacy Risks in LVLMs via Neuron-Level Gradient Gating

Xiangkui Cao, Jie Zhang, Meina Kan +2 more

Large Vision-Language Models (LVLMs) have shown remarkable potential across a wide array of vision-language tasks, leading to their adoption in...

2 months ago cs.CV PDF

Attack HIGH

Agent Privilege Separation in OpenClaw: A Structural Defense Against Prompt Injection

Darren Cheng, Wen-Kwang Tsao

Prompt injection remains one of the most practical attack vectors against LLM-integrated applications. We replicate the Microsoft LLMail-Inject...

2 months ago cs.CR cs.AI PDF

Benchmark HIGH

Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies

Siddharth Srikanth, Freddie Liang, Sophie Hsu +9 more

Vision-Language-Action (VLA) models have significant potential to enable general-purpose robotic systems for a range of vision-language tasks....

2 months ago cs.RO cs.AI cs.CL PDF

Attack HIGH

Accelerating Suffix Jailbreak attacks with Prefix-Shared KV-cache

Xinhai Wang, Shaopeng Fu, Shu Yang +3 more

Suffix jailbreak attacks serve as a systematic method for red-teaming Large Language Models (LLMs) but suffer from prohibitive computational costs,...

2 months ago cs.CR cs.AI PDF

Attack HIGH

SpectralGuard: Detecting Memory Collapse Attacks in State Space Models

Davi Bonetto

State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a...

2 months ago cs.LG cs.CR PDF

Benchmark MEDIUM

Security Considerations for Artificial Intelligence Agents

Ninghui Li, Kaiyuan Zhang, Kyle Polley +1 more

This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and...

2 months ago cs.LG cs.AI cs.CR PDF

Attack HIGH

CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks

Alexandre Le Mercier, Thomas Demeester, Chris Develder

State space models (SSMs) like Mamba have gained significant traction as efficient alternatives to Transformers, achieving linear complexity while...

2 months ago cs.CL PDF

Attack MEDIUM

EmbTracker: Traceable Black-box Watermarking for Federated Language Models

Haodong Zhao, Jinming Hu, Yijie Bai +6 more

Federated Language Model (FedLM) allows a collaborative learning without sharing raw data, yet it introduces a critical vulnerability, as every...

2 months ago cs.CR PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial