AI Security Research

2,583+ academic papers on AI security, attacks, and defenses

Total

2,583

Attack

994

Benchmark

740

Defense

355

Tool

275

Survey

146

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 301–320 of 890 papers

Clear filters

Attack HIGH

ER-MIA: Black-Box Adversarial Memory Injection Attacks on Long-Term Memory-Augmented Large Language Models

Mitchell Piehl, Zhaohan Xi, Zuobin Xiong +2 more

Large language models (LLMs) are increasingly augmented with long-term memory systems to overcome finite context windows and enable persistent...

2 months ago cs.LG PDF

Attack HIGH

Boundary Point Jailbreaking of Black-Box LLMs

Xander Davies, Giorgi Giglemiani, Edmund Lau +3 more

Frontier LLMs are safeguarded against attempts to extract harmful information via adversarial prompts known as "jailbreaks". Recently, defenders have...

2 months ago cs.LG PDF

Attack HIGH

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

Lukas Struppek, Adam Gleave, Kellin Pelrine

As the capabilities of large language models continue to advance, so does their potential for misuse. While closed-source models typically rely on...

2 months ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Multi-Turn Adaptive Prompting Attack on Large Vision-Language Models

In Chong Choi, Jiacheng Zhang, Feng Liu +1 more

Multi-turn jailbreak attacks are effective against text-only large language models (LLMs) by gradually introducing malicious content across turns....

2 months ago cs.CV PDF

Attack HIGH

SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement

Xiaojun Jia, Jie Liao, Simeng Qin +5 more

Agent skills are becoming a core abstraction in coding agents, packaging long-form instructions and auxiliary scripts to extend tool-augmented...

2 months ago cs.CR cs.AI PDF

Attack HIGH

AlignSentinel: Alignment-Aware Detection of Prompt Injection Attacks

Yuqi Jia, Ruiqi Wang, Xilong Wang +2 more

Prompt injection attacks insert malicious instructions into an LLM's input to steer it toward an attacker-chosen task instead of the intended one....

3 months ago cs.CR PDF

Attack HIGH

Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges

Ruomeng Ding, Yifei Pang, He Sun +3 more

Evaluation and alignment pipelines for large language models increasingly rely on LLM-based judges, whose behavior is guided by natural-language...

3 months ago cs.CR cs.AI cs.CL PDF

Benchmark HIGH

Execution-State-Aware LLM Reasoning for Automated Proof-of-Vulnerability Generation

Haoyu Li, Xijia Che, Yanhao Wang +2 more

Proof-of-Vulnerability (PoV) generation is a critical task in software security, serving as a cornerstone for vulnerability validation, false...

3 months ago cs.SE cs.CR PDF

Attack HIGH

AISA: Awakening Intrinsic Safety Awareness in Large Language Models against Jailbreak Attacks

Weiming Song, Xuan Xie, Ruiping Yin

Large language models (LLMs) remain vulnerable to jailbreak prompts that elicit harmful or policy-violating outputs, while many existing defenses...

3 months ago cs.CR cs.AI PDF

Attack HIGH

Backdoor Attacks on Contrastive Continual Learning for IoT Systems

Alfous Tim, Kuniyilh Simi D

The Internet of Things (IoT) systems increasingly depend on continual learning to adapt to non-stationary environments. These environments can...

3 months ago cs.LG cs.CR cs.NI PDF

Attack HIGH

Neighborhood Blending: A Lightweight Inference-Time Defense Against Membership Inference Attacks

Osama Zafar, Shaojie Zhan, Tianxi Ji +1 more

In recent years, the widespread adoption of Machine Learning as a Service (MLaaS), particularly in sensitive environments, has raised considerable...

3 months ago cs.CR PDF

Benchmark HIGH

Favia: Forensic Agent for Vulnerability-fix Identification and Analysis

André Storhaug, Jiamou Sun, Jingyue Li

Identifying vulnerability-fixing commits corresponding to disclosed CVEs is essential for secure software maintenance but remains challenging at...

3 months ago cs.SE cs.AI cs.CR PDF

Attack HIGH

Sparse Autoencoders are Capable LLM Jailbreak Mitigators

Yannick Assogba, Jacopo Cortellazzi, Javier Abad +3 more

Jailbreak attacks remain a persistent threat to large language model safety. We propose Context-Conditioned Delta Steering (CC-Delta), an SAE-based...

3 months ago cs.CR cs.CL cs.LG PDF

Other HIGH

Abstractive Red-Teaming of Language Model Character

Nate Rahn, Allison Qi, Avery Griffin +3 more

We want language model assistants to conform to a character specification, which asserts how the model should act across diverse user interactions....

3 months ago cs.LG PDF

Tool HIGH

MalTool: Malicious Tool Attacks on LLM Agents

Yuepeng Hu, Yuqi Jia, Mengyuan Li +2 more

In a malicious tool attack, an attacker uploads a malicious tool to a distribution platform; once a user installs the tool and the LLM agent selects...

3 months ago cs.CR PDF

Attack HIGH

Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs

Dong Yan, Jian Liang, Ran He +1 more

Recent studies have shown that large language models (LLMs) can infer private user attributes (e.g., age, location, gender) from user-generated text...

3 months ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Jailbreaking Leaves a Trace: Understanding and Detecting Jailbreak Attacks from Internal Representations of Large Language Models

Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis

Jailbreaking large language models (LLMs) has emerged as a critical security challenge with the widespread deployment of conversational AI systems....

3 months ago cs.CR cs.CL PDF

Attack HIGH

Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection

J Alex Corll

Multi-turn prompt injection attacks distribute malicious intent across multiple conversation turns, exploiting the assumption that each turn is...

3 months ago cs.CR cs.AI PDF

Defense HIGH

VulReaD: Knowledge-Graph-guided Software Vulnerability Reasoning and Detection

Samal Mukhtar, Yinghua Yao, Zhu Sun +3 more

Software vulnerability detection (SVD) is a critical challenge in modern systems. Large language models (LLMs) offer natural-language explanations...

3 months ago cs.SE cs.AI cs.CR PDF

Attack HIGH

Transferable Backdoor Attacks for Code Models via Sharpness-Aware Adversarial Perturbation

Shuyu Chang, Haiping Huang, Yanjun Zhang +3 more

Code models are increasingly adopted in software development but remain vulnerable to backdoor attacks via poisoned training data. Existing backdoor...

3 months ago cs.CR cs.SE PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial