Defense MEDIUM
Siyuan Li, Aodu Wulianghai, Xi Lin +6 more
The increasing prevalence of Large Language Models (LLMs) in content creation has made distinguishing human-written textual content from...
Benchmark MEDIUM
Christopher G. Pedraza Pohlenz, Hassan Jalil Hadi, Ali Hassan +1 more
LLMs are increasingly explored for malware analysis; however, current LLM-based malware attribution remains limited by unsupported indicators and...
5 days ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Xiaomin Li, Andrzej Banburski-Fahey, Jaron Lanier
Auditing language-model outputs often requires more than judging correctness: an auditor may need to identify which source document most likely...
Benchmark MEDIUM
Dasol Choi, Eugenia Kim, Jaewon Noh +14 more
Current LLM safety benchmarks are predominantly English-centric and often rely on translation, failing to capture country-specific harms. Moreover,...
5 days ago cs.CL cs.AI
PDF
Attack MEDIUM
Samuel Korn
Retrieval-Augmented Generation (RAG) systems are vulnerable to knowledge base poisoning, yet existing attacks have been evaluated almost exclusively...
5 days ago cs.CR cs.CL cs.LG
PDF
Defense MEDIUM
Xinjie Shen, Rongzhe Wei, Peizhi Niu +6 more
Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful...
5 days ago cs.CL cs.AI cs.CR
PDF
Attack MEDIUM
Yiwei Zhang, Jeremiah Birrell, Reza Ebrahimi +3 more
Large language models (LLMs) remain vulnerable to adversarial prompting despite advances in alignment and safety, often exhibiting harmful behaviors...
6 days ago cs.LG cs.AI cs.CR
PDF
Attack MEDIUM
Marco Rando, Samuel Vaiter
Large language models (LLMs) are known to be vulnerable to jailbreak attacks, which typically rely on carefully designed prompts containing explicit...
Defense MEDIUM
Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera +2 more
The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank...
Benchmark MEDIUM
Chenglin Yang
Modern AI agents execute real-world side effects through tool calls such as file operations, shell commands, HTTP requests, and database queries. A...
6 days ago cs.AI cs.CR
PDF
Attack MEDIUM
Jan Dolejš, Martin Jureček, Róbert Lórencz
Modern malware detection pipelines rely on continuous data ingestion and machine learning to counter the high volume of novel threats. This work...
6 days ago cs.CR cs.LG
PDF
Attack MEDIUM
Jie Zhang, Pura Peetathawatchai, Florian Tramèr +1 more
Vision-language models (VLMs) are increasingly deployed as trusted authorities -- fact-checking images on social media, comparing products, and...
1 week ago cs.CR cs.LG
PDF
Attack MEDIUM
Sarthak Choudhary, Atharv Singh Patlan, Nils Palumbo +3 more
We present Sparse Backdoor, a supply-chain attack that plants a provably undetectable backdoor in pre-trained image classifiers, including...
1 week ago cs.CR cs.AI cs.LG
PDF
Attack MEDIUM
Gabriel Hortea, Juan Tapiador
Malware authors have traditionally relied on polymorphic techniques to produce variants in the same malware family, complicating signature-based...
Attack MEDIUM
Ishrith Gowda
Persistent external memory enables LLM agents to maintain context across sessions, yet its security properties remain formally uncharacterized. We...
1 week ago cs.CR cs.AI cs.LG
PDF
Benchmark MEDIUM
Rishi Raj Sahoo, Jyotirmaya Shivottam, Subhankar Mishra
Regulatory frameworks such as GDPR increasingly require that ML predictions be accompanied by post-hoc explanations, even when raw data and trained...
1 week ago cs.LG cs.CR
PDF
Benchmark MEDIUM
Bikrant Bikram Pratap Maurya, Nitin Choudhury, Daksh Agarwal +1 more
Acoustic side-channel attacks (ASCA) on keyboards pose a significant security risk, as keystrokes can be inferred from typing acoustics, revealing...
1 week ago cs.CR cs.SD
PDF
Benchmark MEDIUM
Zuoyu Zhang, Yancheng Zhu
Tool-using agent systems powered by large language models (LLMs) are increasingly deployed across web, app, operating-system, and transactional...
Benchmark MEDIUM
Yuhui Wang, Tanqiu Jiang, Jiacheng Liang +2 more
As large language model (LLM)-powered agents are increasingly deployed to perform complex, real-world tasks, they face a growing class of attacks...
1 week ago cs.CR cs.AI cs.CL
PDF