AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total

2,529

Attack

969

Benchmark

729

Defense

345

Tool

272

Survey

142

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 41–60 of 66 papers

Clear filters

Defense MEDIUM

ClawGuard: Out-of-Band Detection of LLM Agent Workflow Hijacking via EM Side Channel

Leo Linqian Gan, Jeffery Wu, Longyuan Ge +6 more

Autonomous LLM agents face a critical security risk known as workflow hijacking, where attackers subtly alter tool and skill invocations. Existing...

5 days ago cs.CR PDF

Defense MEDIUM

Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks

Guoxin Lu, Letian Sha, Qing Wang +4 more

The safety alignment of Large Language Models (LLMs) remains vulnerable to Harmful Fine-tuning (HFT). While existing defenses impose constraints on...

5 days ago cs.CR cs.AI cs.CL PDF

Benchmark MEDIUM

PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts

Qinfeng Li, Yuntai Bao, Jianghui Hu +5 more

LLM agents rely on prompts to implement task-specific capabilities based on foundation LLMs, making agent prompts valuable intellectual property....

5 days ago cs.CR cs.AI PDF

Survey MEDIUM

Heimdallr: Characterizing and Detecting LLM-Induced Security Risks in GitHub CI Workflows

Bonan Ruan, Yeqi Fu, Chuqi Zhang +3 more

GitHub Continuous Integration (CI) workflows increasingly integrate Large Language Models (LLMs) to automate review, triage, content generation, and...

5 days ago cs.CR cs.SE PDF

Defense MEDIUM

Lightweight Stylistic Consistency Profiling: Robust Detection of LLM-Generated Textual Content for Multimedia Moderation

Siyuan Li, Aodu Wulianghai, Xi Lin +6 more

The increasing prevalence of Large Language Models (LLMs) in content creation has made distinguishing human-written textual content from...

5 days ago cs.CL PDF

Attack HIGH

LoopTrap: Termination Poisoning Attacks on LLM Agents

Huiyu Xu, Zhibo Wang, Wenhui Zhang +4 more

Modern LLM agents solve complex tasks by operating in iterative execution loops, where they repeatedly reason, act, and self-evaluate progress to...

5 days ago cs.CR cs.AI PDF

Benchmark MEDIUM

LCC-LLM: Leveraging Code-Centric Large Language Models for Malware Attribution

Christopher G. Pedraza Pohlenz, Hassan Jalil Hadi, Ali Hassan +1 more

LLMs are increasingly explored for malware analysis; however, current LLM-based malware attribution remains limited by unsupported indicators and...

5 days ago cs.CR cs.AI PDF

Attack HIGH

Conceal, Reconstruct, Jailbreak: Exploiting the Reconstruction-Concealment Tradeoff in MLLMs

Md Farhamdur Reza, Richeng Jin, Tianfu Wu +1 more

Intent-obfuscation-based jailbreak attacks on multimodal large language models (MLLMs) transform a harmful query into a concealed multimodal input to...

5 days ago cs.AI PDF

Benchmark MEDIUM

DataDignity: Training Data Attribution for Large Language Models

Xiaomin Li, Andrzej Banburski-Fahey, Jaron Lanier

Auditing language-model outputs often requires more than judging correctness: an auditor may need to identify which source document most likely...

5 days ago cs.AI PDF

Attack HIGH

PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI

Wesley Hanwen Deng, Mingxi Yan, Sunnie S. Y. Kim +5 more

Recent developments in AI safety research have called for red-teaming methods that effectively surface potential risks posed by generative AI models,...

5 days ago cs.HC cs.AI cs.CY PDF

Benchmark MEDIUM

XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity

Dasol Choi, Eugenia Kim, Jaewon Noh +14 more

Current LLM safety benchmarks are predominantly English-centric and often rely on translation, failing to capture country-specific harms. Moreover,...

5 days ago cs.CL cs.AI PDF

Attack MEDIUM

Architecture Matters: Comparing RAG Systems under Knowledge Base Poisoning

Samuel Korn

Retrieval-Augmented Generation (RAG) systems are vulnerable to knowledge base poisoning, yet existing attacks have been evaluated almost exclusively...

5 days ago cs.CR cs.CL cs.LG PDF

Defense MEDIUM

One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue

Xinjie Shen, Rongzhe Wei, Peizhi Niu +6 more

Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful...

5 days ago cs.CL cs.AI cs.CR PDF

Benchmark LOW

The Cost of Context: Mitigating Textual Bias in Multimodal Retrieval-Augmented Generation

Hoin Jung, Xiaoqian Wang

While Multimodal Large Language Models (MLLMs) are increasingly integrated with Retrieval-Augmented Generation (RAG) to mitigate hallucinations, the...

5 days ago cs.CL cs.CV cs.LG PDF

Defense LOW

SLAM: Structural Linguistic Activation Marking for Language Models

Fabrice Harel-Canada, Amit Sahai

LLM watermarks must be detectable without compromising text quality, yet most existing schemes bias the next-token distribution and pay for detection...

6 days ago cs.CL cs.AI PDF

Attack MEDIUM

Information Theoretic Adversarial Training of Large Language Models

Yiwei Zhang, Jeremiah Birrell, Reza Ebrahimi +3 more

Large language models (LLMs) remain vulnerable to adversarial prompting despite advances in alignment and safety, often exhibiting harmful behaviors...

6 days ago cs.LG cs.AI cs.CR PDF

Attack MEDIUM

On the Hardness of Junking LLMs

Marco Rando, Samuel Vaiter

Large language models (LLMs) are known to be vulnerable to jailbreak attacks, which typically rely on carefully designed prompts containing explicit...

6 days ago cs.LG PDF

Survey HIGH

SoK: Robustness in Large Language Models against Jailbreak Attacks

Feiyue Xu, Hongsheng Hu, Chaoxiang He +9 more

Large Language Models (LLMs) have achieved remarkable success but remain highly susceptible to jailbreak attacks, in which adversarial prompts coerce...

6 days ago cs.CR cs.AI PDF

Defense MEDIUM

You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation

Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera +2 more

The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank...

6 days ago cs.CR PDF

Tool HIGH

DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

Zhaorun Chen, Xun Liu, Haibo Tong +14 more

AI agents are increasingly deployed across diverse domains to automate complex workflows through long-horizon and high-stakes action executions. Due...

6 days ago cs.AI PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial