AI Security Research
2,529+ academic papers on AI security, attacks, and defenses
Defense MEDIUM
Krishak Aneja, Manas Mittal, Anmol Goel +2 more
Fine-tuning Large Language Models (LLMs) on benign narrow data can sometimes induce broad harmful behaviors, a vulnerability termed emergent...
Yesterday cs.CL cs.AI
Defense MEDIUM
Leo Linqian Gan, Jeffery Wu, Longyuan Ge +6 more
Autonomous LLM agents face a critical security risk known as workflow hijacking, where attackers subtly alter tool and skill invocations. Existing...
Defense MEDIUM
Guoxin Lu, Letian Sha, Qing Wang +4 more
The safety alignment of Large Language Models (LLMs) remains vulnerable to Harmful Fine-tuning (HFT). While existing defenses impose constraints on...
5 days ago cs.CR cs.AI cs.CL
Defense MEDIUM
Siyuan Li, Aodu Wulianghai, Xi Lin +6 more
The increasing prevalence of Large Language Models (LLMs) in content creation has made distinguishing human-written textual content from...
Defense MEDIUM
Xinjie Shen, Rongzhe Wei, Peizhi Niu +6 more
Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful...
5 days ago cs.CL cs.AI cs.CR
Defense MEDIUM
Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera +2 more
The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank...