AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total: 2,529
Attack: 969
Benchmark: 729
Defense: 345
Tool: 272
Survey: 142

Showing 41–47 of 47 papers

Defense MEDIUM

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

Xiaohua Wang, Muzhao Tian, Yuqi Zeng +20 more

Reinforcement Learning from Human Feedback (RLHF) and related alignment paradigms have become central to steering large language models (LLMs) and...

3 weeks ago cs.LG PDF

Defense MEDIUM

Can Agents Secure Hardware? Evaluating Agentic LLM-Driven Obfuscation for IP Protection

Sujan Ghimire, Parsa Mirfasihi, Muhtasim Alam Chowdhury +6 more

The globalization of integrated circuit (IC) design and manufacturing has increased the exposure of hardware intellectual property (IP) to untrusted...

4 weeks ago cs.CR PDF

Defense MEDIUM

Towards Platonic Representation for Table Reasoning: A Foundation for Permutation-Invariant Retrieval

Willy Carlos Tchuitcheu, Tan Lu, Ann Dooms

Historical approaches to Table Representation Learning (TRL) have largely adopted the sequential paradigms of Natural Language Processing (NLP). We...

4 weeks ago cs.AI PDF

Defense LOW

A longitudinal health agent framework

Georgianna Lin, Rencong Jiang +2 more

Although artificial intelligence (AI) agents are increasingly proposed to support potentially longitudinal health tasks, such as symptom management,...

4 weeks ago cs.AI cs.HC PDF

Defense MEDIUM

Detecting Safety Violations Across Many Agent Traces

Adam Stein, Davis Brown, Hamed Hassani +2 more

To identify safety violations, auditors often search over large sets of agent traces. This search is difficult because failures are often rare,...

4 weeks ago cs.AI cs.CL PDF

Defense MEDIUM

LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety

Junxiao Yang, Haoran Liu, Jinzhe Tu +9 more

Large language models (LLMs) often demonstrate strong safety performance in high-resource languages, yet exhibit severe vulnerabilities when queried...

4 weeks ago cs.LG cs.AI cs.CL PDF

Defense LOW

SemaClaw: A Step Towards General-Purpose Personal AI Agents through Harness Engineering

Ningyan Zhu, Huacan Wang, Jie Zhou +8 more

The rise of OpenClaw in early 2026 marks the moment when millions of users began deploying personal AI agents into their daily lives, delegating...

4 weeks ago cs.AI PDF
