AI Security Research

2,560+ academic papers on AI security, attacks, and defenses

Total: 2,560
Attack: 982
Benchmark: 736
Defense: 350
Tool: 275
Survey: 144

Showing 341–360 of 2,560 papers

Benchmark MEDIUM

ADAG: Automatically Describing Attribution Graphs

Aryaman Arora, Zhengxuan Wu, Jacob Steinhardt +1 more

In language model interpretability research, circuit tracing aims to identify which internal features causally contributed to a particular...

1 month ago cs.CL PDF
