AI Security Research
2,529+ academic papers on AI security, attacks, and defenses
Defense HIGH
Nikita Kezins, Urbas Ekka, Pascal Berrang +1 more
Guardrail classifiers defend production language models against harmful behavior; although results seem promising in testing, they provide no...
Defense LOW
Giordano De Marzo, Alessandro Bellina, Claudio Castellano +2 more
Artificial intelligence safety research focuses on aligning individual language models with human values, yet deployed AI systems increasingly...
Yesterday physics.soc-ph cs.CL cs.MA
Defense MEDIUM
Krishak Aneja, Manas Mittal, Anmol Goel +2 more
Fine-tuning Large Language Models (LLMs) on benign narrow data can sometimes induce broad harmful behaviors, a vulnerability termed emergent...
Yesterday cs.CL cs.AI
Defense LOW
Tianyuan Zhang, Peng Yue, Zihao Peng +8 more
Multimodal large language models (MLLMs) are increasingly integrated into autonomous driving (AD) systems; however, they remain vulnerable to diverse...
Defense HIGH
Wenxin Tang, Xiang Zhang, Junliang Liu +11 more
Automated vulnerability detection is a fundamental task in software security, yet existing learning-based methods still struggle to capture the...
Defense LOW
Aleksandr Bowkis, Marie Davidsen Buhl, Jacob Pfau +1 more
A leading proposal for aligning artificial superintelligence (ASI) is to use AI agents to automate an increasing fraction of alignment research as...
Defense MEDIUM
Leo Linqian Gan, Jeffery Wu, Longyuan Ge +6 more
Autonomous LLM agents face a critical security risk known as workflow hijacking, where attackers subtly alter tool and skill invocations. Existing...
Defense MEDIUM
Guoxin Lu, Letian Sha, Qing Wang +4 more
The safety alignment of Large Language Models (LLMs) remains vulnerable to Harmful Fine-tuning (HFT). While existing defenses impose constraints on...
5 days ago cs.CR cs.AI cs.CL
Defense MEDIUM
Siyuan Li, Aodu Wulianghai, Xi Lin +6 more
The increasing prevalence of Large Language Models (LLMs) in content creation has made distinguishing human-written textual content from...
Defense MEDIUM
Xinjie Shen, Rongzhe Wei, Peizhi Niu +6 more
Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful...
5 days ago cs.CL cs.AI cs.CR
Defense LOW
Fabrice Harel-Canada, Amit Sahai
LLM watermarks must be detectable without compromising text quality, yet most existing schemes bias the next-token distribution and pay for detection...
6 days ago cs.CL cs.AI
Defense MEDIUM
Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera +2 more
The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank...
Defense LOW
Hanum Ko, Sangheum Yeon, Jong Hwan Ko +1 more
As DRAM scales in density and adopts 3D integration, raw fault rates increase and multi-bit errors are no longer rare. Such errors can severely...