Defense HIGH
Nikita Kezins, Urbas Ekka, Pascal Berrang +1 more
Guardrail classifiers defend production language models against harmful behavior, but while results seem promising in testing, they provide no...
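The excerpt does not show the authors' analysis; as background, a guardrail classifier typically wraps the model as an input/output filter. A minimal sketch in Python, where toxicity_score is a hypothetical stand-in for a trained harmfulness classifier:

def toxicity_score(text: str) -> float:
    """Hypothetical placeholder for a trained harmfulness classifier."""
    blocklist = ("build a bomb", "steal credentials")
    return 1.0 if any(k in text.lower() for k in blocklist) else 0.0

def guarded_generate(llm, prompt: str, threshold: float = 0.5) -> str:
    # Input guardrail: screen the prompt before it reaches the model.
    if toxicity_score(prompt) >= threshold:
        return "Request refused by input guardrail."
    response = llm(prompt)
    # Output guardrail: screen the completion before it reaches the user.
    if toxicity_score(response) >= threshold:
        return "Response withheld by output guardrail."
    return response

The wrapper pattern is what makes the guarantee question acute: the classifier sees only text, so any bound on missed harmful content is empirical, not formal.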
Defense LOW
Giordano De Marzo, Alessandro Bellina, Claudio Castellano +2 more
Artificial intelligence safety research focuses on aligning individual language models with human values, yet deployed AI systems increasingly...
Yesterday physics.soc-ph cs.CL cs.MA
Defense MEDIUM
Krishak Aneja, Manas Mittal, Anmol Goel +2 more
Fine-tuning Large Language Models (LLMs) on benign narrow data can sometimes induce broad harmful behaviors, a vulnerability termed emergent...
Yesterday cs.CL cs.AI
Defense LOW
Tianyuan Zhang, Peng Yue, Zihao Peng +8 more
Multimodal large language models (MLLMs) are increasingly integrated into autonomous driving (AD) systems; however, they remain vulnerable to diverse...
Defense HIGH
Wenxin Tang, Xiang Zhang, Junliang Liu +11 more
Automated vulnerability detection is a fundamental task in software security, yet existing learning-based methods still struggle to capture the...
Defense LOW
Aleksandr Bowkis, Marie Davidsen Buhl, Jacob Pfau +1 more
A leading proposal for aligning artificial superintelligence (ASI) is to use AI agents to automate an increasing fraction of alignment research as...
Defense MEDIUM
Leo Linqian Gan, Jeffery Wu, Longyuan Ge +6 more
Autonomous LLM agents face a critical security risk known as workflow hijacking, where attackers subtly alter tool and skill invocations. Existing...
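The paper's own defense is not described in the excerpt; one generic mitigation against altered tool and skill invocations is to pin the approved tool-call plan cryptographically and refuse any deviation at execution time. A minimal sketch, assuming a simple plan format (SECRET_KEY and the step schema are illustrative assumptions):

import hashlib, hmac, json

SECRET_KEY = b"demo-key"  # hypothetical; use a managed secret in practice

def sign_plan(plan: list) -> str:
    # Canonicalize the plan so the signature is stable across serializations.
    blob = json.dumps(plan, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, blob, hashlib.sha256).hexdigest()

def execute_plan(plan: list, signature: str, tools: dict) -> None:
    # Reject execution if any tool name or argument changed after approval.
    if not hmac.compare_digest(sign_plan(plan), signature):
        raise PermissionError("tool-call plan modified: possible workflow hijack")
    for step in plan:
        tools[step["tool"]](**step["args"])

For example, signing plan = [{"tool": "search", "args": {"query": "x"}}] at approval time and re-verifying inside execute_plan means an injected prompt that swaps "search" for a file-write tool fails the HMAC check before anything runs.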
Defense MEDIUM
Guoxin Lu, Letian Sha, Qing Wang +4 more
The safety alignment of Large Language Models (LLMs) remains vulnerable to Harmful Fine-tuning (HFT). While existing defenses impose constraints on...
5 days ago cs.CR cs.AI cs.CL
Defense MEDIUM
Siyuan Li, Aodu Wulianghai, Xi Lin +6 more
The increasing prevalence of Large Language Models (LLMs) in content creation has made distinguishing human-written textual content from...
Defense MEDIUM
Xinjie Shen, Rongzhe Wei, Peizhi Niu +6 more
Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful...
5 days ago cs.CL cs.AI cs.CR
Defense LOW
Fabrice Harel-Canada, Amit Sahai
LLM watermarks must be detectable without compromising text quality, yet most existing schemes bias the next-token distribution and pay for detection...
6 days ago cs.CL cs.AI
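For context, the next-token biasing the abstract refers to is usually implemented in the style of green-list watermarks (Kirchenbauer et al.): a pseudorandom subset of the vocabulary is boosted at each step, and detection replays the split and counts how often tokens land in it. A minimal sketch of that family, not this paper's scheme:

import random

def green_set(prev_token: int, vocab_size: int, frac: float = 0.5) -> set:
    # Seed a PRNG with the previous token so detection can replay the split.
    rng = random.Random(prev_token)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(frac * vocab_size)])

def bias_logits(logits: list, prev_token: int, delta: float = 2.0) -> list:
    # Boost "green" tokens; this is exactly the distributional bias at issue.
    greens = green_set(prev_token, len(logits))
    return [l + delta if i in greens else l for i, l in enumerate(logits)]

def detect(tokens: list, vocab_size: int, frac: float = 0.5) -> float:
    # Fraction of tokens in their green list; ~frac for unwatermarked text,
    # noticeably higher for watermarked text.
    hits = sum(t in green_set(p, vocab_size, frac)
               for p, t in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

The delta shift is what "pays for detection": a larger delta makes detection easier but distorts the model's distribution more.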
Defense MEDIUM
Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera +2 more
The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank...
Defense LOW
Hanum Ko, Sangheum Yeon, Jong Hwan Ko +1 more
As DRAM scales in density and adopts 3D integration, raw fault rates increase and multi-bit errors are no longer rare. Such errors can severely...
Defense LOW
Zhenning Yang, Yuhan Chen, Patrick Tser Jern Kon +5 more
To unleash the full potential of AI for Science, we must untether the agents from a purely digital environment. The agent's ability to control and...
1 week ago eess.SY cs.AI
Defense LOW
Srinath Perera, Kaviru Hapuarachchi, Frank Leymann +1 more
We present Robust Agent Compensation (RAC), a log-based recovery paradigm that provides a safety net, implemented through an architectural extension...
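The abstract does not detail RAC's mechanics; the general log-based compensation pattern it names works roughly as below, recording an undo action alongside each completed step and replaying the undos in reverse on failure (all tool names hypothetical):

class CompensationLog:
    """Generic saga-style compensation log (illustrative, not RAC itself)."""
    def __init__(self):
        self._entries = []  # (description, undo_callable), in execution order

    def record(self, description, undo):
        self._entries.append((description, undo))

    def rollback(self):
        # Compensate completed actions in reverse order.
        while self._entries:
            description, undo = self._entries.pop()
            undo()

def create_file(name): print(f"created {name}")   # hypothetical agent tool
def delete_file(name): print(f"deleted {name}")   # hypothetical compensator

log = CompensationLog()
try:
    create_file("report.txt")
    log.record("remove report.txt", lambda: delete_file("report.txt"))
    raise RuntimeError("downstream step failed")  # simulate a failure
except RuntimeError:
    log.rollback()  # safety net: undo everything already done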
Defense MEDIUM
Prakhar Gupta, Garv Shah, Donghua Zhang
Safety fine-tuning of language models typically requires a curated adversarial dataset. We take a different approach: score each candidate prompt's...
1 week ago cs.LG cs.AI cs.CR
Defense LOW
Sandra Arcos-Holzinger, Sarah M. Erfani, James Bailey +1 more
Self-supervised speech models (S3Ms) achieve strong downstream performance, yet their learned representations remain poorly understood under natural...
1 week ago eess.AS cs.CR cs.LG
Defense MEDIUM
Sadia Asif, Mohammad Mohammadi Amiri
Fine-tuning safety-aligned language models for downstream tasks often leads to substantial degradation of refusal behavior, making models vulnerable...
1 week ago cs.LG cs.AI cs.CE
Defense MEDIUM
Xiaokun Luan, Yihao Zhang, Pengcheng Su +2 more
Large Language Model (LLM) watermarking is crucial for establishing the provenance of machine-generated text, but most existing methods rely on a...
Defense HIGH
Zeming Dong, Yuejun Guo, Qiang Hu +5 more
Source code and its accompanying comments are complementary yet naturally aligned modalities: code encodes structural logic while comments capture...
2 weeks ago cs.SE cs.AI