Defense MEDIUM
Krishak Aneja, Manas Mittal, Anmol Goel +2 more
Fine-tuning Large Language Models (LLMs) on benign narrow data can sometimes induce broad harmful behaviors, a vulnerability termed emergent...
Yesterday cs.CL cs.AI
PDF
Defense MEDIUM
Leo Linqian Gan, Jeffery Wu, Longyuan Ge +6 more
Autonomous LLM agents face a critical security risk known as workflow hijacking, where attackers subtly alter tool and skill invocations. Existing...
Defense MEDIUM
Guoxin Lu, Letian Sha, Qing Wang +4 more
The safety alignment of Large Language Models (LLMs) remains vulnerable to Harmful Fine-tuning (HFT). While existing defenses impose constraints on...
5 days ago cs.CR cs.AI cs.CL
PDF
Defense MEDIUM
Siyuan Li, Aodu Wulianghai, Xi Lin +6 more
The increasing prevalence of Large Language Models (LLMs) in content creation has made distinguishing human-written textual content from...
Defense MEDIUM
Xinjie Shen, Rongzhe Wei, Peizhi Niu +6 more
Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful...
5 days ago cs.CL cs.AI cs.CR
PDF
Defense MEDIUM
Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera +2 more
The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank...
Defense MEDIUM
Prakhar Gupta, Garv Shah, Donghua Zhang
Safety fine-tuning of language models typically requires a curated adversarial dataset. We take a different approach: score each candidate prompt's...
1 weeks ago cs.LG cs.AI cs.CR
PDF
Defense MEDIUM
Sadia Asif, Mohammad Mohammadi Amiri
Fine-tuning safety-aligned language models for downstream tasks often leads to substantial degradation of refusal behavior, making models vulnerable...
1 weeks ago cs.LG cs.AI cs.CE
PDF
Defense MEDIUM
Xiaokun Luan, Yihao Zhang, Pengcheng Su +2 more
Large Language Model (LLM) watermarking is crucial for establishing the provenance of machine-generated text, but most existing methods rely on a...
Defense MEDIUM
Ravikumar Balakrishnan, Sanket Mendapara
Typographic prompt injection exploits vision language models' (VLMs) ability to read text rendered in images, posing a growing threat as VLMs power...
Defense MEDIUM
Nay Myat Min, Long H. Pham, Jun Sun
Large language models deployed at runtime can misbehave in ways that clean-data validation cannot anticipate: training-time backdoors lie dormant...
2 weeks ago cs.CR cs.AI cs.CL
PDF
Defense MEDIUM
Kaisheng Fan, Weizhe Zhang, Yishu Gao +2 more
Defending against backdoor attacks in large language models remains a critical practical challenge. Existing defenses mitigate these threats but...
2 weeks ago cs.CR cs.AI
PDF
Defense MEDIUM
Krishiv Agarwal, Ramneet Kaur, Colin Samplawski +6 more
Effective safety auditing of large language models (LLMs) demands tools that go beyond black-box probing and systematically uncover vulnerabilities...
2 weeks ago cs.CR cs.LG
PDF
Defense MEDIUM
Chao Pan, Yu Wu, Xin Yao
Internal Safety Collapse (ISC) is a failure mode in which frontier LLMs, when executing legitimate professional tasks whose correct completion...
2 weeks ago cs.CR cs.AI cs.LG
PDF
Defense MEDIUM
Divyesh Gabbireddy, Suman Saha
Cross-site scripting (XSS) remains a persistent web security vulnerability, especially because obfuscation can change the surface form of a malicious...
3 weeks ago cs.CR cs.LG cs.SE
PDF
Defense MEDIUM
Sarang Nambiar, Dhruv Pradhan, Ezekiel Soremekun
Pre-trained machine learning models (PTMs) are commonly provided via Model Hubs (e.g., Hugging Face) in standard formats like Pickles to facilitate...
3 weeks ago cs.CR cs.SE
PDF
Defense MEDIUM
Kun Wang, Cheng Qian, Miao Yu +6 more
Multimodal Large Language Models (MLLMs) have achieved remarkable success in cross-modal understanding and generation, yet their deployment is...
3 weeks ago cs.CR cs.AI
PDF
Defense MEDIUM
Hugo Lyons Keenan, Christopher Leckie, Sarah Erfani
We can often verify the correctness of neural network outputs using ground truth labels, but we cannot reliably determine whether the output was...
3 weeks ago cs.LG cs.CR
PDF
Defense MEDIUM
Ziyang Liu
Hosted-LLM providers have a silent-substitution incentive: advertise a stronger model while serving cheaper replies. Probe-after-return schemes such...
3 weeks ago cs.CR cs.AI
PDF
Defense MEDIUM
Dongcheng Zhang, Yiqing Jiang
Existing AI agent safety benchmarks focus on generic criminal harm (cybercrime, harassment, weapon synthesis), leaving a systematic blind spot for a...
3 weeks ago cs.CR cs.AI cs.CL
PDF
Track AI security vulnerabilities in real time
Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act),
and CISO risk assessments for your AI/ML stack.
Start 14-Day Free Trial