Defense MEDIUM
Xisen Jin, Michael Duan, Qin Lin +4 more
As AI agents become widely deployed as online services, users often rely on an agent developer's claim about how safety is enforced, which introduces...
2 months ago cs.CR cs.AI cs.CL
PDF
Defense MEDIUM
Jinman Wu, Yi Xie, Shen Lin +2 more
Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection automatically triggers refusal. However, the...
2 months ago cs.CR cs.AI cs.LG
PDF
Defense MEDIUM
Ved Sriraman, Adam Block
Best-of-N (BoN) sampling is a widely used inference-time alignment method for language models, whereby N candidate responses are sampled from a...
2 months ago cs.LG cs.AI
PDF
Defense MEDIUM
Trapoom Ukarapol, Nut Chukamphaeng, Kunat Pipatanakul +1 more
The safety evaluation of large language models (LLMs) remains largely centered on English, leaving non-English languages and culturally grounded...
Defense MEDIUM
Zeyu Zhang, Xiangxiang Dai, Ziyi Han +2 more
Large language models (LLMs) are typically governed by post-training alignment (e.g., RLHF or DPO), which yields a largely static policy during...
2 months ago cs.LG cs.AI
PDF
Defense MEDIUM
Manisha Mukherjee, Vincent J. Hellendoorn
Large Language Models (LLMs) are increasingly deployed for code generation in high-stakes software development, yet their limited transparency in...
2 months ago cs.SE cs.AI cs.CR
PDF
Defense MEDIUM
Ming Wen, Kun Yang, Xin Chen +4 more
Multimodal Large Language Models (MLLMs) pose critical safety challenges, as they are susceptible not only to adversarial attacks such as...
2 months ago cs.LG cs.AI
PDF
Defense MEDIUM
Chang Xue, Fang Liu, Jiaye Wang +2 more
Decentralized financial platforms rely heavily on Web of Trust reputation systems to mitigate counterparty risk in the absence of centralized...
2 months ago cs.CR cs.AI cs.LG
PDF
Defense MEDIUM
Lan Zhang, Chengsi Liang, Zeming Zhuang +4 more
Semantic communication (SemCom) redefines wireless communication from reproducing symbols to transmitting task-relevant semantics. However, this...
2 months ago cs.CR eess.SY
PDF
Defense MEDIUM
Xuan Chen, Hao Liu, Tao Yuan +3 more
Traditional phishing website detection relies on static heuristics or reference lists, which lag behind rapidly evolving attacks. While recent...
Defense MEDIUM
Mengxuan Hu, Vivek V. Datla, Anoop Kumar +4 more
Recent advances in alignment techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and Direct...
2 months ago cs.CL cs.AI
PDF
Defense MEDIUM
Morteza Eskandarian, Mahdi Rabbani, Arun Kaniyamattam +6 more
The current generation of large language models produces sophisticated social-engineering content that bypasses standard text screening systems in...
Defense MEDIUM
Chun Yan Ryan Kan, Tommy Tran, Vedant Yadav +4 more
Defending LLMs against adversarial jailbreak attacks remains an open challenge. Existing defenses rely on binary classifiers that fail when...
2 months ago cs.CR cs.AI cs.CL
PDF
Defense MEDIUM
Zachary Coalson, Beth Sohler, Aiden Gabriel +1 more
We identify a structural weakness in current large language model (LLM) alignment: modern refusal mechanisms are fail-open. While existing approaches...
2 months ago cs.LG cs.CR
PDF
Defense MEDIUM
Sasha Behrouzi, Lichao Wu, Mohamadreza Rostami +1 more
Safety alignment is essential for the responsible deployment of large language models (LLMs). Yet, existing approaches often rely on heavyweight...
2 months ago cs.CR cs.LG
PDF
Defense MEDIUM
Ahmed Ryan, Ibrahim Khalil, Abdullah Al Jahid +4 more
The prevalence of malicious packages in open-source repositories, such as PyPI, poses a critical threat to the software supply chain. While Large...
2 months ago cs.CR cs.SE
PDF
Defense MEDIUM
David Puertolas Merenciano, Ekaterina Vasyagina, Raghav Dixit +4 more
LoRA adapters let users fine-tune large language models (LLMs) efficiently. However, LoRA adapters are shared through open repositories like Hugging...
2 months ago cs.CR cs.AI cs.CL
PDF
Defense MEDIUM
Tianyu Chen, Dongrui Liu, Xia Hu +2 more
Clawdbot is a self-hosted, tool-using personal AI agent with a broad action space spanning local execution and web-mediated workflows, which raises...
2 months ago cs.CR cs.AI
PDF
Defense MEDIUM
George Alexandru Adam, Alexander Cui, Edwin Thomas +7 more
While historical considerations surrounding text authenticity revolved primarily around plagiarism, the advent of large language models (LLMs) has...
Defense MEDIUM
Zhaoxin Wang, Jiaming Liang, Fengbin Zhu +5 more
Large language models (LLMs) and multimodal LLMs are typically safety-aligned before release to prevent harmful content generation. However, recent...
Track AI security vulnerabilities in real time
Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act),
and CISO risk assessments for your AI/ML stack.
Start 14-Day Free Trial