Defense MEDIUM
Osama Wehbi, Sarhad Arisdakessian, Omar Abdel Wahab +3 more
Backdoor attacks pose a significant threat to the integrity and reliability of Artificial Intelligence (AI) models, enabling adversaries to...
1 month ago cs.LG cs.CR cs.DC
PDF
Defense MEDIUM
Xunguang Wang, Yuguang Zhou, Qingyue Wang +5 more
Large language models (LLMs) increasingly rely on explicit chain-of-thought (CoT) reasoning to solve complex tasks, yet the safety of the reasoning...
1 month ago cs.AI cs.CR
PDF
Defense MEDIUM
Yuxiao Li, Alina Fastowski, Efstratios Zaradoukas +2 more
Activation steering has emerged as a powerful tool to shape LLM behavior without the need for weight updates. While its inherent brittleness and...
1 month ago cs.CR cs.CL
PDF
Defense MEDIUM
Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee
Multimodal Large Language Models (MLLMs) extend text-only LLMs with visual reasoning, but also introduce new safety failure modes under visually...
1 month ago cs.CR cs.AI cs.MM
PDF
Defense MEDIUM
Xinyue Liu, Niloofar Mireshghallah, Jane C. Ginsburg +1 more
Frontier LLM companies have repeatedly assured courts and regulators that their models do not store copies of training data. They further rely on...
1 month ago cs.CL cs.AI cs.CY
PDF
Defense MEDIUM
Shawn Li, Yue Zhao
Large language model (LLM) agents increasingly rely on external tools (file operations, API calls, database transactions) to autonomously complete...
1 month ago cs.CR cs.AI cs.LG
PDF
Defense MEDIUM
Carlos Hinojosa, Clemens Grange, Bernard Ghanem
Vision-language models (VLMs) are increasingly deployed in real-world and embodied settings where safety decisions depend on visual context. However,...
1 month ago cs.CV cs.AI cs.CL
PDF
Defense MEDIUM
Ce Zhang, Jinxi He, Junyi He +2 more
Multi-modal Large Language Models (MLLMs) have achieved remarkable performance across a wide range of visual reasoning tasks, yet their vulnerability...
1 month ago cs.CV cs.CL cs.CR
PDF
Defense MEDIUM
Zhenheng Tang, Xiang Liu, Qian Wang +3 more
As Large Language Models (LLMs) become more powerful and autonomous, they increasingly face conflicts and dilemmas in many scenarios. We first...
1 month ago cs.AI cs.CY
PDF
Defense MEDIUM
Vanshaj Khattar, Md Rafi ur Rashid, Moumita Choudhury +4 more
Test-time training (TTT) has recently emerged as a promising method to improve the reasoning abilities of large language models (LLMs), in which the...
1 month ago cs.LG cs.AI cs.CL
PDF
Defense MEDIUM
Yewon Han, Yumin Seol, EunGyung Kong +2 more
Existing jailbreak defence frameworks for Large Vision-Language Models often suffer from a safety-utility tradeoff, where strengthening safety...
1 month ago cs.CV cs.AI
PDF
Defense MEDIUM
Suvadeep Hajra, Palash Nandi, Tanmoy Chakraborty
Safety tuning through supervised fine-tuning and reinforcement learning from human feedback has substantially improved the robustness of large...
Defense MEDIUM
Pengcheng Li, Jie Zhang, Tianwei Zhang +5 more
Safety alignment in large language models is typically evaluated under isolated queries, yet real-world use is inherently multi-turn. Although...
1 month ago cs.CR cs.AI
PDF
Defense MEDIUM
Matthew Butler, Yi Fan, Christos Faloutsos
The proposed method (FraudFox) provides a defense against adversarial attacks in a resource-constrained environment. We focus on questions like the...
2 months ago cs.CR cs.LG
PDF
Defense MEDIUM
Zonghao Ying, Xiao Yang, Siyang Wu +7 more
The rapid evolution of Large Language Models (LLMs) into autonomous, tool-calling agents has fundamentally altered the cybersecurity landscape....
Defense MEDIUM
Xinhao Deng, Yixiang Zhang, Jiaqing Wu +15 more
Autonomous Large Language Model (LLM) agents, exemplified by OpenClaw, demonstrate remarkable capabilities in executing complex, long-horizon tasks....
2 months ago cs.CR cs.AI
PDF
Defense MEDIUM
Zhiyu Xue, Zimo Qi, Guangliang Liu +2 more
Safety alignment aims to ensure that large language models (LLMs) refuse harmful requests by post-training on harmful queries paired with refusal...
Defense MEDIUM
Harry Owiredu-Ashley
Most adversarial evaluations of large language model (LLM) safety assess single prompts and report binary pass/fail outcomes, which fails to capture...
2 months ago cs.CR cs.AI cs.CL
PDF
Defense MEDIUM
Bo Jiang
Knowledge distillation from proprietary LLM APIs poses a growing threat to model providers, yet defenses against this attack remain fragmented and...
2 months ago cs.CR cs.AI cs.CL
PDF
Defense MEDIUM
Sumit Ranjan, Sugandha Sharma, Ubaid Abbas +1 more
Voice interfaces are quickly becoming a common way for people to interact with AI systems. This also brings new security risks, such as prompt...
2 months ago cs.SD cs.AI
PDF