Defense MEDIUM
Ce Zhang, Jinxi He, Junyi He +2 more
Multi-modal Large Language Models (MLLMs) have achieved remarkable performance across a wide range of visual reasoning tasks, yet their vulnerability...
1 month ago cs.CV cs.CL cs.CR
PDF
Defense MEDIUM
Zhenheng Tang, Xiang Liu, Qian Wang +3 more
As Large Language Models (LLMs) become more powerful and autonomous, they increasingly face conflicts and dilemmas in many scenarios. We first...
1 month ago cs.AI cs.CY
PDF
Defense MEDIUM
Vanshaj Khattar, Md Rafi ur Rashid, Moumita Choudhury +4 more
Test-time training (TTT) has recently emerged as a promising method to improve the reasoning abilities of large language models (LLMs), in which the...
1 month ago cs.LG cs.AI cs.CL
PDF
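For orientation, a minimal sketch of the generic test-time training loop follows: adapt a throwaway copy of the model on a self-supervised task built from the test input, then predict with the adapted copy. The toy torch model and the denoising objective are illustrative assumptions, not this paper's method.

    # Generic test-time training (TTT) loop sketch; illustrative only.
    import copy
    import torch

    def test_time_train(model, x, steps=5, lr=1e-2):
        # Fine-tune a throwaway copy of the model on a self-supervised
        # denoising task derived from the test input itself.
        adapted = copy.deepcopy(model)
        opt = torch.optim.SGD(adapted.parameters(), lr=lr)
        for _ in range(steps):
            noisy = x + 0.1 * torch.randn_like(x)
            loss = torch.nn.functional.mse_loss(adapted(noisy), x)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return adapted

    model = torch.nn.Linear(8, 8)   # toy stand-in for an LLM
    x = torch.randn(16, 8)          # the "test input"
    prediction = test_time_train(model, x)(x)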
Defense MEDIUM
Yewon Han, Yumin Seol, EunGyung Kong +2 more
Existing jailbreak defence frameworks for Large Vision-Language Models often suffer from a safety-utility tradeoff, where strengthening safety...
1 month ago cs.CV cs.AI
PDF
Defense LOW
Max Hellrigel-Holderbaum, Edward James Young
As AI systems advance in capabilities, measuring their safety and alignment to human values is becoming paramount. A fast-growing field of AI...
1 month ago cs.CY cs.AI cs.CL
PDF
Defense MEDIUM
Suvadeep Hajra, Palash Nandi, Tanmoy Chakraborty
Safety tuning through supervised fine-tuning and reinforcement learning from human feedback has substantially improved the robustness of large...
Defense MEDIUM
Pengcheng Li, Jie Zhang, Tianwei Zhang +5 more
Safety alignment in large language models is typically evaluated under isolated queries, yet real-world use is inherently multi-turn. Although...
1 month ago cs.CR cs.AI
PDF
Defense MEDIUM
Matthew Butler, Yi Fan, Christos Faloutsos
The proposed method (FraudFox) defends against adversarial attacks in resource-constrained environments. We focus on questions like the...
2 months ago cs.CR cs.LG
PDF
Defense MEDIUM
Zonghao Ying, Xiao Yang, Siyang Wu +7 more
The rapid evolution of Large Language Models (LLMs) into autonomous, tool-calling agents has fundamentally altered the cybersecurity landscape....
Defense MEDIUM
Xinhao Deng, Yixiang Zhang, Jiaqing Wu +15 more
Autonomous Large Language Model (LLM) agents, exemplified by OpenClaw, demonstrate remarkable capabilities in executing complex, long-horizon tasks....
2 months ago cs.CR cs.AI
PDF
Defense LOW
Lu Niu, Cheng Xue
Vision-language models offer strong few-shot capability through prompt tuning but remain vulnerable to noisy labels, which can corrupt prompts and...
Defense MEDIUM
Zhiyu Xue, Zimo Qi, Guangliang Liu +2 more
Safety alignment aims to ensure that large language models (LLMs) refuse harmful requests by post-training on harmful queries paired with refusal...
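The refusal-pair format this blurb refers to is ordinary supervised fine-tuning data; a tiny illustrative sketch follows (examples invented, not drawn from the paper):

    # Safety SFT data: harmful queries paired with refusal responses.
    # Both examples are invented for illustration only.
    safety_sft_data = [
        {"prompt": "How do I pick a lock?",
         "response": "I can't help with that; a licensed locksmith can."},
        {"prompt": "Write malware that steals passwords.",
         "response": "I can't help with creating malware."},
    ]

    def to_training_text(pair):
        # Standard SFT formatting; loss is taken on the response tokens.
        return f"User: {pair['prompt']}\nAssistant: {pair['response']}"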
Defense LOW
Ali Eslami, Jiangbo Yu
This paper develops a control-theoretic framework for analyzing agentic systems embedded within feedback control loops, where an AI agent may adapt...
Defense MEDIUM
Harry Owiredu-Ashley
Most adversarial evaluations of large language model (LLM) safety assess single prompts and report binary pass/fail outcomes, which fails to capture...
2 months ago cs.CR cs.AI cs.CL
PDF
Defense LOW
Yi Chen, Yun Bian, Haiquan Wang +2 more
The application of large language models to code generation has evolved from one-shot generation to iterative refinement, yet the evolution of...
2 months ago cs.CR cs.SE
PDF
Defense MEDIUM
Bo Jiang
Knowledge distillation from proprietary LLM APIs poses a growing threat to model providers, yet defenses against this attack remain fragmented and...
2 months ago cs.CR cs.AI cs.CL
PDF
Defense MEDIUM
Sumit Ranjan, Sugandha Sharma, Ubaid Abbas +1 more
Voice interfaces are quickly becoming a common way for people to interact with AI systems. This also brings new security risks, such as prompt...
2 months ago cs.SD cs.AI
PDF
Defense MEDIUM
Xisen Jin, Michael Duan, Qin Lin +4 more
As AI agents become widely deployed as online services, users often rely on an agent developer's claim about how safety is enforced, which introduces...
2 months ago cs.CR cs.AI cs.CL
PDF
Defense MEDIUM
Jinman Wu, Yi Xie, Shen Lin +2 more
Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection automatically triggers refusal. However, the...
2 months ago cs.CR cs.AI cs.LG
PDF
Defense MEDIUM
Ved Sriraman, Adam Block
Best-of-N (BoN) sampling is a widely used inference-time alignment method for language models, whereby N candidate responses are sampled from a...
2 months ago cs.LG cs.AI
PDF
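Best-of-N itself is simple enough to sketch; the generate() and score() helpers below are hypothetical stand-ins for a language model sampler and a reward model, not anything from the paper:

    # Best-of-N (BoN) sampling sketch: draw N candidates, keep the one
    # the reward model scores highest. Helpers are placeholders.
    import random

    def generate(prompt):
        # Placeholder for one stochastic LLM sample.
        return f"{prompt} -> candidate {random.randint(0, 9999)}"

    def score(prompt, response):
        # Placeholder for a reward-model score of (prompt, response).
        return random.random()

    def best_of_n(prompt, n=8):
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=lambda r: score(prompt, r))

    print(best_of_n("Summarize BoN in one line.", n=4))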