Defense MEDIUM
Vanshaj Khattar, Md Rafi ur Rashid, Moumita Choudhury +4 more
Test-time training (TTT) has recently emerged as a promising method to improve the reasoning abilities of large language models (LLMs), in which the...
1 week ago cs.LG cs.AI cs.CL
PDF
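As background for the entry above: a minimal sketch of the generic test-time training loop, in which the model takes a few gradient steps on a self-supervised next-token loss over the test prompt before answering. This illustrates the general TTT idea only, not this paper's method; the HuggingFace-style `model`/`tokenizer` interface, step count, and learning rate are all assumptions.

```python
import copy
import torch

def ttt_generate(model, tokenizer, prompt, ttt_steps=3, lr=1e-5):
    # Adapt a throwaway copy so every query starts from the base weights.
    adapted = copy.deepcopy(model)
    opt = torch.optim.AdamW(adapted.parameters(), lr=lr)
    ids = tokenizer(prompt, return_tensors="pt").input_ids

    adapted.train()
    for _ in range(ttt_steps):
        # Self-supervised next-token loss on the test prompt itself;
        # HF causal-LM models shift the labels internally.
        loss = adapted(ids, labels=ids).loss
        loss.backward()
        opt.step()
        opt.zero_grad()

    adapted.eval()
    with torch.no_grad():
        out = adapted.generate(ids, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```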
Survey MEDIUM
Kai Wang, Biaojie Zeng, Zeming Wei +7 more
With the rapid development of LLM-based multi-agent systems (MAS), their significant safety and security concerns have emerged, which introduce novel...
1 week ago cs.CR cs.AI cs.CL
PDF
Benchmark MEDIUM
Yu Pan, Wenlong Yu, Tiejun Wu +4 more
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. However, they remain highly susceptible to...
1 week ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Ye Wang, Jing Liu, Toshiaki Koike-Akino
The safety and reliability of vision-language models (VLMs) are a crucial part of deploying trustworthy agentic AI systems. However, VLMs remain...
1 week ago cs.LG cs.AI cs.CL
PDF
Benchmark MEDIUM
Yuhuan Liu, Haitian Zhong, Xinyuan Xia +3 more
Large Language Models (LLMs) often suffer from catastrophic forgetting and collapse during sequential knowledge editing. This vulnerability stems...
Benchmark MEDIUM
Jinhu Qi, Yifan Li, Minghao Zhao +4 more
As agentic AI systems move beyond static question answering into open-ended, tool-augmented, and multi-step real-world workflows, their increased...
1 week ago cs.CL cs.DB
PDF
Tool MEDIUM
Zhuoshang Wang, Yubing Ren, Yanan Cao +3 more
While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring...
1 week ago cs.CR cs.CL
PDF
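To make the coupling above concrete: a minimal green-list watermark sketch in the style of Kirchenbauer et al., where a secret key seeds a pseudorandom vocabulary partition at each decoding step; detection must recompute those partitions, so it needs the same key used at injection. This is a generic illustration, not this paper's scheme; the vocabulary size, green fraction, and z-threshold are arbitrary assumptions.

```python
import hashlib
import math
import random

VOCAB = 50_000   # assumed vocabulary size
GAMMA = 0.5      # fraction of the vocabulary marked "green" per step

def green_set(key: bytes, prev_token: int) -> set[int]:
    # Pseudorandom vocabulary partition keyed by (secret key, previous token).
    digest = hashlib.sha256(key + prev_token.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB), int(GAMMA * VOCAB)))

def detect(key: bytes, tokens: list[int], z_threshold: float = 4.0) -> bool:
    # At generation time the same green sets would have had their logits
    # boosted; detection counts how many emitted tokens land in their step's
    # green list and runs a z-test against the GAMMA null hypothesis.
    n = len(tokens) - 1
    if n <= 0:
        return False
    hits = sum(tok in green_set(key, prev)
               for prev, tok in zip(tokens, tokens[1:]))
    z = (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
    return z > z_threshold
```

Without the key, an auditor cannot rebuild the green lists, which is exactly the injection/detection coupling the abstract refers to.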
Defense MEDIUM
Yewon Han, Yumin Seol, EunGyung Kong +2 more
Existing jailbreak defence frameworks for Large Vision-Language Models often suffer from a safety-utility tradeoff, where strengthening safety...
1 week ago cs.CV cs.AI
PDF
Attack MEDIUM
Ruyi Zhang, Heng Gao, Songlei Jian +2 more
Backdoor attacks compromise model reliability by using triggers to manipulate outputs. Trigger inversion can accurately locate these triggers via a...
1 week ago cs.CR cs.AI
PDF
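For context on the entry above: a minimal trigger-inversion sketch in the classic Neural Cleanse style for image classifiers, optimizing a mask and pattern that flip clean inputs to a suspected target class; an unusually small, high-success patch is evidence of a planted trigger. Illustrative only; the frozen `model`, input shapes, and hyperparameters are assumptions, not this paper's approach.

```python
import torch
import torch.nn.functional as F

def invert_trigger(model, clean_x, target_class, steps=500, lam=1e-2):
    # clean_x: (N, C, H, W) batch of clean images; model stays frozen.
    mask = torch.zeros(1, 1, *clean_x.shape[2:], requires_grad=True)
    pattern = torch.zeros(1, *clean_x.shape[1:], requires_grad=True)
    opt = torch.optim.Adam([mask, pattern], lr=0.1)
    target = torch.full((clean_x.shape[0],), target_class, dtype=torch.long)

    for _ in range(steps):
        m = torch.sigmoid(mask)                      # keep the mask in [0, 1]
        stamped = (1 - m) * clean_x + m * torch.tanh(pattern)
        # Force the target label while penalizing mask area (L1), so the
        # optimizer prefers the smallest patch that still flips the batch.
        loss = F.cross_entropy(model(stamped), target) + lam * m.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask).detach(), torch.tanh(pattern).detach()
```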
Defense MEDIUM
Suvadeep Hajra, Palash Nandi, Tanmoy Chakraborty
Safety tuning through supervised fine-tuning and reinforcement learning from human feedback has substantially improved the robustness of large...
Defense MEDIUM
Pengcheng Li, Jie Zhang, Tianwei Zhang +5 more
Safety alignment in large language models is typically evaluated under isolated queries, yet real-world use is inherently multi-turn. Although...
1 week ago cs.CR cs.AI
PDF
Tool MEDIUM
Ziling Zhou
AI agents dynamically acquire capabilities at runtime via MCP and A2A, yet no framework detects when capabilities change post-authorization. We term...
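Since the abstract above names the gap rather than the fix, here is one hypothetical shape such a framework could take: pin a digest of the tool manifest at authorization time and re-verify it before every invocation, so silently added parameters or scopes are refused. All names and fields below are illustrative assumptions, not this paper's design and not part of MCP or A2A.

```python
import hashlib
import json

def manifest_digest(manifest: dict) -> str:
    # Canonical JSON so semantically identical manifests hash identically.
    blob = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()

class CapabilityPin:
    """Records what the user approved; refuses drifted capabilities."""

    def __init__(self):
        self._pins: dict[str, str] = {}

    def authorize(self, tool_name: str, manifest: dict) -> None:
        self._pins[tool_name] = manifest_digest(manifest)

    def check(self, tool_name: str, current_manifest: dict) -> None:
        pinned = self._pins.get(tool_name)
        if pinned is None:
            raise PermissionError(f"{tool_name} was never authorized")
        if manifest_digest(current_manifest) != pinned:
            raise PermissionError(
                f"{tool_name} capabilities changed since authorization; "
                "re-approval required")
```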
Tool MEDIUM
Ziling Zhou
AI agents dynamically acquire tools, orchestrate sub-agents, and transact across organizational boundaries, yet no existing security layer verifies...
Benchmark MEDIUM
Ivan Lopez, Selin S. Everett, Bryan J. Bunning +10 more
Large language models (LLMs) are entering clinician workflows, yet evaluations rarely measure how clinician reasoning shapes model behavior during...
1 week ago cs.HC cs.LG
PDF
Benchmark MEDIUM
Arjun Chakraborty, Sandra Ho, Adam Cook +1 more
CTI-REALM (Cyber Threat Real World Evaluation and LLM Benchmarking) is a benchmark designed to evaluate AI agents' ability to interpret cyber threat...
Attack MEDIUM
Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy
Large language models for code have achieved strong performance across diverse software analytics tasks, yet their real-world adoption remains...
Attack MEDIUM
Jianwei Li, Jung-Eun Kim
Backdoor attacks pose severe security threats to large language models (LLMs), where a model behaves normally under benign inputs but produces...
1 week ago cs.CR cs.AI cs.LG
PDF
Defense MEDIUM
Matthew Butler, Yi Fan, Christos Faloutsos
The proposed method (FraudFox) defends against adversarial attacks in a resource-constrained environment. We focus on questions like the...
1 week ago cs.CR cs.LG
PDF
Benchmark MEDIUM
Zhifang Zhang, Bojun Yang, Shuo He +5 more
Despite the strong multimodal performance, large vision-language models (LVLMs) are vulnerable during fine-tuning to backdoor attacks, where...
1 week ago cs.CV cs.CR
PDF
Defense MEDIUM
Zonghao Ying, Xiao Yang, Siyang Wu +7 more
The rapid evolution of Large Language Models (LLMs) into autonomous, tool-calling agents has fundamentally altered the cybersecurity landscape....