Tool MEDIUM
Zhuoshang Wang, Yubing Ren, Yanan Cao +3 more
While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring...
1 month ago cs.CR cs.CL
PDF
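The watermarking entry above hinges on detection and injection sharing a secret key. As a hedged illustration of that coupling (a generic Kirchenbauer-style green-list detector, not this paper's scheme), here is a minimal sketch; the hash construction, 50% green fraction, and z-score interpretation are assumptions made for the example.

```python
import hashlib
import math

def green_list_member(prev_token_id: int, token_id: int, secret_key: bytes,
                      green_fraction: float = 0.5) -> bool:
    """A token is 'green' if a keyed hash of (prev token, token) falls in the
    green fraction of the hash range. Injection and detection must derive the
    same partition, which is why both sides need the same secret key."""
    digest = hashlib.sha256(secret_key + prev_token_id.to_bytes(4, "big")
                            + token_id.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < green_fraction

def detect_watermark(token_ids: list[int], secret_key: bytes,
                     green_fraction: float = 0.5) -> float:
    """Return a z-score: how far the observed green-token count deviates
    from what unwatermarked text would produce by chance."""
    n = len(token_ids) - 1
    if n <= 0:
        return 0.0
    greens = sum(green_list_member(a, b, secret_key)
                 for a, b in zip(token_ids, token_ids[1:]))
    expected = green_fraction * n
    std = math.sqrt(n * green_fraction * (1 - green_fraction))
    return (greens - expected) / std

# Usage: z-scores above ~4 suggest the text was generated under this key.
```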
Defense MEDIUM
Yewon Han, Yumin Seol, EunGyung Kong +2 more
Existing jailbreak defence frameworks for Large Vision-Language Models often suffer from a safety-utility tradeoff, where strengthening safety...
1 month ago cs.CV cs.AI
PDF
Attack MEDIUM
Ruyi Zhang, Heng Gao, Songlei Jian +2 more
Backdoor attacks compromise model reliability by using triggers to manipulate outputs. Trigger inversion can accurately locate these triggers via a...
1 month ago cs.CR cs.AI
PDF
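Trigger inversion, mentioned in the entry above, is commonly done by optimizing a small input patch that flips predictions toward a suspected target label. The sketch below is a generic Neural Cleanse-style illustration, not this paper's method; the model interface, [-1, 1] input range, and regularization weight are assumptions.

```python
import torch

def invert_trigger(model, images, target_label, steps=200, lam=1e-3, lr=0.1):
    """Generic trigger inversion: optimize a mask and pattern so that stamping
    them onto clean inputs (assumed normalized to [-1, 1]) flips predictions
    to the target label, while an L1 penalty keeps the mask small."""
    mask = torch.zeros(1, 1, *images.shape[-2:], requires_grad=True)
    pattern = torch.zeros(1, *images.shape[1:], requires_grad=True)
    opt = torch.optim.Adam([mask, pattern], lr=lr)
    target = torch.full((images.shape[0],), target_label, dtype=torch.long)
    for _ in range(steps):
        m = torch.sigmoid(mask)            # keep mask values in (0, 1)
        stamped = (1 - m) * images + m * torch.tanh(pattern)
        loss = torch.nn.functional.cross_entropy(model(stamped), target)
        loss = loss + lam * m.abs().sum()  # prefer small, localized triggers
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.sigmoid(mask).detach(), torch.tanh(pattern).detach()

# An anomalously small recovered mask for one label hints at a planted trigger.
```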
Defense MEDIUM
Suvadeep Hajra, Palash Nandi, Tanmoy Chakraborty
Safety tuning through supervised fine-tuning and reinforcement learning from human feedback has substantially improved the robustness of large...
Defense MEDIUM
Pengcheng Li, Jie Zhang, Tianwei Zhang +5 more
Safety alignment in large language models is typically evaluated under isolated queries, yet real-world use is inherently multi-turn. Although...
1 month ago cs.CR cs.AI
PDF
Tool MEDIUM
Ziling Zhou
AI agents dynamically acquire capabilities at runtime via MCP and A2A, yet no framework detects when capabilities change post-authorization. We term...
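The entry above flags the absence of post-authorization capability-change detection for MCP/A2A agents. As a minimal sketch of the underlying check (assuming a JSON tool manifest and a hypothetical CapabilityPinner helper, not the paper's framework): fingerprint a tool's declared capabilities at authorization time, then re-verify before each invocation.

```python
import hashlib
import json

def manifest_fingerprint(manifest: dict) -> str:
    """Canonicalize and hash the capability-defining fields of a tool
    manifest (names, schemas, permissions) so any change is detectable."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

class CapabilityPinner:
    """Pin each tool's fingerprint when first authorized; any later change
    in declared capabilities is surfaced before the tool is invoked."""
    def __init__(self):
        self.pins: dict[str, str] = {}

    def authorize(self, tool_name: str, manifest: dict) -> None:
        self.pins[tool_name] = manifest_fingerprint(manifest)

    def verify(self, tool_name: str, manifest: dict) -> bool:
        return self.pins.get(tool_name) == manifest_fingerprint(manifest)

# Usage: pinner.authorize("search", manifest); later, refuse the call when
# pinner.verify("search", refetched_manifest) is False (capability drift).
```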
Tool MEDIUM
Ziling Zhou
AI agents dynamically acquire tools, orchestrate sub-agents, and transact across organizational boundaries, yet no existing security layer verifies...
Benchmark MEDIUM
Ivan Lopez, Selin S. Everett, Bryan J. Bunning +10 more
Large language models (LLMs) are entering clinician workflows, yet evaluations rarely measure how clinician reasoning shapes model behavior during...
1 month ago cs.HC cs.LG
PDF
Benchmark MEDIUM
Arjun Chakraborty, Sandra Ho, Adam Cook +1 more
CTI-REALM (Cyber Threat Real World Evaluation and LLM Benchmarking) is a benchmark designed to evaluate AI agents' ability to interpret cyber threat...
Attack MEDIUM
Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy
Large language models for code have achieved strong performance across diverse software analytics tasks, yet their real-world adoption remains...
Attack MEDIUM
Jianwei Li, Jung-Eun Kim
Backdoor attacks pose severe security threats to large language models (LLMs), where a model behaves normally under benign inputs but produces...
2 months ago cs.CR cs.AI cs.LG
PDF
Defense MEDIUM
Matthew Butler, Yi Fan, Christos Faloutsos
The proposed method, FraudFox, provides defenses against adversarial attacks in resource-constrained environments. We focus on questions like the...
2 months ago cs.CR cs.LG
PDF
Benchmark MEDIUM
Zhifang Zhang, Bojun Yang, Shuo He +5 more
Despite their strong multimodal performance, large vision-language models (LVLMs) are vulnerable during fine-tuning to backdoor attacks, where...
2 months ago cs.CV cs.CR
PDF
Defense MEDIUM
Zonghao Ying, Xiao Yang, Siyang Wu +7 more
The rapid evolution of Large Language Models (LLMs) into autonomous, tool-calling agents has fundamentally altered the cybersecurity landscape....
Tool MEDIUM
Jiangrong Wu, Zitong Yao, Yuhong Nan +1 more
Tool-augmented LLM agents increasingly rely on multi-step, multi-tool workflows to complete real tasks. This design expands the attack surface,...
2 months ago cs.SE cs.CR
PDF
Attack MEDIUM
Xiangkui Cao, Jie Zhang, Meina Kan +2 more
Large Vision-Language Models (LVLMs) have shown remarkable potential across a wide array of vision-language tasks, leading to their adoption in...
Benchmark MEDIUM
Ninghui Li, Kaiyuan Zhang, Kyle Polley +1 more
This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and...
2 months ago cs.LG cs.AI cs.CR
PDF
Attack MEDIUM
Haodong Zhao, Jinming Hu, Yijie Bai +6 more
Federated Language Model (FedLM) training enables collaborative learning without sharing raw data, yet it introduces a critical vulnerability, as every...
Benchmark MEDIUM
Junjie Chu, Yiting Qu, Ye Leng +4 more
Large Language Models (LLMs) are increasingly trained to align with human values, primarily at the task level, i.e., refusing to execute...
2 months ago cs.CR cs.AI
PDF
Tool MEDIUM
Frank Li
Tool-augmented LLM agents introduce security risks that extend beyond user-input filtering, including indirect prompt injection through fetched...
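Indirect prompt injection through fetched content, as the entry above notes, is often mitigated by marking retrieved text as untrusted data ("spotlighting"). The sketch below is a generic, hedged illustration; the tag names and regex patterns are assumptions, not this paper's defense.

```python
import re

# Illustrative patterns only; real deployments use learned or policy checks.
INSTRUCTION_PATTERNS = re.compile(
    r"ignore (all|previous) instructions|you are now|system prompt", re.I)

def quarantine_fetched(content: str, source_url: str) -> str:
    """Wrap fetched content in explicit data delimiters and flag
    instruction-like phrases, so downstream policy can treat the span
    as data rather than as commands addressed to the agent."""
    suspect = ' suspect="true"' if INSTRUCTION_PATTERNS.search(content) else ""
    return (f'<untrusted source="{source_url}"{suspect}>\n'
            f"{content}\n</untrusted>")

# Usage: feed quarantine_fetched(page_text, url) to the agent, never raw text.
```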