Defense HIGH
Nikita Kezins, Urbas Ekka, Pascal Berrang +1 more
Guardrail classifiers defend production language models against harmful behavior, but while results seem promising in testing, they provide no...
Benchmark HIGH
Chiyu Zhang, Huiqin Yang, Bendong Jiang +8 more
The rapid proliferation of LLM-based autonomous agents in real operating system environments introduces a new category of safety risk beyond content...
Yesterday cs.CR cs.CL
Attack HIGH
Mengqi He, Xinyu Tian, Xin Shen +6 more
Recent studies show that gradient-based universal image jailbreaks on vision-language models (VLMs) exhibit little or no cross-model transferability,...
Yesterday cs.CV cs.AI
Tool HIGH
Tim Van hamme, Thomas Vissers, Javier Carnerero-Cano +4 more
LLMs are increasingly deployed as autonomous agents with access to tools, databases, and external services, yet practitioners (across different...
Yesterday cs.AI cs.CR
Defense HIGH
Zheng Lin, Zhenxing Niu, Haoxuan Ji +2 more
This paper proposes a jailbreak-prompt detection method for large language models (LLMs) to defend against jailbreak attacks. Although recent LLMs...
Yesterday cs.CR cs.AI
Attack HIGH
Desen Sun, Jason Hon, Howe Wang +3 more
With the rapid advancement of generative AI, users increasingly rely on image-generation models for image design and creation. To achieve faithful...
Defense HIGH
Zheng Lin, Zhenxing Niu, Haoxuan Ji +1 more
This paper proposes a guaranteed defense method for large language models (LLMs) to safeguard against jailbreaking attacks. Drawing inspiration from...
Yesterday cs.CR cs.AI
Attack HIGH
Peiru Yang, Haoran Zheng, Tong Ju +6 more
Retrieval-augmented generation (RAG) is a widely adopted paradigm for enhancing LLMs in medical applications by incorporating expert multimodal...
Yesterday cs.CR cs.AI
Attack HIGH
Farzad Nourmohammadzadeh Motlagh, Mehrdad Hajizadeh, Mehryar Majd +3 more
Natural language interfaces to structured databases are becoming increasingly common, largely due to advances in large language models (LLMs) that...
Yesterday cs.CR cs.AI
Attack HIGH
Yue Li, Xiao Li, Hao Wu +5 more
Large Language Models (LLMs) are increasingly used for automated software development, making their ability to preserve secure coding practices...
Yesterday cs.CR cs.SE
Attack HIGH
Huilin Zhou, Jian Zhao, Yilu Zhong +7 more
Red teaming is critical for uncovering vulnerabilities in Large Language Models (LLMs). While automated methods have improved scalability, existing...
Yesterday cs.LG cs.AI
Attack HIGH
Monika Jotautaitė, Maria Angelica Martinez, Ollie Matthews +1 more
We introduce a red-teaming methodology that exposes harder-to-catch attacks for coding-agent monitors, suggesting that current practices may...
2 days ago cs.CR cs.AI
Attack HIGH
Yiyong Liu, Chia-Yi Hsu, Chun-Ying Huang +3 more
LLM-powered coding agents increasingly make software supply chain decisions. They generate imports, recommend packages, and write installation...
Defense HIGH
Wenxin Tang, Xiang Zhang, Junliang Liu +11 more
Automated vulnerability detection is a fundamental task in software security, yet existing learning-based methods still struggle to capture the...
Benchmark HIGH
Shai Feldman, Yaniv Romano
Evaluating and predicting the performance of large language models (LLMs) in multi-turn conversational settings is critical yet computationally...
Benchmark HIGH
Mohammad Mamun, Mohamed Gaber, Scott Buffett +1 more
Language Model Agents (LMAs) are emerging as a powerful primitive for augmenting red-team operations. They can support attack planning, adversary...
Attack HIGH
Zeyuan Chen, Yihan Ma, Xinyue Shen +2 more
Large language models (LLMs) show strong performance across many applications, but their ability to memorize and potentially reveal training data...
Attack HIGH
Huiyu Xu, Zhibo Wang, Wenhui Zhang +4 more
Modern LLM agents solve complex tasks by operating in iterative execution loops, where they repeatedly reason, act, and self-evaluate progress to...
5 days ago cs.CR cs.AI
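For context on the entry above: a minimal sketch of the iterative execution loop the abstract describes (reason, act, self-evaluate). All names and signatures below are illustrative assumptions, not the paper's method or API.

```python
# Illustrative reason-act-evaluate agent loop. The `llm` and `tools`
# objects and their methods are hypothetical placeholders.

def run_agent(task: str, llm, tools: dict, max_steps: int = 10) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        # Reason: the model proposes the next tool call given the history.
        thought, tool_name, tool_args = llm.plan("\n".join(history))
        history.append(f"Thought: {thought}")

        # Act: execute the chosen tool. Tool output is untrusted input;
        # attacks on loops like this typically inject instructions here.
        observation = tools[tool_name](**tool_args)
        history.append(f"Observation: {observation}")

        # Self-evaluate: the model judges whether the task is complete.
        done, answer = llm.evaluate("\n".join(history))
        if done:
            return answer
    return "Step budget exhausted without a final answer."
```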
Attack HIGH
Md Farhamdur Reza, Richeng Jin, Tianfu Wu +1 more
Intent-obfuscation-based jailbreak attacks on multimodal large language models (MLLMs) transform a harmful query into a concealed multimodal input to...
Attack HIGH
Wesley Hanwen Deng, Mingxi Yan, Sunnie S. Y. Kim +5 more
Recent developments in AI safety research have called for red-teaming methods that effectively surface potential risks posed by generative AI models,...
5 days ago cs.HC cs.AI cs.CY