Attack HIGH
Muxi Diao, Yutao Mou, Keqing He +6 more
The safety of Large Language Models (LLMs) is crucial for the development of trustworthy AI applications. Existing red teaming methods often rely on...
Attack HIGH
Stanisław Pawlak, Jan Dubiński, Daniel Marczak +1 more
Model merging (MM) recently emerged as an effective method for combining large deep learning models. However, it poses significant security risks....
7 months ago cs.LG cs.AI cs.CR
PDF
Benchmark HIGH
Haoran Ou, Kangjie Chen, Xingshuo Han +4 more
Large Language Models (LLMs) have been augmented with web search to overcome the limitations of the static knowledge boundary by accessing up-to-date...
7 months ago cs.CR cs.AI
PDF
Attack HIGH
Kazuki Egashira, Robin Staab, Thibaud Gloaguen +2 more
Model pruning, i.e., removing a subset of model weights, has become a prominent approach to reducing the memory footprint of large language models...
7 months ago cs.LG cs.AI cs.CR
PDF
Attack HIGH
Weisen Jiang, Sinno Jialin Pan
This paper introduces MetaDefense, a novel framework for defending against finetuning-based jailbreak attacks in large language models (LLMs). We...
7 months ago cs.LG cs.AI cs.CL
PDF
Attack HIGH
Renhua Ding, Xiao Yang, Zhengwei Fang +3 more
Large vision-language models (LVLMs) enable autonomous mobile agents to operate smartphone user interfaces, yet vulnerabilities in their perception...
7 months ago cs.CR cs.AI
PDF
Attack HIGH
Christos Ziakas, Nicholas Loo, Nishita Jain +1 more
Automated red-teaming has emerged as a scalable approach for auditing Large Language Models (LLMs) prior to deployment, yet existing approaches lack...
Attack HIGH
Artur Horal, Daniel Pina, Henrique Paz +7 more
This paper presents the vision, scientific contributions, and technical details of RedTWIZ: an adaptive and diverse multi-turn red teaming framework,...
7 months ago cs.CR cs.CL
PDF
Attack HIGH
Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis
Jailbreaking large language models (LLMs) has emerged as a pressing concern with the increasing prevalence and accessibility of conversational LLMs....
Attack HIGH
Giorgio Giannone, Guangxuan Xu, Nikhil Shivakumar Nayak +4 more
Inference-Time Scaling (ITS) improves language models by allocating more computation at generation time. Particle Filtering (PF) has emerged as a...
7 months ago cs.LG cs.AI cs.CL
PDF
Attack HIGH
Nouar Aldahoul, Yasir Zaki
The rapid spread of misinformation on digital platforms threatens public discourse, emotional stability, and decision-making. While prior work has...
7 months ago cs.CL cs.AI cs.CR
PDF
Attack HIGH
Raffaele Mura, Giorgio Piras, Kamilė Lukošiūtė +3 more
Jailbreaks are adversarial attacks designed to bypass the built-in safety mechanisms of large language models. Automated jailbreaks typically...
7 months ago cs.CL cs.AI cs.LG
PDF
Attack HIGH
Meng Tong, Yuntao Du, Kejiang Chen +2 more
Membership inference attacks (MIAs) are widely used to assess the privacy risks associated with machine learning models. However, when these attacks...
7 months ago cs.CR cs.AI
PDF
Other HIGH
Xin-Cheng Wen, Zirui Lin, Yijun Yang +2 more
The exponential increase in software vulnerabilities has created an urgent need for automatic vulnerability repair (AVR) solutions. Recent research...
7 months ago cs.AI cs.SE
PDF
Attack HIGH
Xiaogeng Liu, Chaowei Xiao
Recent advancements in jailbreaking large language models (LLMs), such as AutoDAN-Turbo, have demonstrated the power of automated strategy discovery....
7 months ago cs.CR cs.AI
PDF
Benchmark HIGH
Rishika Bhagwatkar, Kevin Kasa, Abhay Puri +5 more
AI agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cause...
Benchmark HIGH
Rishika Bhagwatkar, Kevin Kasa, Abhay Puri +5 more
AI agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cause...
Attack HIGH
Weiliang Zhao, Jinjun Peng, Daniel Ben-Levi +2 more
The proliferation of powerful large language models (LLMs) has necessitated robust safety alignment, yet these models remain vulnerable to evolving...
7 months ago cs.CR cs.CL
PDF
Attack HIGH
Kuofeng Gao, Yiming Li, Chao Du +4 more
Jailbreaking attacks on the vision modality typically rely on imperceptible adversarial perturbations, whereas attacks on the textual modality are...
7 months ago cs.CL cs.AI cs.CR
PDF
Attack HIGH
Yuxin Wen, Arman Zharmagambetov, Ivan Evtimov +4 more
Prompt injection poses a serious threat to the reliability and safety of LLM agents. Recent defenses against prompt injection, such as Instruction...
7 months ago cs.CR cs.LG
PDF
Track AI security vulnerabilities in real time
Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act),
and CISO risk assessments for your AI/ML stack.
Start 14-Day Free Trial