Untargeted Jailbreak Attack
Xinzhe Huang, Wenjing Hu, Tianhang Zheng +5 more
Existing gradient-based jailbreak attacks on Large Language Models (LLMs) typically optimize adversarial suffixes to align the LLM output with...
AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.
Yu He, Yifei Chen, Yiming Li +5 more
In recent years, RAG has emerged as a key paradigm for enhancing large language models (LLMs). By integrating externally retrieved information, RAG...
Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru +6 more
While finetuning AI agents on interaction data (such as web browsing or tool use) improves their capabilities, it also introduces critical...
Nikoo Naghavian, Mostafa Tavassolipour
Vision-language models like CLIP demonstrate impressive zero-shot generalization but remain highly vulnerable to adversarial attacks. In this work,...
Zhixin Xie, Xurui Song, Jun Luo
Despite substantial efforts in safety alignment, recent research indicates that Large Language Models (LLMs) remain highly susceptible to jailbreak...
Abrar Shahid, Ibteeker Mahir Ishum, AKM Tahmidul Haque +2 more
This paper presents a controlled study of adversarial reinforcement learning in network security through a custom OpenAI Gym environment that models...
Lesly Miculicich, Mihir Parmar, Hamid Palangi +4 more
The deployment of autonomous AI agents in sensitive domains, such as healthcare, introduces critical risks to safety, security, and privacy. These...
Chinthana Wimalasuriya, Spyros Tragoudas
Adversarial attacks present a significant threat to modern machine learning systems. Yet, existing detection methods often lack the ability to detect...
Bowei Ning, Xuejun Zong, Kan He
Industrial control systems (ICS) are vital to modern infrastructure but increasingly vulnerable to cybersecurity threats, particularly through...
Zhaorun Chen, Xun Liu, Mintong Kang +4 more
As vision-language models (VLMs) gain prominence, their multimodal interfaces also introduce new safety vulnerabilities, making the safety evaluation...
Chengquan Guo, Chulin Xie, Yu Yang +6 more
Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters, enabling dynamic...
Chenpei Huang, Lingfeng Yao, Hui Zhong +5 more
Ear canal scanning/sensing (ECS) has emerged as a novel biometric authentication method for mobile devices paired with wireless earbuds. Existing...
Jonathan Sneh, Ruomei Yan, Jialin Yu +6 more
As LLMs increasingly power agents that interact with external tools, tool use has become an essential mechanism for extending their capabilities....
Ruohao Guo, Afshin Oroojlooy, Roshan Sridhar +3 more
Despite recent rapid progress in AI safety, current large language models remain vulnerable to adversarial attacks in multi-turn interaction...
Yuhao Sun, Zhuoer Xu, Shiwen Cui +4 more
Large Language Models (LLMs) have achieved remarkable progress across a wide range of tasks, but remain vulnerable to safety risks such as harmful...
Kedong Xiu, Churui Zeng, Tianhang Zheng +6 more
Existing gradient-based jailbreak attacks typically optimize an adversarial suffix to induce a fixed affirmative response, e.g., "Sure, here...
Paschal C. Amusuo, Dongge Liu, Ricardo Andres Calvo Mendez +3 more
Fuzz testing has become a cornerstone technique for identifying software bugs and security vulnerabilities, with broad adoption in both industry and...
Davide Gabrielli, Simone Sestito, Iacopo Masi
The current landscape of defensive mechanisms for LLMs is fragmented and underdeveloped, unlike prior work on classifiers. To further promote...
Zhaoyan Wang, Zheng Gao, Arogya Kharel +1 more
Graph Neural Networks (GNNs) are widely adopted in Web-related applications, serving as a core technique for learning from graph-structured data,...
Clara Maathuis, Kasper Cools
In a time of rapidly evolving military threats and increasingly complex operational environments, the integration of AI into military operations...
AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.
AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.
Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.
Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.