Proactive defense against LLM Jailbreak
Weiliang Zhao, Jinjun Peng, Daniel Ben-Levi +2 more
The proliferation of powerful large language models (LLMs) has necessitated robust safety alignment, yet these models remain vulnerable to evolving...
AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.
Kuofeng Gao, Yiming Li, Chao Du +4 more
Jailbreaking attacks on the vision modality typically rely on imperceptible adversarial perturbations, whereas attacks on the textual modality are...
Yuxin Wen, Arman Zharmagambetov, Ivan Evtimov +4 more
Prompt injection poses a serious threat to the reliability and safety of LLM agents. Recent defenses against prompt injection, such as Instruction...
Santhosh Kumar Ravindran
The rapid adoption of large language models (LLMs) in enterprise systems exposes vulnerabilities to prompt injection attacks, strategic deception,...
Buyun Liang, Liangzu Peng, Jinqi Luo +3 more
Large Language Models (LLMs) are increasingly deployed in high-risk domains. However, state-of-the-art LLMs often exhibit hallucinations, raising...
Yu Cui, Sicheng Pan, Yifei Liu +2 more
Large language models (LLMs) have been widely deployed in Conversational AIs (CAIs), while exposing privacy and security threats. Recent research...
Yanjie Li, Yiming Cao, Dong Wang +1 more
Multimodal agents built on large vision-language models (LVLMs) are increasingly deployed in open-world settings but remain highly vulnerable to...
Xiangxiang Chen, Peixin Zhang, Jun Sun +2 more
Model quantization is a popular technique for deploying deep learning models on resource-constrained environments. However, it may also introduce...
Yulin Chen, Haoran Li, Yuan Sui +2 more
With the development of technology, large language models (LLMs) have dominated the downstream natural language processing (NLP) tasks. However,...
Rabeya Amin Jhuma, Mostafa Mohaimen Akand Faisal
This study explored how in-context learning (ICL) in large language models can be disrupted by data poisoning attacks in the setting of public health...
Maraz Mia, Mir Mehedi A. Pritom
Explainable Artificial Intelligence (XAI) has aided machine learning (ML) researchers with the power of scrutinizing the decisions of the black-box...
Javad Rafiei Asl, Sidhant Narula, Mohammad Ghasemigol +2 more
Large Language Models (LLMs) have revolutionized natural language processing but remain vulnerable to jailbreak attacks, especially multi-turn...
Sanket Badhe
We present LegalSim, a modular multi-agent simulation of adversarial legal proceedings that explores how AI systems can exploit procedural weaknesses...
Xinzhe Huang, Wenjing Hu, Tianhang Zheng +5 more
Existing gradient-based jailbreak attacks on Large Language Models (LLMs) typically optimize adversarial suffixes to align the LLM output with...
Yu He, Yifei Chen, Yiming Li +5 more
In recent years, RAG has emerged as a key paradigm for enhancing large language models (LLMs). By integrating externally retrieved information, RAG...
Zhixin Xie, Xurui Song, Jun Luo
Despite substantial efforts in safety alignment, recent research indicates that Large Language Models (LLMs) remain highly susceptible to jailbreak...
Chinthana Wimalasuriya, Spyros Tragoudas
Adversarial attacks present a significant threat to modern machine learning systems. Yet, existing detection methods often lack the ability to detect...
Zhaorun Chen, Xun Liu, Mintong Kang +4 more
As vision-language models (VLMs) gain prominence, their multimodal interfaces also introduce new safety vulnerabilities, making the safety evaluation...
Chengquan Guo, Chulin Xie, Yu Yang +6 more
Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters, enabling dynamic...
Jonathan Sneh, Ruomei Yan, Jialin Yu +6 more
As LLMs increasingly power agents that interact with external tools, tool use has become an essential mechanism for extending their capabilities....
AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.
Papers are classified into attack, defense, benchmark, survey, and tool categories, and the index is updated continuously.
Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.
Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.