AJAR: Adaptive Jailbreak Architecture for Red-teaming
Yipu Dou, Wang Yang
Large language model (LLM) safety evaluation is moving from content moderation to action security as modern systems gain persistent state, tool...
AI Threat Alert indexes 3,082+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.
Showing 1841–1860 of 3,082 papers
Yipu Dou, Wang Yang
Large language model (LLM) safety evaluation is moving from content moderation to action security as modern systems gain persistent state, tool...
Kaiyu Zhou, Yongsen Zheng, Yicheng He +5 more
The agent--tool interaction loop is a critical attack surface for modern Large Language Model (LLM) agents. Existing denial-of-service (DoS) attacks...
Haoze Guo, Ziqi Wei
Retrieval-augmented generation (RAG) systems put more and more emphasis on grounding their responses in user-generated content found on the Web,...
Chetan Pathade, Vinod Dhimam, Sheheryar Ahmad +1 more
Serverless computing has achieved widespread adoption, with over 70% of AWS organizations using serverless solutions [1]. Meanwhile, machine learning...
Jonah Ghebremichael, Saastha Vasan, Saad Ullah +6 more
Static Application Security Testing (SAST) tools using taint analysis are widely viewed as providing higher-quality vulnerability detection results...
Xinrui Zhang, Pincan Zhao, Jason Jaskolka +2 more
Machine Learning (ML) has emerged as a pivotal technology in the operation of large and complex systems, driving advancements in fields such as...
Federico Pierucci, Marcello Galisai, Marcantonio Syrnikov Bracale +6 more
As LLM-based systems increasingly operate as agents embedded within human social and technical systems, alignment can no longer be treated as a...
Hao Wang, Yanting Wang, Hao Li +2 more
Large Language Models (LLMs) have achieved remarkable capabilities but remain vulnerable to adversarial ``jailbreak'' attacks designed to bypass...
Yinzhi Zhao, Ming Wang, Shi Feng +3 more
Large language models (LLMs) have achieved impressive performance across natural language tasks and are increasingly deployed in real-world...
Xingjun Ma, Yixu Wang, Hengyuan Xu +18 more
The rapid evolution of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has driven major gains in reasoning, perception, and...
Christina Lu, Jack Gallagher, Jonathan Michala +2 more
Large language models can represent a variety of personas but typically default to a helpful Assistant identity cultivated during post-training. We...
Yi Liu, Weizhe Wang, Ruitao Feng +5 more
The rise of AI agent frameworks has introduced agent skills, modular packages containing instructions and executable code that dynamically extend...
Luoming Hu, Jingjie Zeng, Liang Yang +1 more
Enhancing the moral alignment of Large Language Models (LLMs) is a critical challenge in AI safety. Current alignment techniques often act as...
Yuansen Liu, Yixuan Tang, Anthony Kum Hoe Tun
Current LLM safety research predominantly focuses on mitigating Goal Hijacking, preventing attackers from redirecting a model's high-level objective...
Murat Bilgehan Ertan, Marten van Dijk
Differentially Private Stochastic Gradient Descent (DP-SGD) is the dominant paradigm for private training, but its fundamental limitations under...
Hao Li, Yankai Yang, G. Edward Suh +2 more
Large Language Models (LLMs) have enabled the development of powerful agentic systems capable of automating complex workflows across various fields....
Yutao Mou, Zhangchi Xue, Lijun Li +4 more
While LLM-based agents can interact with environments via invoking external tools, their expanded capabilities also amplify security risks....
Jiawen Zhang, Yangfan Hu, Kejia Chen +7 more
Fine-tuning is an essential and pervasive functionality for applying large language models (LLMs) to downstream tasks. However, it has the potential...
Mohoshin Ara Tahera, Karamveer Singh Sidhu, Shuvalaxmi Dass +1 more
Large Language Models (LLMs) are increasingly adopted in healthcare to support clinical decision-making, summarize electronic health records (EHRs),...
Hanna Foerster, Tom Blanchard, Kristina Nikolić +6 more
AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss....
AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.
AI Threat Alert indexes 3,082+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.
Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.
Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.
Start 14-Day Free Trial