Density-aware Sample-specific Attack
Qiyuan Wang, Yao Li, Raymond K. W. Wong
Despite recent progress in backdoor attacks, existing methods remain susceptible to post-training defenses that erase the backdoor through...
AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.
Arthur Renard, Franck Gabriel, Valentin Hartmann +1 more
We present Frost Training, a method for improving Monte Carlo-based policy optimization for a large family of LLM-as-a-judge tasks called...
Abile Jean, Kuniyilh S
Cyber-Physical Systems (CPS) integrate sensing, communication, computation, and control to support critical infrastructure, including smart grids,...
Khang Tran, Yazan Boshmaf, Issa Khalil +3 more
Code Large Language Models (CLLMs) serve as the core of modern code agents, enabling developers to automate complex software development tasks. In...
Snehasis Mukhopadhyay
Jailbreak attacks on multimodal AI systems remain underexplored, even though unsafe image generation can have more severe consequences than unsafe...
Dongyoon Hahm, Dylan Hadfield-Menell, Kimin Lee
Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences. In this work,...
Xuan Luo, Yue Wang, Geng Tu +2 more
In this work, we propose BAIT (Boundary-Aware Iterative Trap), a three-step jailbreak framework that approaches malicious goals through internal...
Yunbo Long, Haolang Zhao, Lukas Beckenbauer +2 more
Post-trained LLMs are often optimized to align responses with human preferences, making them safe, polite, and conversationally appropriate. In...
Zhe Yu, Wenpeng Xing, Gaolei Li +4 more
Retrieval-augmented generation (RAG) increasingly underpins high-stakes applications, yet remains vulnerable to Confundo-style poisoning where...
Zedian Shao, Charles Fleming, Teodora Baluta
Large language models (LLMs) are often fine-tuned on uncurated text datasets that adversaries can poison. Existing poisoning attacks primarily rely...
Kevin Kuo, Chhavi Yadav, Virginia Smith
Recent defenses for safeguarding open-weight large language models (LLMs) are intended to prevent adversarial usage. Underlying these defenses is an...
Arian Komaei Koma, Seyed Amir Kasaei, AmirMahdi Sadeghzadeh +1 more
Machine unlearning aims to remove specific concepts from pretrained text-to-image diffusion models, yet several white- and black-box attacks have...
Mohammed N. Swileh, Shengli Zhang, Kai Lei
Software-Defined Networking (SDN) provides flexible and programmable network management; however, its centralized control architecture remains highly...
Xiao Liu, Jiaxiang Liu, Boci Peng +6 more
Vision Language Models adapt well to downstream tasks but are highly vulnerable to adversarial perturbations that disrupt cross-modal semantic...
Jianwei Tai
Vision-Language-Action (VLA) models are increasingly deployed on real robots, where each predicted action is executed and each failure carries a...
Jianwei Tai
Vision-Language-Action (VLA) models are increasingly deployed on real robots, where each predicted action is executed and each failure carries a...
Yue Liu, Yanjie Zhao, Yunbo Lyu +3 more
Agentic AI coding assistants can edit files, run commands, and access the internet on behalf of developers. However, their reliance on unvetted...
Aditya Sridhar
Concept Bottleneck Models (CBMs) have emerged as a cornerstone approach for interpretable machine learning, providing human-understandable...
Dongpeng Zhang, Ke Ma, Yangbangyan Jiang +4 more
Adversarial images pose a severe security threat to multimodal large language models through prompt injection. Existing defenses largely lack a...
Esra Yeniaras
Quantum machine learning (QML) is moving from research prototypes to deployed cloud services. As QML enters regulated industries, the integrity of...
AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.
AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.
Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.
Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.