What Matters For Safety Alignment?
Xing Li, Hui-Ling Zhen, Lihao Yin +3 more
This paper presents a comprehensive empirical study on the safety alignment capabilities. We evaluate what matters for safety alignment in LLMs and...
AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.
Showing 261–272 of 272 papers
Clear filtersXing Li, Hui-Ling Zhen, Lihao Yin +3 more
This paper presents a comprehensive empirical study on the safety alignment capabilities. We evaluate what matters for safety alignment in LLMs and...
Di Wu, Yanyan Zhao, Xin Lu +2 more
Defending against jailbreak attacks is crucial for the safe deployment of Large Language Models (LLMs). Recent research has attempted to improve...
Maryam Abbasihafshejani, AHM Nazmus Sakib, Murtuza Jadliwala
The rapid advancement of speech synthesis technologies, including text-to-speech (TTS) and voice conversion (VC), has intensified security and...
Yun Bian, Yi Chen, HaiQuan Wang +2 more
Software vulnerability detection can be formulated as a binary classification problem that determines whether a given code snippet contains security...
Rajiv Thummala, Katherine Winton, Luke Flores +2 more
Out-of-band screening of microcontrollers is a major gap in semiconductor supply chain security. High-assurance techniques such as X-ray and...
Hyunjun Kim
Guardrail models are essential for ensuring the safety of Large Language Model (LLM) deployments, but processing full multi-turn conversation...
Weijie Wang, Peizhuo Lv, Yan Wang +7 more
Graph Retrieval-Augmented Generation (GraphRAG) has emerged as a key technique for enhancing Large Language Models (LLMs) with proprietary Knowledge...
Yuchao Hou, Zixuan Zhang, Jie Wang +9 more
As a critical application of computational intelligence in remote sensing, deep learning-based synthetic aperture radar (SAR) image target...
Samaresh Kumar Singh, Joyjit Roy, Martin So
Recent attacks on critical infrastructure, including the 2021 Oldsmar water treatment breach and 2023 Danish energy sector compromises, highlight...
Alessio Benavoli, Alessandro Facchini, Marco Zaffalon
How can we ensure that AI systems are aligned with human values and remain safe? We can study this problem through the frameworks of the AI...
Toqeer Ali Syed, Mohammad Riyaz Belgaum, Salman Jan +2 more
The software supply chain attacks are becoming more and more focused on trusted development and delivery procedures, so the conventional post-build...
Xingwei Ma, Shiyang Feng, Bo Zhang +1 more
Remote sensing change detection (RSCD), a complex multi-image inference task, traditionally uses pixel-based operators or encoder-decoder networks...
AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.
AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.
Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.
Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.
Start 14-Day Free Trial