What Matters For Safety Alignment?
Xing Li, Hui-Ling Zhen, Lihao Yin +3 more
This paper presents a comprehensive empirical study on the safety alignment capabilities. We evaluate what matters for safety alignment in LLMs and...
AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.
Di Wu, Yanyan Zhao, Xin Lu +2 more
Defending against jailbreak attacks is crucial for the safe deployment of Large Language Models (LLMs). Recent research has attempted to improve...
Maryam Abbasihafshejani, AHM Nazmus Sakib, Murtuza Jadliwala
The rapid advancement of speech synthesis technologies, including text-to-speech (TTS) and voice conversion (VC), has intensified security and...
Yun Bian, Yi Chen, HaiQuan Wang +2 more
Software vulnerability detection can be formulated as a binary classification problem that determines whether a given code snippet contains security...
Rajiv Thummala, Katherine Winton, Luke Flores +2 more
Out-of-band screening of microcontrollers is a major gap in semiconductor supply chain security. High-assurance techniques such as X-ray and...
Hyunjun Kim
Guardrail models are essential for ensuring the safety of Large Language Model (LLM) deployments, but processing full multi-turn conversation...
Weijie Wang, Peizhuo Lv, Yan Wang +7 more
Graph Retrieval-Augmented Generation (GraphRAG) has emerged as a key technique for enhancing Large Language Models (LLMs) with proprietary Knowledge...
Yuchao Hou, Zixuan Zhang, Jie Wang +9 more
As a critical application of computational intelligence in remote sensing, deep learning-based synthetic aperture radar (SAR) image target...
Samaresh Kumar Singh, Joyjit Roy, Martin So
Recent attacks on critical infrastructure, including the 2021 Oldsmar water treatment breach and 2023 Danish energy sector compromises, highlight...
Alessio Benavoli, Alessandro Facchini, Marco Zaffalon
How can we ensure that AI systems are aligned with human values and remain safe? We can study this problem through the frameworks of the AI...
Toqeer Ali Syed, Mohammad Riyaz Belgaum, Salman Jan +2 more
The software supply chain attacks are becoming more and more focused on trusted development and delivery procedures, so the conventional post-build...
Xingwei Ma, Shiyang Feng, Bo Zhang +1 more
Remote sensing change detection (RSCD), a complex multi-image inference task, traditionally uses pixel-based operators or encoder-decoder networks...
Eranga Bandara, Tharaka Hewa, Ross Gore +12 more
Agentic AI represents a major shift in how autonomous systems reason, plan, and execute multi-step tasks through the coordination of Large Language...
Long Zhang, Wei-neng Chen
The increasing integration of Large Language Models (LLMs) into decision-making frameworks has exposed significant vulnerabilities to social...
Le Wang, Zonghao Ying, Xiao Yang +7 more
Embodied agents powered by vision-language models (VLMs) are increasingly capable of executing complex real-world tasks, yet they remain vulnerable...
Anselm Paulus, Ilia Kulikov, Brandon Amos +4 more
Ensuring the safety of language models (LMs) while maintaining their usefulness remains a critical challenge in AI alignment. Current approaches rely...
Md Minhazul Islam Munna, Md Mahbubur Rahman, Jaroslav Frnda +2 more
The proliferation of IoT devices and their reliance on Wi-Fi networks have introduced significant security vulnerabilities, particularly the KRACK...
Kun Zhao, Siyuan Dai, Yingying Zhang +9 more
Early detection of Alzheimer's disease (AD) requires models capable of integrating macro-scale neuroanatomical alterations with micro-scale genetic...
Yang Ni, Tong Yang
Large Language Models (LLMs) and AI chatbots are increasingly used for emotional and mental health support due to their low cost, immediacy, and...
Haotian Deng, Chris Farber, Jiyoon Lee +1 more
Automated short-answer grading (ASAG) remains a challenging task due to the linguistic variability of student responses and the need for nuanced,...
AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.
AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.
Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.
Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.