Matching Ranks Over Probability Yields Truly Deep Safety Alignment
Jason Vega, Gagandeep Singh
A frustratingly easy technique known as the prefilling attack has been shown to effectively circumvent the safety alignment of frontier LLMs by...
AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.
Jiale Zhao, Xing Mou, Jinlin Wu +7 more
Medical Multimodal Large Language Models (Medical MLLMs) have achieved remarkable progress in specialized medical tasks; however, research into their...
Biagio Montaruli, Luca Compagna, Serena Elisa Ponta +1 more
The rise of supply chain attacks via malicious Python packages demands robust detection solutions. Current approaches, however, overlook two critical...
Xinzheng Wu, Junyi Chen, Naiting Zhong +1 more
The safe deployment of autonomous driving systems (ADSs) relies on comprehensive testing and evaluation. However, safety-critical scenarios that can...
Yixuan Tang, Yi Yang
Aligning Large Language Models (LLMs) with human preferences typically relies on external supervision, which faces critical limitations: human...
Weiwei Wang
Catastrophic forgetting remains a fundamental challenge in continual learning for large language models. Recent work revealed that performance...
Rongzhe Wei, Peizhi Niu, Xinjie Shen +7 more
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. Existing approaches...
Cen Lu, Yung-Chen Tang, Andrea Cavallaro
Large Vision-Language Models (LVLMs) have shown impressive multimodal understanding capabilities, yet their robustness is poorly understood. In this...
Henry Onyeka, Emmanuel Samson, Liang Hong +3 more
The increasing complexity of IoT edge networks presents significant challenges for anomaly detection, particularly in identifying sophisticated...
Neemesh Yadav, Francesco Ortu, Jiarui Liu +5 more
Large Language Models (LLMs) are trained to refuse to respond to harmful content. However, systematic analyses of whether this behavior is truly a...
Fouad Trad, Ali Chehab
Few-shot prompting has emerged as a practical alternative to fine-tuning for leveraging the capabilities of large language models (LLMs) in...
Yaw Osei Adjei, Frederick Ayivor, Davis Opoku
Business Email Compromise (BEC) is a sophisticated social engineering threat that manipulates organizational hierarchies, leading to significant...
Axel Constant, Mahault Albarracin, Karl J. Friston
This paper presents a computational account of how legal norms can influence the behavior of artificial intelligence (AI) agents, grounded in the...
Junbo Zhang, Ran Chen, Qianli Zhou +2 more
Large language models demonstrate powerful capabilities across various natural language processing tasks, yet they also harbor safety...
Onat Gungor, Roshan Sood, Jiasheng Zhou +1 more
Large Language Models (LLMs) are highly effective for cybersecurity question answering (QA) but are difficult to deploy on edge devices due to their...
Hong-Hanh Nguyen-Le, Van-Tuan Tran, Dinh-Thuc Nguyen +1 more
The rapid advancement of generators (e.g., StyleGAN, Midjourney, DALL-E) has produced highly realistic synthetic images, posing significant...
Swastik Bhattacharya, Sanjay Das, Anand Menon +3 more
Deep Neural Networks (DNNs) continue to grow in complexity with Large Language Models (LLMs) incorporating vast numbers of parameters. Handling these...
Samih Fadli
Large language model safety is usually assessed with static benchmarks, but key failures are dynamic: value drift under distribution shift, jailbreak...
Zhaoxin Zhang, Borui Chen, Yiming Hu +3 more
Recent research on large language model (LLM) jailbreaks has primarily focused on techniques that bypass safety mechanisms to elicit overtly harmful...
Zheyu Lin, Jirui Yang, Yukui Qiu +3 more
Evaluating the safety robustness of LLMs is critical for their deployment. However, mainstream Red Teaming methods rely on online generation and...
AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.
AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.
Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.
Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.