Latent-space Attacks for Refusal Evasion in Language Models
Giorgio Piras, Raffaele Mura, Fabio Brau +4 more
Safety-aligned language models are trained to refuse harmful requests, yet refusal behavior can be suppressed by steering their internal...
AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.
Pengyu Sun, Qishu Jin, Enhao Huang +4 more
Model Context Protocol (MCP) has emerged as a standard interface for connecting LLM agents to external tools. Because MCP servers expose privileged...
Abdullah Al Nomaan Nafi, Fnu Suya, Swarup Bhunia +1 more
Jailbreak attacks expose a persistent gap between the intended safety behavior of aligned large language models and their behavior under adversarial...
Leitao Yuan, Qinghua Mao, Daizong Liu +5 more
Multimodal large language models (MLLMs) remain vulnerable to transfer-based targeted attacks, where perturbations optimized on open-source surrogate...
Jiachen Ma, Jiawen Zhang, Xiangtian Li +3 more
While Large Language Models (LLMs) demonstrate remarkable capabilities, they remain susceptible to sophisticated, multi-step jailbreak attacks that...
Yifei Wang, Tianlin Li, Xiaohan Zhang +3 more
Inference optimization is a vital technique for deploying LLMs at scale. Compilation is the most widely adopted optimization technique for LLMs....
Yasmine Hayder
Knowledge Graphs (KGs) are a powerful representation of linked data, offering flexibility, semantic richness, and support for knowledge enrichment...
Zheng Lin, Zhenxing Niu, Haoxuan Ji +2 more
Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in solving complex problems by generating structured, step-by-step reasoning...
Xiaoyan Ma, Seohyun Lee, Taejoon Kim +1 more
Over-the-air federated learning (OTA-FL) improves communication efficiency by exploiting the superposition property of wireless channels, but this...
Becky Mashaido, Tapadhir Das
Prompt injection attacks pose significant risks to language model safety, yet existing defenses are typically evaluated using classification...
John T. Halloran, Noopur S. Bhatt
Large language models (LLMs) are highly susceptible to backdoor attacks (BAs), wherein training samples are poisoned using trigger-based harmful...
Doohee You
The expansion of Multimodal Large Language Models (MLLMs) and their integration into autonomous agentic workflows has introduced a non-stationary...
Mohamed elShehaby, Ashraf Matrawy
Gradient-based adversarial attacks subtly manipulate inputs of Machine Learning (ML) models to induce incorrect predictions. This paper investigates...
Juozas Dautartas, Olga Kurasova, Juozapas Rokas Čypas +1 more
Machine learning-based malware detectors are widely deployed in antivirus and endpoint detection systems, yet their reliance on static features makes...
Ali Iranmanesh, Peng Liu
Open-vocabulary embodied AI agents increasingly rely on vision-language models such as CLIP for object perception and task grounding. However, the...
Dylan Marx, Marcel Dunaiski
Large Language Models (LLMs) remain vulnerable to jailbreak attempts that circumvent safety guardrails. We investigate whether multi-turn...
Yanyun Wang, Yu Huang, Zi Liang +2 more
The integration of audio modality into Large Audio Language Models (LALMs) significantly expands their attack surface. Existing jailbreak paradigms...
Hongjang Yang, Hyunsik Na, Daeseon Choi
LLM-based chatbot agents increasingly process user requests by combining natural-language reasoning with external tools such as web browsing. These...
Ziwei Wang, Jing Chen, Ruichao Liang +6 more
Despite rigorous safety alignment, Large Language Models (LLMs) remain vulnerable to jailbreak attacks. Existing black-box methods often rely on...
Datta Manikanta Sri Hari Danduri, Aravind Kumar Machiry
AI Accelerators (AIAs) are specialized hardware, e.g., Tensor Processing Units (TPUs), that enable optimal and efficient execution of AI applications and...
AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.
AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.
Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.
Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.