GenAI-Driven Threat Detection with Microsoft Security Copilot
Scott Freitas, Amir Gharib
Defending against today's increasingly sophisticated cyberattacks requires security analysts to continuously translate evolving attacker tradecraft...
AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.
Amit Roth, Ankur Samanta, Matan Halevy +2 more
Aligning autonomous agents with human intent remains a central challenge in modern AI. A key manifestation of this challenge is reward hacking,...
Richard J. Young, Gregory D. Moody
The evaluation of large language model refusal on malicious-coding tasks now spans at least thirteen publicly released prompt corpora (AdvBench, the...
Chengcai Gao, Zhihong Sun, Xiaochuan Shi +2 more
The growing adoption of Retrieval-Augmented Generation (RAG) has led to a rise in adversarial attacks. Existing defenses, relying on semantic...
Mohammed Alshaalan, Miguel R. D. Rodrigues
Optimization-based adversarial suffixes can jailbreak aligned large language models (LLMs) while remaining fluent, weakening static and windowed...
Florian A. D. Burnat, Brittany I. Davidson
Multi-tenant retrieval-augmented generation (RAG) services advertise per-account differential privacy as the operative leakage boundary: each...
Isaac David, Arthur Gervais
Do stock safety-aligned language models and their uncensored or abliterated derivatives behave differently when run as autonomous security agents?...
Paul Wang, Jade Garcia-Bourrée, Anne-Marie Kermarrec +1 more
As jailbreaks, adversarially crafted inputs that bypass safety constraints, continue to be discovered in Large Language Models, practitioners...
Hongyu Cai, Arjun Arunasalam, Yiming Liang +2 more
Large Language Model (LLM) alignment remains vulnerable to jailbreak attacks that elicit unsafe responses, motivating pre-model and post-model...
Daniel Yiming Cao, Chengzhong Wang, Sheng-Yen Chou +3 more
Masked diffusion language models (MDLMs) are emerging as a compelling new paradigm for text generation, but their training-time security remains...
Tobias Braun, Jonas Henry Grebe, Hossein Shakibania +2 more
Unified autoregressive models (UAMs) are transformer models that generate text as well as image tokens within a single autoregressive pass. Shared...
Mihai Christodorescu, Earlence Fernandes, Ashish Hooda +11 more
We take the position that agent security must be approached as a systems problem: the AI model powering the agent must be treated as an untrusted...
Yubin Qu, Ying Zhang, Yanjun Zhang +4 more
Coding agents now run autonomously with shell, file, and network privileges. When a user issues a benign request, the agent sometimes does more than...
Kaixiang Wang, Jiong Lou, Zhaojiacheng Zhou +1 more
Memory-augmented large language model (LLM) agents use iterative reflection and self-evolution to solve complex tasks, but these mechanisms introduce...
Jonathan Diller, David Barnes, Rebekah Bogdanoff +14 more
As autonomous systems grow more advanced, objective metrics to evaluate their ethical and legal compliance are critical for informing end users of...
Rohith Uppala
Large language models increasingly operate as autonomous agents that select and invoke tools from large registries. We identify a critical gap: when...
Jiahe Guo, Xiangran Guo, Jiaxuan Chen +6 more
Multimodal large language models (MLLMs) often fail to transfer safety capabilities learned in the text modality to semantically equivalent non-text...
Lei Zhao, Abhay Bhaskar, Edgar Dobriban
AI agents such as OpenClaw are increasingly deployed in local workflows with access to external tools. This creates indirect prompt-injection (IPI)...
Md Navid Bin Islam, Sajal Saha
Machine-learning-based Intrusion Detection Systems (IDS) have achieved impressive accuracy in classifying network attacks, yet they consistently fall...
Zhi Quan Zhou, Dave Towey, Tsong Yueh Chen
Large language models (LLMs) are increasingly used to generate requirements specifications, design documents, code, and test cases. In contrast, much...
AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.
AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.
Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.
Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.