Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security
Tu Lan, Chaowei Xiao
Agent skills let LLM agents reuse instructions, resources, tools, and workflows, but they also create a new place for malicious behavior to hide. A...
AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.
Kazuki Iwahana, Masaru Matsubayashi, Takuma Koyama +3 more
Backdoor attacks pose a serious threat to the safety and reliability of Large Language Models (LLMs), as they cause models to behave normally on...
Malikeh Ehghaghi, Boglárka Ecsedi, Marsha Chechik +1 more
Adversarial robustness evaluations of large language models (LLMs) typically report attack success rate (ASR) under fixed query budgets, implicitly...
Naihao Deng, Yilun Zhu, Naichen Shi +2 more
Warning: This paper contains several toxic and offensive statements. Modern large language models (LLMs) are typically aligned through large-scale...
Bulat Nutfullin, Vladimir Evgrafov, Dmitry Namiot
Multimodal large language models (MLLMs) now appear in safety-critical applications, but the visual channel leaves them open to adversarial attacks...
Lena S. Bolliger, Lena A. Jäger
Production LLMs receive instructions from sources with very different levels of trust, yet attend to every token with uniform architectural...
Yv Zhang, Hao Sun, Hao Fang +5 more
External memory has become a core component of modern web agents, enabling long-horizon reasoning through the retrieval of past experiences. However,...
Lijia Yu, Jiuxin Cao, Yuchen Qiang +3 more
Adversarial examples reveal vulnerabilities in Vision-Language Pre-training (VLP) models and provide insights for improving robustness. A key...
Peiyang Li, Songping Wang, Yi Huang +9 more
Autonomous AI agents have driven the transition from conversation to task execution, shifting security failures from textual deception to system...
Nicole Mitchell, Galen Andrew, Arun Ganesh +2 more
Parameter-efficient fine-tuning of large language models (LLMs) can exhibit problematic memorization of individual training examples. Empirical...
Saeid Jamshidi
The rapid proliferation of Internet of Things (IoT) devices has enabled unprecedented automation and connectivity, but it has also substantially...
Saeid Jamshidi, Amin Nikanjam, Arghavan Moradi Dakhel +2 more
Large Language Models (LLMs) in multi-turn interactions maintain evolving context rather than generating isolated responses, making them vulnerable...
Carlos S. Sepúlveda, Gonzalo A. Ruz
Neural combinatorial optimization (NCO) trains autoregressive policies to solve routing problems. The standard training algorithm, REINFORCE with a...
Qijun Wang, Chunqi Qian, Huacheng Zeng
In today's digitally connected world, keyboards remain the primary interface for inputting sensitive information, making them a persistent target for...
Vasisht Duddu, Lipeng He, Asim Waheed +1 more
Machine learning (ML) models are susceptible to various security, privacy, and fairness risks. Adversaries with different characteristics (i.e.,...
Gilad Gressel, Rahul Pankajakshan, Julia Diament +3 more
As LLMs are deployed as agents, reliable monitoring requires knowing not only what they output, but which instructions are steering their behavior....
Shixiong Jiang, Taozheng Zhu, Fanxin Kong
Offline safe reinforcement learning (Safe RL) enables policy learning without online interactions, making it suitable for safety-critical systems...
Yinan Wang
AI Scientist agents are often evaluated as if capability were mainly a function of model quality, prompting, or reasoning scaffolds. We test a...
Yuhan Ma, Stefan Schmid
Tool-using large language model (LLM) agents face two distinct security failures: unauthorized external actions and exposure of sensitive plaintext...
Bastien Vuillod, Kevin Hector, Pierre-Alain Moellic +2 more
Federated Learning (FL) allows a set of clients to collectively train a global model without sharing local training data. Giving the responsibility...
AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.
AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.
Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.
Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.