Safety, Security, and Cognitive Risks in World Models
Manoj Parmar
World models -- learned internal simulators of environment dynamics -- are rapidly becoming foundational to autonomous decision-making in robotics,...
AI Threat Alert indexes 3,023+ peer-reviewed and preprint papers on AI/ML security — covering adversarial attacks, model defenses, red-teaming benchmarks, surveys, and security tooling. Papers are sourced from arXiv, classified by type and by relevance to real-world threats, and cross-referenced with the CVEs and incidents they relate to.
Showing 81–100 of 264 papers
Clear filtersManoj Parmar
World models -- learned internal simulators of environment dynamics -- are rapidly becoming foundational to autonomous decision-making in robotics,...
Saeid Jamshidi, Negar Shahabi, Foutse Khomh +2 more
Software-Defined Networking (SDN) is increasingly adopted to secure Internet-of-Things (IoT) networks due to its centralized control and programmable...
Osama Wehbi, Sarhad Arisdakessian, Omar Abdel Wahab +3 more
Backdoor attacks pose a significant threat to the integrity and reliability of Artificial Intelligence (AI) models, enabling adversaries to...
Xunguang Wang, Yuguang Zhou, Qingyue Wang +5 more
Large language models (LLMs) increasingly rely on explicit chain-of-thought (CoT) reasoning to solve complex tasks, yet the safety of the reasoning...
Yuxiao Li, Alina Fastowski, Efstratios Zaradoukas +2 more
Activation steering has emerged as a powerful tool to shape LLM behavior without the need for weight updates. While its inherent brittleness and...
Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee
Multimodal Large Language Models (MLLMs) extend text-only LLMs with visual reasoning, but also introduce new safety failure modes under visually...
Xinyue Liu, Niloofar Mireshghallah, Jane C. Ginsburg +1 more
Frontier LLM companies have repeatedly assured courts and regulators that their models do not store copies of training data. They further rely on...
Shawn Li, Yue Zhao
Large language model (LLM) agents increasingly rely on external tools (file operations, API calls, database transactions) to autonomously complete...
Carlos Hinojosa, Clemens Grange, Bernard Ghanem
Vision-language models (VLMs) are increasingly deployed in real-world and embodied settings where safety decisions depend on visual context. However,...
Ce Zhang, Jinxi He, Junyi He +2 more
Multi-modal Large Language Models (MLLMs) have achieved remarkable performance across a wide range of visual reasoning tasks, yet their vulnerability...
Zhenheng Tang, Xiang Liu, Qian Wang +3 more
As Large Language Models (LLMs) become more powerful and autonomous, they increasingly face conflicts and dilemmas in many scenarios. We first...
Vanshaj Khattar, Md Rafi ur Rashid, Moumita Choudhury +4 more
Test-time training (TTT) has recently emerged as a promising method to improve the reasoning abilities of large language models (LLMs), in which the...
Yewon Han, Yumin Seol, EunGyung Kong +2 more
Existing jailbreak defence frameworks for Large Vision-Language Models often suffer from a safety utility tradeoff, where strengthening safety...
Suvadeep Hajra, Palash Nandi, Tanmoy Chakraborty
Safety tuning through supervised fine-tuning and reinforcement learning from human feedback has substantially improved the robustness of large...
Pengcheng Li, Jie Zhang, Tianwei Zhang +5 more
Safety alignment in large language models is typically evaluated under isolated queries, yet real-world use is inherently multi-turn. Although...
Matthew Butler, Yi Fan, Christos Faloutsos
The proposed method (FraudFox) provides solutions to adversarial attacks in a resource constrained environment. We focus on questions like the...
Zonghao Ying, Xiao Yang, Siyang Wu +7 more
The rapid evolution of Large Language Models (LLMs) into autonomous, tool-calling agents has fundamentally altered the cybersecurity landscape....
Xinhao Deng, Yixiang Zhang, Jiaqing Wu +15 more
Autonomous Large Language Model (LLM) agents, exemplified by OpenClaw, demonstrate remarkable capabilities in executing complex, long-horizon tasks....
Zhiyu Xue, Zimo Qi, Guangliang Liu +2 more
Safety alignment aims to ensure that large language models (LLMs) refuse harmful requests by post-training on harmful queries paired with refusal...
Harry Owiredu-Ashley
Most adversarial evaluations of large language model (LLM) safety assess single prompts and report binary pass/fail outcomes, which fails to capture...
AI security research studies how AI and machine-learning systems can be attacked and defended — covering adversarial examples, prompt injection, model poisoning, training-data extraction, and the mitigations against them. AI Threat Alert curates this research from academic sources so security teams can track the threats behind emerging AI risks.
AI Threat Alert indexes 3,023+ papers on AI/ML security, classified across attack, defense, benchmark, survey, and tool categories and updated continuously.
Papers are sourced from arXiv, then classified by type and by relevance to real-world AI/ML threats, and cross-referenced with the CVEs and incidents they relate to.
Coverage spans adversarial attacks, model and system defenses, red-teaming benchmarks, literature surveys, and security tooling for LLMs, ML libraries, AI agents, and inference pipelines.
Every paper is filtered for AI security relevance and linked to the vulnerabilities, vendors, and incidents it relates to, so the research connects directly to operational threat intelligence.
Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.
Start 14-Day Free Trial