AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total: 2,529
Attack: 969
Benchmark: 729
Defense: 345
Tool: 272
Survey: 142

Showing 81–100 of 222 papers

Defense MEDIUM

Capability-Oriented Training Induced Alignment Risk

Yujun Zhou, Yue Huang, Han Bao +8 more

While most AI alignment research focuses on preventing models from generating explicitly harmful content, a more subtle risk is emerging:...

2 months ago cs.LG cs.CL PDF

Defense MEDIUM

LoRA-based Parameter-Efficient LLMs for Continuous Learning in Edge-based Malware Detection

Christian Rondanini, Barbara Carminati, Elena Ferrari +2 more

The proliferation of edge devices has created an urgent need for security solutions capable of detecting malware in real time while operating under...

2 months ago cs.CR cs.AI cs.DC PDF

Defense MEDIUM

Future Mining: Learning for Safety and Security

Md Sazedur Rahman, Mizanur Rahman Jewel, Sanjay Madria

Mining is rapidly evolving into an AI-driven cyber-physical ecosystem where safety and operational reliability depend on robust perception,...

3 months ago cs.CR cs.DC PDF

Defense MEDIUM

Agentic Knowledge Distillation: Autonomous Training of Small Language Models for SMS Threat Detection

Adel ElZemity, Joshua Sylvester, Budi Arief +1 more

SMS-based phishing (smishing) attacks have surged, yet training effective on-device detectors requires labelled threat data that quickly becomes...

3 months ago cs.CR PDF

Defense MEDIUM

Kill it with FIRE: On Leveraging Latent Space Directions for Runtime Backdoor Mitigation in Deep Neural Networks

Enrico Ahlers, Daniel Passon, Yannic Noller +1 more

Machine learning models are increasingly present in our everyday lives; as a result, they become targets of adversarial attackers seeking to...

3 months ago cs.LG cs.AI cs.CR PDF

Defense MEDIUM

TRACE: Timely Retrieval and Alignment for Cybersecurity Knowledge Graph Construction and Expansion

Zijing Xu, Ziwei Ning, Tiancheng Hu +4 more

The rapid evolution of cyber threats has highlighted significant gaps in security knowledge integration. Cybersecurity Knowledge Graphs (CKGs)...

3 months ago cs.CR PDF

Defense MEDIUM

SecCodePRM: A Process Reward Model for Code Security

Weichen Yu, Ravi Mangal, Yinyi Luo +4 more

Large Language Models are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging....

3 months ago cs.CR cs.SE PDF

Defense MEDIUM

Omni-Safety under Cross-Modality Conflict: Vulnerabilities, Dynamics Mechanisms and Efficient Alignment

Kun Wang, Zherui Li, Zhenhong Zhou +8 more

Omni-modal Large Language Models (OLLMs) greatly expand LLMs' multimodal capabilities but also introduce cross-modal safety risks. However, a...

3 months ago cs.CR cs.AI cs.CL PDF

Defense MEDIUM

Stress-Testing Alignment Audits With Prompt-Level Strategic Deception

Oliver Daniels, Perusha Moodley, Benjamin M. Marlin +1 more

Alignment audits aim to robustly identify hidden goals from strategic, situationally aware misaligned models. Despite this threat model, existing...

3 months ago cs.LG PDF

Defense MEDIUM

Is Reasoning Capability Enough for Safety in Long-Context Language Models?

Yu Fu, Haz Sameen Shahgir, Huanli Gong +3 more

Large language models (LLMs) increasingly combine long-context processing with advanced reasoning, enabling them to retrieve and synthesize...

3 months ago cs.CL cs.CR PDF

Defense MEDIUM

Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs

Yukun Jiang, Hai Huang, Mingjie Li +3 more

By introducing routers to selectively activate experts in Transformer layers, the mixture-of-experts (MoE) architecture significantly reduces...

3 months ago cs.LG cs.AI cs.CR PDF

Defense MEDIUM

Efficient and Adaptable Detection of Malicious LLM Prompts via Bootstrap Aggregation

Shayan Ali Hassan, Tao Ni, Zafar Ayyub Qazi +1 more

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding, reasoning, and generation. However, these...

3 months ago cs.LG cs.CR PDF

Defense MEDIUM

Agents in the Wild: Safety, Society, and the Illusion of Sociality on Moltbook

Yunbei Zhang, Kai Mei, Ming Liu +5 more

We present the first large-scale empirical study of Moltbook, an AI-only social platform where 27,269 agents produced 137,485 posts and 345,580...

3 months ago cs.SI cs.AI PDF

Defense MEDIUM

Plato's Form: Toward Backdoor Defense-as-a-Service for LLMs with Prototype Representations

Chen Chen, Yuchen Sun, Jiaxin Gao +4 more

Large language models (LLMs) are increasingly deployed in security-sensitive applications, yet remain vulnerable to backdoor attacks. However,...

3 months ago cs.CR PDF

Defense MEDIUM

Dependable Artificial Intelligence with Reliability and Security (DAIReS): A Unified Syndrome Decoding Approach for Hallucination and Backdoor Trigger Detection

Hema Karnam Surendrababu, Nithin Nagaraj

Machine Learning (ML) models, including Large Language Models (LLMs), are characterized by a range of system-level attributes such as security and...

3 months ago cs.CR PDF

Defense MEDIUM

ProMoral-Bench: Evaluating Prompting Strategies for Moral Reasoning and Safety in LLMs

Rohan Subramanian Thomas, Shikhar Shiromani, Abdullah Chaudhry +4 more

Prompt design significantly impacts the moral competence and safety alignment of large language models (LLMs), yet empirical comparisons remain...

3 months ago cs.AI cs.CL PDF

Defense MEDIUM

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

Zhenxiong Yu, Zhi Yang, Zhiheng Jin +19 more

As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security...

3 months ago cs.CR cs.AI PDF

Defense MEDIUM

RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models

Jiacheng Liang, Yuhui Wang, Tanqiu Jiang +1 more

Mixture-of-Experts (MoE) language models introduce unique challenges for safety alignment due to their sparse routing mechanisms, which can enable...

3 months ago cs.LG cs.AI cs.CR PDF

Defense MEDIUM

Semantic Consensus Decoding: Backdoor Defense for Verilog Code Generation

Guang Yang, Xing Hu, Xiang Chen +1 more

Large language models (LLMs) for Verilog code generation are increasingly adopted in hardware design, yet remain vulnerable to backdoor attacks where...

3 months ago cs.SE cs.CR PDF

Defense MEDIUM

Refining Decision Boundaries In Anomaly Detection Using Similarity Search Within the Feature Space

Sidahmed Benabderrahmane, Petko Valtchev, James Cheney +1 more

Detecting rare and diverse anomalies in highly imbalanced datasets, such as Advanced Persistent Threats (APTs) in cybersecurity, remains a fundamental...

3 months ago cs.LG cs.AI cs.CR PDF
