Defense MEDIUM
Yukun Jiang, Hai Huang, Mingjie Li +3 more
By introducing routers to selectively activate experts in Transformer layers, the mixture-of-experts (MoE) architecture significantly reduces...
3 months ago cs.LG cs.AI cs.CR
PDF
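The routing mechanism this abstract refers to is the standard top-k gating used in MoE Transformer layers: a learned linear gate scores every expert per token, and only the k highest-scoring experts are activated. A minimal sketch of that generic gating step (an illustration of the common technique, not this paper's method; the gate weights and token vector here are made-up toy values):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(token, expert_gate_weights, k=2):
    """Score each expert with a linear gate, keep the top-k experts,
    and renormalize their probabilities (standard top-k MoE gating).
    Returns a list of (expert_index, gate_weight) pairs."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, token))
              for w in expert_gate_weights]
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]
```

The layer's output is then the gate-weighted sum of the selected experts' outputs; because only k of the experts run per token, compute grows much more slowly than parameter count — the efficiency gain the abstract alludes to.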
Defense MEDIUM
Shayan Ali Hassan, Tao Ni, Zafar Ayyub Qazi +1 more
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding, reasoning, and generation. However, these...
3 months ago cs.LG cs.CR
Defense LOW
Gautam Siddharth Kashyap, Mark Dras, Usman Naseem
Ensuring that Large Language Models (LLMs) are in accordance with human values - being helpful, harmless, and honest (HHH) - is important for safe deployment....
Defense MEDIUM
Yunbei Zhang, Kai Mei, Ming Liu +5 more
We present the first large-scale empirical study of Moltbook, an AI-only social platform where 27,269 agents produced 137,485 posts and 345,580...
3 months ago cs.SI cs.AI
Defense MEDIUM
Chen Chen, Yuchen Sun, Jiaxin Gao +4 more
Large language models (LLMs) are increasingly deployed in security-sensitive applications, yet remain vulnerable to backdoor attacks. However,...
Defense MEDIUM
Hema Karnam Surendrababu, Nithin Nagaraj
Machine Learning (ML) models, including Large Language Models (LLMs), are characterized by a range of system-level attributes such as security and...
Defense LOW
Daniel Fein, Max Lamparth, Violet Xiang +2 more
Reward Models (RMs) are crucial for online alignment of language models (LMs) with human preferences. However, RM-based preference-tuning is...
3 months ago cs.CL cs.AI
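For context on the "RM-based preference-tuning" the abstract mentions: reward models are commonly trained on pairwise human preferences with a Bradley-Terry loss, minimizing -log sigmoid(r_chosen - r_rejected). A sketch of that standard objective (the general formulation, not necessarily this paper's variant):

```python
import math

def bradley_terry_loss(r_chosen, r_rejected):
    """Pairwise preference loss -log(sigmoid(r_chosen - r_rejected)),
    which pushes the reward model to score the human-preferred
    response above the rejected one."""
    margin = r_chosen - r_rejected
    # log(1 + exp(-margin)), computed in a numerically stable form
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)
```

The loss is log 2 when the model cannot distinguish the pair and shrinks toward zero as the reward margin for the preferred response grows.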
Defense MEDIUM
Rohan Subramanian Thomas, Shikhar Shiromani, Abdullah Chaudhry +4 more
Prompt design significantly impacts the moral competence and safety alignment of large language models (LLMs), yet empirical comparisons remain...
3 months ago cs.AI cs.CL
Defense MEDIUM
Zhenxiong Yu, Zhi Yang, Zhiheng Jin +19 more
As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security...
3 months ago cs.CR cs.AI
Defense MEDIUM
Jiacheng Liang, Yuhui Wang, Tanqiu Jiang +1 more
Mixture-of-Experts (MoE) language models introduce unique challenges for safety alignment due to their sparse routing mechanisms, which can enable...
3 months ago cs.LG cs.AI cs.CR
Defense MEDIUM
Guang Yang, Xing Hu, Xiang Chen +1 more
Large language models (LLMs) for Verilog code generation are increasingly adopted in hardware design, yet remain vulnerable to backdoor attacks where...
3 months ago cs.SE cs.CR
Defense MEDIUM
Sidahmed Benabderrahmane, Petko Valtchev, James Cheney +1 more
Detecting rare and diverse anomalies in highly imbalanced datasets-such as Advanced Persistent Threats (APTs) in cybersecurity-remains a fundamental...
3 months ago cs.LG cs.AI cs.CR
Defense MEDIUM
Rohan Saxena
Fine-tuning language models on narrowly harmful data causes emergent misalignment (EM) -- behavioral failures extending far beyond training...
3 months ago cs.CL cs.AI
Defense MEDIUM
Zeming Wei, Zhixin Zhang, Chengcan Wu +3 more
Recent advancements in LLMs have led to significant breakthroughs in various AI applications. However, their sophisticated capabilities also...
3 months ago cs.SE cs.AI cs.CL
Defense MEDIUM
Ali Mahdavi, Santa Aghapour, Azadeh Zamanifar +1 more
Existing Byzantine-robust aggregation mechanisms typically rely on full-dimensional gradient comparisons or pairwise distance computations, resulting...
3 months ago cs.CR cs.AI
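As background for the aggregation mechanisms this abstract contrasts itself with: a classic Byzantine-robust baseline is coordinate-wise median aggregation, which takes the median of each gradient coordinate across clients so that a minority of malicious clients cannot drag any coordinate arbitrarily far. A minimal sketch of that standard baseline (for illustration only; it is one of the existing mechanisms the paper critiques, not the paper's own method):

```python
import statistics

def coordinate_wise_median(gradients):
    """Aggregate a list of client gradient vectors by taking the
    median of each coordinate independently. Robust to a minority
    of Byzantine clients, but touches every dimension - the
    per-coordinate cost the abstract alludes to."""
    return [statistics.median(coord) for coord in zip(*gradients)]
```

For example, an outlier client submitting an extreme gradient shifts the mean badly but leaves the coordinate-wise median near the honest clients' values.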
Defense MEDIUM
Siqi Wen, Shu Yang, Shaopeng Fu +3 more
Vision Language Action (VLA) models close the perception action loop by translating multimodal instructions into executable behaviors, but this very...
Defense LOW
Cláudio Lúcio do Val Lopes, João Marcus Pitta, Fabiano Belém +2 more
The integration of Artificial Intelligence (AI) into clinical settings presents a software engineering challenge, demanding a shift from isolated...
3 months ago cs.AI cs.SE
Defense LOW
Ranjith Krishnamurthy, Oshando Johnson, Goran Piskachev +1 more
Security vulnerabilities often arise unintentionally during development due to a lack of security expertise and code complexity. Traditional tools,...
3 months ago cs.CR cs.AI cs.SE
Defense MEDIUM
Zeyuan He, Yupeng Chen, Lang Lin +7 more
Diffusion large language models (D-LLMs) offer an alternative to autoregressive LLMs (AR-LLMs) and have demonstrated advantages in generation...