Defense MEDIUM
Sumit Ranjan, Sugandha Sharma, Ubaid Abbas +1 more
Voice interfaces are quickly becoming a common way for people to interact with AI systems. This also introduces new security risks, such as prompt...
2 months ago cs.SD cs.AI
Benchmark MEDIUM
Chenxi Li, Xianggan Liu, Dake Shen +9 more
Despite the rapid progress of Large Vision-Language Models (LVLMs), the integration of visual modalities introduces new safety vulnerabilities that...
2 months ago cs.CV cs.LG
Tool MEDIUM
Yuhang Huang, Boyang Ma, Biwei Yan +5 more
The Model Context Protocol (MCP) is an open and standardized interface that enables large language models (LLMs) to interact with external tools and...
2 months ago cs.CR cs.AI
Tool MEDIUM
Neha Nagaraja, Hayretdin Bahsi
Large Language Models (LLMs) are increasingly integrated into safety-critical workflows, yet existing security analyses remain fragmented and often...
2 months ago cs.CR cs.AI
Benchmark MEDIUM
Yige Li, Wei Zhao, Zhe Li +6 more
Backdoor mechanisms have traditionally been studied as security threats that compromise the integrity of machine learning models. However, the same...
2 months ago cs.CR cs.AI
Attack MEDIUM
Eduard Hirsch, Kristina Raab, Tobias J. Bauer +1 more
IT systems are facing an increasing number of security threats, including advanced persistent attacks and future quantum-computing vulnerabilities....
2 months ago cs.CR cs.IR
Benchmark MEDIUM
Yuxu Ge
Autonomous agents powered by large language models introduce a class of execution-layer vulnerabilities -- prompt injection, retrieval poisoning, and...
2 months ago cs.CR cs.AI
Tool MEDIUM
Punyajoy Saha, Sudipta Halder, Debjyoti Mondal +1 more
Safety alignment is critical for deploying large language models (LLMs) in real-world applications, yet most existing approaches rely on large...
2 months ago cs.CL cs.AI cs.LG
Survey MEDIUM
Elzo Brito dos Santos Filho
AI-assisted software generation has increased development speed, but it has also amplified a persistent engineering problem: systems that are...
2 months ago cs.CR cs.AI
Attack MEDIUM
Donghwa Kang, Hojun Choe, Doohyun Kim +2 more
Deploying deep neural networks (DNNs) on edge devices exposes valuable intellectual property to model-stealing attacks. While TEE-shielded DNN...
Defense MEDIUM
Xisen Jin, Michael Duan, Qin Lin +4 more
As AI agents become widely deployed as online services, users often rely on an agent developer's claim about how safety is enforced, which introduces...
2 months ago cs.CR cs.AI cs.CL
Defense MEDIUM
Jinman Wu, Yi Xie, Shen Lin +2 more
Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection automatically triggers refusal. However, the...
2 months ago cs.CR cs.AI cs.LG
Defense MEDIUM
Ved Sriraman, Adam Block
Best-of-N (BoN) sampling is a widely used inference-time alignment method for language models, whereby N candidate responses are sampled from a...
2 months ago cs.LG cs.AI
Benchmark MEDIUM
Xiaoguang Li, Hanyi Wang, Yaowei Huang +6 more
Shuffler-based differential privacy (shuffle-DP) is a privacy paradigm that provides high utility by using a shuffler to permute noisy reports from...
Attack MEDIUM
Anatoly Belikov, Ilya Fedotov
Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV...
2 months ago cs.CR cs.LG
Defense MEDIUM
Trapoom Ukarapol, Nut Chukamphaeng, Kunat Pipatanakul +1 more
The safety evaluation of large language models (LLMs) remains largely centered on English, leaving non-English languages and culturally grounded...
Benchmark MEDIUM
Yuchen Shi, Huajie Chen, Heng Xu +6 more
Transfer learning leverages knowledge from pre-trained models to solve new tasks with limited data and computational resources....
2 months ago cs.CR cs.LG
Survey MEDIUM
G. Madan Mohan, Veena Kiran Nambiar, Kiranmayee Janardhan
We introduce the Dynamic Behavioral Constraint (DBC) benchmark, the first empirical framework for evaluating the efficacy of a structured,...
Attack MEDIUM
Geraldin Nanfack, Eugene Belilovsky, Elvis Dohmatob
Safety-aligned language models refuse harmful requests through learned refusal behaviors encoded in their internal representations. Recent...
2 months ago cs.LG cs.AI
Benchmark MEDIUM
Kelly L Vomo-Donfack, Adryel Hoszu, Grégory Ginot +1 more
Federated learning (FL) faces two structural tensions: gradient sharing enables data-reconstruction attacks, while non-IID client distributions...
2 months ago cs.LG cs.CR cs.DC