Attack MEDIUM
Jaiden Fairoze, Sanjam Garg, Keewoo Lee +1 more
As large language models (LLMs) advance, ensuring AI safety and alignment is paramount. One popular approach is prompt guards, lightweight mechanisms...
5 months ago cs.LG cs.CR
Attack HIGH
Isha Gupta, Rylan Schaeffer, Joshua Kazdan +2 more
The field of adversarial robustness has long established that adversarial examples can successfully transfer between image classifiers and that text...
5 months ago cs.LG cs.AI
Benchmark MEDIUM
Luca Cotti, Idilio Drago, Anisa Rula +2 more
System logs represent a valuable source of Cyber Threat Intelligence (CTI), capturing attacker behaviors, exploited vulnerabilities, and traces of...
Tool HIGH
Shoumik Saha, Jifan Chen, Sam Mayers +3 more
Code-capable large language model (LLM) agents are increasingly embedded into software engineering workflows where they can read, write, and execute...
5 months ago cs.CR cs.AI
Benchmark HIGH
Yinuo Liu, Ruohan Xu, Xilong Wang +2 more
Multiple prompt injection attacks have been proposed against web agents. At the same time, various methods have been developed to detect general...
5 months ago cs.CR cs.AI cs.CL
Defense LOW
Muhammad Faheemur Rahman, Wayne Burleson
Memristive crossbar arrays enable in-memory computing by performing parallel analog computations directly within memory, making them well-suited for...
5 months ago cs.CR cs.AR cs.ET
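The in-memory computing mechanism this entry refers to can be illustrated in miniature: a memristive crossbar stores a weight matrix as cell conductances, and applying input voltages to the row wires yields column currents that realize a matrix-vector product in a single analog step (Ohm's law per cell, Kirchhoff's current law per column). A minimal numpy sketch under idealized assumptions (no device noise, drift, or nonlinearity; all values illustrative):

```python
import numpy as np

# Idealized crossbar: conductance matrix G (siemens), one cell per (row, column).
G = np.array([[1.0, 0.5],
              [0.2, 0.8],
              [0.3, 0.1]])  # 3 input rows x 2 output columns

V = np.array([0.1, 0.2, 0.3])  # input voltages applied to the row wires

# Each cell passes current I_ij = G_ij * V_i (Ohm's law); each column wire
# sums its cells' currents (Kirchhoff), so the readout is an analog
# matrix-vector product computed in one step, inside the memory array.
I = V @ G
print(I)  # column output currents
```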
Attack HIGH
Xiangfang Li, Yu Wang, Bo Li
With the rapid advancement of large language models (LLMs), ensuring their safe use becomes increasingly critical. Fine-tuning is a widely used...
Benchmark LOW
Zhengliang Shi, Ruotian Ma, Jen-tse Huang +14 more
Large language models (LLMs) are increasingly entrusted with high-stakes decisions that affect human welfare. However, the principles and values that...
5 months ago cs.CL cs.AI cs.CY
Attack HIGH
Alexandrine Fortier, Thomas Thebaud, Jesús Villalba +2 more
Large Language Models (LLMs) and their multimodal extensions are becoming increasingly popular. One common approach to enable multimodality is to...
5 months ago cs.CL cs.CR cs.SD
Defense MEDIUM
Guobin Shen, Dongcheng Zhao, Haibo Tong +3 more
Ensuring Large Language Model (LLM) safety remains challenging due to the absence of universal standards and reliable content validators, making it...
Benchmark MEDIUM
Yicheng Lang, Yihua Zhang, Chongyu Fan +3 more
Large language model (LLM) unlearning aims to surgically remove the influence of undesired data or knowledge from an existing model while preserving...
Benchmark LOW
Chen-An Li, Tzu-Han Lin, Hung-yi Lee
Large audio-language models (LALMs) unify speech and text processing, but their robustness in noisy real-world settings remains underexplored. We...
5 months ago cs.SD cs.CL
Attack MEDIUM
Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang +1 more
Existing data poisoning attacks on retrieval-augmented generation (RAG) systems scale poorly because they require costly optimization of poisoned...
5 months ago cs.LG cs.CL cs.CR
Defense HIGH
Shojiro Yamabe, Jun Sakuma
Diffusion language models (DLMs) generate tokens in parallel through iterative denoising, which can reduce latency and enable bidirectional...
5 months ago cs.AI cs.LG
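The parallel, iterative denoising that this entry describes can be sketched as a toy masked-decoding loop: start from an all-mask sequence, score every masked position, and commit the most confident positions in parallel each round. The `toy_model` below is a random stand-in, not a real DLM; shapes and schedule are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LENGTH, MASK = 10, 8, -1

def toy_model(tokens):
    """Stand-in for a diffusion LM: per-position logits over the vocabulary.
    A real model would condition bidirectionally on the unmasked tokens."""
    return rng.normal(size=(len(tokens), VOCAB))

tokens = np.full(LENGTH, MASK)
commits_per_round = 2  # commit the two most confident masked positions per round

while (tokens == MASK).any():
    logits = toy_model(tokens)
    conf = logits.max(axis=1)
    conf[tokens != MASK] = -np.inf          # only fill still-masked slots
    pick = np.argsort(conf)[-commits_per_round:]
    tokens[pick] = logits[pick].argmax(axis=1)

print(tokens)  # fully denoised after LENGTH / commits_per_round rounds
```

Committing several positions per round is what yields the latency reduction over strictly left-to-right autoregressive decoding mentioned in the abstract.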
Benchmark MEDIUM
Andrew Gan, Zahra Ghodsi
Machine learning systems increasingly rely on open-source artifacts such as datasets and models that are created or hosted by other parties. The...
Tool MEDIUM
Hongbo Liu, Jiannong Cao, Bo Yang +7 more
The rapid advancement of large language models (LLMs) in recent years has revolutionized the AI landscape. However, the deployment model and usage of...
5 months ago cs.CR cs.DC
Attack HIGH
Raik Dankworth, Gesina Schwalbe
Deep neural networks (NNs) for computer vision are vulnerable to adversarial attacks, i.e., minuscule malicious changes to inputs may induce...
5 months ago cs.CR cs.LG
Attack MEDIUM
Tsubasa Takahashi, Shojiro Yamabe, Futa Waseda +1 more
Differential Attention (DA) has been proposed as a refinement to standard attention, suppressing redundant or noisy context through a subtractive...
5 months ago cs.LG cs.CR
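The subtractive mechanism this entry alludes to is, in the DIFF Transformer line of work, a difference of two softmax attention maps computed from two query/key groups: softmax(Q1·K1ᵀ/√d) − λ·softmax(Q2·K2ᵀ/√d), which cancels attention noise common to both maps. A minimal numpy sketch (shapes and a fixed λ chosen for illustration; in the actual design λ is learned):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 4, 8                      # sequence length, per-group head dim
lam = 0.5                        # lambda; learned in the actual design

Q1, Q2 = rng.normal(size=(2, n, d))
K1, K2 = rng.normal(size=(2, n, d))
V = rng.normal(size=(n, d))

A1 = softmax(Q1 @ K1.T / np.sqrt(d))
A2 = softmax(Q2 @ K2.T / np.sqrt(d))
A = A1 - lam * A2                # subtractive (differential) attention map
out = A @ V
print(out.shape)
```

Note each row of `A` sums to 1 − λ, and individual entries can be negative, which is exactly the property that suppresses redundant or noisy context.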
Attack MEDIUM
Yu Yan, Siqi Lu, Yang Gao +4 more
Recently, Bit-Flip Attack (BFA) has garnered widespread attention for its ability to compromise software system integrity remotely through hardware...
Attack HIGH
Chenxiang Luo, David K. Y. Yau, Qun Song
Federated learning (FL) enables collaborative model training without sharing raw data but is vulnerable to gradient inversion attacks (GIAs), where...
5 months ago cs.CR cs.LG
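The gradient inversion threat this entry studies has a classical exact special case worth seeing: for a dense layer with bias, z = Wx + b, the shared gradients satisfy dL/dW = (dL/dz)·xᵀ and dL/db = dL/dz, so dividing a row of the weight gradient by the matching bias gradient recovers the private input analytically. A toy sketch with synthetic tensors (the quadratic loss is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)               # client's private input
W = rng.normal(size=(3, 4))          # dense layer: z = W x + b
b = rng.normal(size=3)

# Toy loss L = 0.5 * ||z||^2, so dL/dz = z.
z = W @ x + b
grad_W = np.outer(z, x)              # dL/dW = (dL/dz) x^T  -- shared in FL
grad_b = z                           # dL/db = dL/dz

# Attack: row i of dL/dW equals (dL/db)_i * x, so one division
# reconstructs the private input exactly.
i = int(np.argmax(np.abs(grad_b)))   # pick a row where dL/db is nonzero
x_rec = grad_W[i] / grad_b[i]
print(np.allclose(x_rec, x))
```

For deeper models the recovery is not analytic, and attacks instead optimize a dummy input until its gradient matches the observed one; this linear case shows why raw gradients leak data at all.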