Benchmark MEDIUM
Zijun Feng, Yuming Feng, Yu Wang +4 more
Cross-chain bridges, the critical infrastructure of the multi-chain ecosystem, have become a primary target for attackers, resulting in over $2.8...
Attack MEDIUM
Mengnan Zhao, Lihe Zhang, Bo Wang +3 more
Fast Adversarial Training (FAT) has proven effective in enhancing model robustness by encouraging networks to learn perturbation-invariant...
2 weeks ago cs.LG cs.CR
Benchmark MEDIUM
Víctor Mayoral-Vilches, María Sanz-Gómez, Francesco Balassone +6 more
As LLM-driven agents advance in cybersecurity, Jeopardy CTF benchmarks are approaching saturation and cyber ranges, the natural next evaluation...
Defense MEDIUM
Kaisheng Fan, Weizhe Zhang, Yishu Gao +2 more
Defending against backdoor attacks in large language models remains a critical practical challenge. Existing defenses mitigate these threats but...
2 weeks ago cs.CR cs.AI
Benchmark MEDIUM
Eungyu Woo, Yooshin Kim, Wonje Heo +1 more
Industrial Control Systems (ICS) integrate computing, physical processes, and communication to operate critical infrastructures such as power grids,...
Survey MEDIUM
Jiaqi Li, Yang Zhao, Bin Sun +3 more
Autonomous AI agents deployed on platforms such as OpenClaw face prompt injection, memory poisoning, supply-chain attacks, and social engineering,...
2 weeks ago cs.CR cs.AI
Tool MEDIUM
Kato Mivule
This paper extends the Classification Error Gauge (x-CEG) framework, originally developed for measuring the privacy-utility trade-off in tabular...
Benchmark MEDIUM
Qi Li, Bo Yin, Weiqi Huang +6 more
Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety...
Survey MEDIUM
Jialiang Wang, Yuchen Liu, Hang Xu +7 more
The volume of scientific submissions continues to climb, outpacing the capacity of qualified human referees and stretching editorial timelines. At...
Benchmark MEDIUM
Yuchen Shi, Xin Guo, Huajie Chen +3 more
Poisoning-based backdoor attacks pose significant threats to deep neural networks by embedding triggers in training data, causing models to...
2 weeks ago cs.CR cs.AI
Benchmark MEDIUM
Vishal Rajput
We prove that empirical risk minimisation (ERM) imposes a necessary geometric constraint on learned representations: any encoder that minimises...
2 weeks ago cs.LG cs.AI cs.CV
Attack MEDIUM
Irti Haq, Belén Saldías
As state-of-the-art Large Language Models (LLMs) have become ubiquitous, ensuring equitable performance across diverse demographics is critical....
2 weeks ago cs.CY cs.AI cs.CL
Benchmark MEDIUM
Ari Azarafrooz
AI-agent guardrails are memoryless: each message is judged in isolation, so an adversary who spreads a single attack across dozens of sessions slips...
2 weeks ago cs.CR cs.AI cs.CL
Benchmark MEDIUM
Mohammad Farhad, Shuvalaxmi Dass
Software security relies on effective vulnerability detection and patching, yet determining whether a patch fully eliminates risk remains an...
2 weeks ago cs.SE cs.CR
Tool MEDIUM
Mikko Lempinen, Joni Kemppainen, Niklas Raesalmi
As artificial intelligence (AI) systems are increasingly deployed across critical domains, their security vulnerabilities pose growing risks of...
2 weeks ago cs.CR cs.AI cs.CL
Defense MEDIUM
Krishiv Agarwal, Ramneet Kaur, Colin Samplawski +6 more
Effective safety auditing of large language models (LLMs) demands tools that go beyond black-box probing and systematically uncover vulnerabilities...
2 weeks ago cs.CR cs.LG
Benchmark MEDIUM
Hoang Nguyen, Lu Wang, Marta Gaia Bras
Freight brokerages negotiate thousands of carrier rates daily under dynamic pricing conditions where models frequently revise targets...
2 weeks ago cs.MA cs.AI cs.CL
Attack MEDIUM
Abhijit Talluri
Adversarial robustness evaluation underpins every claim of trustworthy ML deployment, yet the field suffers from fragmented protocols and undetected...
2 weeks ago cs.CR cs.LG
Other MEDIUM
Yuhang Wu, Qinyuan Liu, Qiuyang Zhao +1 more
Currently, Large Language Models (LLMs) feature a diversified architectural landscape, including traditional Transformer, GateDeltaNet, and Mamba....
2 weeks ago cs.CL cs.AI
Defense MEDIUM
Chao Pan, Yu Wu, Xin Yao
Internal Safety Collapse (ISC) is a failure mode in which frontier LLMs, when executing legitimate professional tasks whose correct completion...
2 weeks ago cs.CR cs.AI cs.LG