Towards Automated Pentesting with Large Language Models
Ricardo Bessa, Rui Claro, João Trindade +1 more
Large Language Models (LLMs) are redefining offensive cybersecurity by allowing the generation of harmful machine code with minimal human...

Hanbo Huang, Xuan Gong, Yiran Zhang +2 more
Large language model (LLM) watermarking has emerged as a promising approach for detecting and attributing AI-generated text, yet its robustness to...

Ricardo Bessa, Rui Claro, João Trindade +1 more
The application of Machine Learning techniques in code generation is now a common practice for most developers. Tools such as ChatGPT from OpenAI...

Xiaomeng Hu, Yinger Zhang, Fei Huang +7 more
AI agents are expected to perform professional work across hundreds of occupational domains (from emergency department triage to nuclear reactor...

Yuchen Chen, Yuan Xiao, Chunrong Fang +2 more
The proliferation of large language models for code (CodeLMs) and open-source contributions has heightened concerns over unauthorized use of source...

Wenhao Yuan, Chenchen Lin, Jian Chen +3 more
In large language model (LLM) agents, reasoning trajectories are treated as reliable internal beliefs for guiding actions and updating memory....

Aryaman Arora, Zhengxuan Wu, Jacob Steinhardt +1 more
In language model interpretability research, circuit tracing aims to identify which internal features causally contributed to a particular...

Yu Liang, Liangxin Liu, Longzheng Wang +5 more
Generative reward models (GRMs) have emerged as a promising approach for aligning Large Language Models (LLMs) with human preferences by offering...

Yuanhang Li
Operating LEO mega-constellations requires translating high-level operator intents ("reroute financial traffic away from polar links under 80 ms")...

Geert Trooskens, Aaron Karlsberg, Anmol Sharma +6 more
We study compiled AI, a paradigm in which large language models generate executable code artifacts during a compilation phase, after which workflows...

Zhuohao Yu, Zhiwei Steven Wu, Adam Block
Inference-time compute scaling has emerged as a powerful paradigm for improving language model performance on a wide range of tasks, but the question...

Jia Chengyu, AprilPyone MaungMaung, Huy H. Nguyen +2 more
Recent advances in vision-language models (VLMs) trained on web-scale image-text pairs have enabled impressive zero-shot transfer across a diverse...

Shuyao Gao, Minghao Huang
The deployment of Large Language Models (LLMs) has ignited concerns about technological unemployment. Existing task-based evaluations predominantly...

Matteo Migliarini, Joaquin Pereira Pizzini, Luca Moresca +3 more
Instrumental convergence predicts that sufficiently advanced AI agents will resist shutdown, yet current safety training (RLHF) may obscure this risk...

Yiheng Huang, Zhijia Zhao, Bihuan Chen +5 more
The model context protocol (MCP) standardizes how LLMs connect to external tools and data sources, enabling faster integration but introducing new...

Weidi Luo, Xiaofei Wen, Tenghao Huang +5 more
Large language models (LLMs) are increasingly deployed for everyday tasks, including food preparation and health-related guidance. However, food...

Kıvanç Kuzey Dikici, Serdar Kara, Semih Çağlar +2 more
As Large Language Models (LLMs) for code increasingly utilize massive, often non-permissively licensed datasets, evaluating data contamination...

Yanting Wang, Jinyuan Jia
The random subspace method has wide security applications, such as providing certified defenses against adversarial and backdoor attacks and building...
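The random subspace method named in the entry above can be illustrated with a minimal sketch. This is not the paper's construction: the function name, the toy classifier, and all parameters here are hypothetical. The core idea is that each base prediction sees only a random subset of input features and the final label is a majority vote, which limits how many votes a bounded perturbation can influence.

```python
import numpy as np

rng = np.random.default_rng(0)

def subspace_votes(x, predict, n_features, k, n_samples=100):
    """Majority vote of `predict` over random k-feature subspaces of x."""
    votes = []
    for _ in range(n_samples):
        idx = rng.choice(n_features, size=k, replace=False)
        masked = np.zeros_like(x)
        masked[idx] = x[idx]          # keep only the sampled subspace
        votes.append(predict(masked))
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]   # majority label

# Toy base classifier (hypothetical): sign of the feature sum.
predict = lambda v: int(v.sum() >= 0)
x = np.ones(20)
print(subspace_votes(x, predict, n_features=20, k=5))  # prints 1
```

Because every vote only depends on k of the n features, an attacker who modifies a few features can flip at most the votes whose sampled subspace intersects those features, which is what certified-defense analyses of this scheme typically bound.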

Yubo Li, Lu Zhang, Tianchong Jiang +2 more
Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a...

Yicheng Cai, Mitchell John DeStefano, Guodong Dong +5 more
As Large Language Models (LLMs) and multi-agent AI systems are demonstrating increasing potential in cybersecurity operations, organizations,...