Attack MEDIUM
Youngji Roh, Hyunjin Cho, Jaehyung Kim
Large Language Models (LLMs) exhibit highly anisotropic internal representations, often characterized by massive activations, a phenomenon where a...
Attack MEDIUM
Zeming Wei, Qiaosheng Zhang, Xia Hu +1 more
Large Reasoning Models (LRMs) have achieved tremendous success with their chain-of-thought (CoT) reasoning, yet also face safety issues similar to...
3 months ago cs.LG cs.AI cs.CL
PDF
Attack MEDIUM
Andrew Draganov, Tolga H. Dur, Anandmayi Bhongade +1 more
We present a data poisoning attack -- Phantom Transfer -- with the property that, even if you know precisely how the poison was placed into an...
3 months ago cs.CR cs.AI
PDF
Attack MEDIUM
Matthew P. Lad, Louisa Conwill, Megan Levis Scheirer
With the rapid growth of Large Language Models (LLMs), criticism of their societal impact has also grown. Work in Responsible AI (RAI) has focused on...
Attack MEDIUM
Patrick Cooper, Alireza Nadali, Ashutosh Trivedi +1 more
Large language models (LLMs) are known to exhibit brittle behavior under adversarial prompts and jailbreak attacks, even after extensive alignment...
3 months ago cs.CL cs.AI cs.CR
PDF
Attack MEDIUM
Ching-Yun Ko, Pin-Yu Chen
Modern artificial intelligence (AI) models are deployed on inference engines to optimize runtime efficiency and resource allocation, particularly for...
3 months ago cs.LG cs.CL cs.PL
PDF
Attack MEDIUM
Poushali Sengupta, Shashi Raj Pandey, Sabita Maharjan +1 more
Large language models (LLMs) generate outputs by utilizing extensive context, which often includes redundant information from prompts, retrieved...
3 months ago cs.CL cs.AI stat.ML
PDF
Attack MEDIUM
Eliron Rahimi, Elad Hirshel, Rom Himelstein +3 more
Diffusion language models (DLMs) have recently emerged as a promising alternative to autoregressive (AR) models, offering parallel decoding and...
3 months ago cs.LG cs.AI
PDF
Attack MEDIUM
Xinyi Hou, Shenao Wang, Yifan Zhang +4 more
Agentic AI systems built around large language models (LLMs) are moving away from closed, single-model frameworks and toward open ecosystems that...
Attack MEDIUM
Manveer Singh Tamber, Hosna Oyarhoseini, Jimmy Lin
Research on adversarial robustness in language models is currently fragmented across applications and attacks, obscuring shared vulnerabilities. In...
3 months ago cs.CL cs.IR
PDF
Attack MEDIUM
Haitham S. Al-Sinani, Chris J. Mitchell
Wireless ethical hacking relies heavily on skilled practitioners manually interpreting reconnaissance results and executing complex, time-sensitive...
3 months ago cs.CR cs.AI
PDF
Attack MEDIUM
Mingqian Feng, Xiaodong Liu, Weiwei Yang +3 more
Large Language Models (LLMs) are typically evaluated for safety under single-shot or low-budget adversarial prompting, which underestimates...
Attack MEDIUM
Amirhossein Taherpour, Xiaodong Wang
Federated learning (FL) enables collaborative model training while preserving data privacy, yet both centralized and decentralized approaches face...
3 months ago cs.LG cs.CR cs.DC
PDF
Attack MEDIUM
Mingyang Liao, Yichen Wan, Shuchen Wu +6 more
LLM-based role-playing has rapidly improved in fidelity, yet stronger adherence to persona constraints commonly increases vulnerability to jailbreak...
Attack MEDIUM
Wenhui Zhang, Huiyu Xu, Zhibo Wang +4 more
Recent advancements in multi-model AI systems have leveraged LLM routers to reduce computational cost while maintaining response quality by assigning...
Attack MEDIUM
Alvi Md Ishmam, Najibul Haque Sarker, Zaber Ibn Abdul Hakim +1 more
Multimodal Large Language Models (MLLMs) have achieved remarkable performance across vision-language tasks. Recent advancements allow these models to...
Attack MEDIUM
Arther Tian, Alex Ding, Frank Chen +2 more
Decentralized large language model inference networks require lightweight mechanisms to reward high quality outputs under heterogeneous latency and...
3 months ago cs.CR cs.AI
PDF
Attack MEDIUM
Jarrod Barnes
As large language models (LLMs) improve, so do their offensive applications: frontier agents now generate working exploits for under $50 in compute...
Attack MEDIUM
Onkar Shelar, Travis Desell
Evolutionary prompt search is a practical black-box approach for red teaming large language models (LLMs), but existing methods often collapse onto a...
3 months ago cs.NE q-bio.PE
PDF
Attack MEDIUM
Yizhong Ding
Webshells remain a primary foothold for attackers to compromise servers, particularly within PHP ecosystems. However, existing detection mechanisms...
3 months ago cs.CR cs.AI
PDF