Attack MEDIUM
Shuyi Zhou, Zeen Song, Wenwen Qiang +4 more
Large Language Models remain vulnerable to adversarial prefix attacks (e.g., "Sure, here is") despite robust standard safety. We diagnose this...
Tool MEDIUM
Zixuan Xu, Tiancheng He, Huahui Yi +7 more
Vision-language models remain susceptible to multimodal jailbreaks and over-refusal because safety hinges on both visual evidence and user intent,...
Benchmark MEDIUM
Minseok Choi, Dongjin Kim, Seungbin Yang +5 more
With the growing deployment of large language models (LLMs) in real-world applications, establishing robust safety guardrails to moderate their...
Benchmark MEDIUM
Zhongxi Wang, Yueqian Lin, Jingyang Zhang +2 more
Safety evaluation and red-teaming of large language models remain predominantly text-centric, and existing frameworks lack the infrastructure to...
3 weeks ago cs.LG cs.CL cs.CV
Tool MEDIUM
Bhanu Pallakonda, Mikkel Hindsbo, Sina Ehsani +1 more
The proliferation of open-weight Large Language Models (LLMs) has democratized agentic AI, yet fine-tuned weights are frequently shared and adopted...
3 weeks ago cs.CR cs.AI
Survey MEDIUM
Tatiana Chakravorti, Pranav Narayanan Venkit, Sourojit Ghosh +1 more
Generative AI tools are increasingly entering academic peer review workflows, raising questions about fairness, accountability, and the legitimacy of...
3 weeks ago cs.CY cs.AI cs.HC
Attack MEDIUM
Guoxin Shi, Haoyu Wang, Zaihui Yang +2 more
Adversarial behavior plays a central role in aligning large language models with human values. However, existing alignment methods largely rely on...
3 weeks ago cs.CR cs.AI
Survey MEDIUM
Zhihang Deng, Jiaping Gui, Weinan Zhang
Large Language Models (LLMs) are increasingly deployed as agentic systems that plan, memorize, and act in open-world environments. This shift brings...
Benchmark MEDIUM
Yu Lin, Qizhi Zhang, Wenqiang Ruan +6 more
The rapid development of large language models (LLMs) has driven the widespread adoption of cloud-based LLM inference services, while also bringing...
3 weeks ago cs.CR cs.AI
Defense MEDIUM
Manisha Mukherjee, Vincent J. Hellendoorn
Large Language Models (LLMs) are increasingly deployed for code generation in high-stakes software development, yet their limited transparency in...
3 weeks ago cs.SE cs.AI cs.CR
Benchmark MEDIUM
Rahul Marchand, Art O Cathain, Jerome Wynne +5 more
Large language models (LLMs) increasingly act as autonomous agents, using tools to execute code, read and write files, and access networks, creating...
3 weeks ago cs.CR cs.AI
Tool MEDIUM
Qingxiao Xu, Ze Sheng, Zhicheng Chen +1 more
Large language models (LLMs) have shown promise for automated patching, but their effectiveness depends strongly on how they are integrated into...
3 weeks ago cs.CR cs.SE
Benchmark MEDIUM
Huajie Chen, Tianqing Zhu, Yuchen Zhong +7 more
Dataset distillation compresses a large real dataset into a small synthetic one, enabling models trained on the synthetic data to achieve performance...
3 weeks ago cs.CR cs.AI cs.LG
Attack MEDIUM
Martin Odersky, Yaoyu Zhao, Yichen Xu +2 more
AI agents that interact with the real world through tool calls pose fundamental safety challenges: agents might leak private information, cause...
3 weeks ago cs.AI cs.PL
Defense MEDIUM
Ming Wen, Kun Yang, Xin Chen +4 more
Multimodal Large Language Models (MLLMs) pose critical safety challenges, as they are susceptible not only to adversarial attacks such as...
3 weeks ago cs.LG cs.AI
Benchmark MEDIUM
Haodong Zhao, Jinming Hu, Zhaomin Wu +7 more
Federated Instruction Tuning (FIT) enables collaborative instruction tuning of large language models across multiple organizations (clients) in a...
Attack MEDIUM
Jingyuan Xie, Wenjie Wang, Ji Wu +1 more
Supervised fine-tuning (SFT) is essential for the development of medical large language models (LLMs), yet prior poisoning studies have mainly...
3 weeks ago cs.CR cs.AI cs.LG
Tool MEDIUM
Yijun Yu
Agentic AI systems exhibit numerous cross-cutting concerns -- security, observability, cost management, fault tolerance -- that are poorly modularized...
3 weeks ago cs.AI cs.SE
Defense MEDIUM
Chang Xue, Fang Liu, Jiaye Wang +2 more
Decentralized financial platforms rely heavily on Web of Trust reputation systems to mitigate counterparty risk in the absence of centralized...
3 weeks ago cs.CR cs.AI cs.LG
Benchmark MEDIUM
Om Tailor
Colluding language-model agents can hide coordination in messages that remain policy-compliant at the surface level. We present CLBC, a protocol...
3 weeks ago cs.CR cs.AI eess.SY