Benchmark MEDIUM
Djiré Albérick Euraste, Kaboré Abdoul Kader, Jordan Samhi +3 more
The lack of transparency about code datasets used to train large language models (LLMs) makes it difficult to detect, evaluate, and mitigate data...
Survey MEDIUM
Yi Ting Shen, Kentaroh Toyoda, Alex Leung
The rapid proliferation of Model Context Protocol (MCP)-based agentic systems has introduced a new category of security threats that existing...
3 weeks ago cs.CR cs.AI
Benchmark MEDIUM
Xixun Lin, Yang Liu, Yancheng Chen +9 more
The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use,...
4 weeks ago cs.CR cs.AI
Defense LOW
Aram Ebtekar, Michael K. Cohen
Reinforcement learners can attain high reward through novel unintended strategies. We study a Bayesian mitigation for general environments: we expand...
4 weeks ago cs.LG cs.AI
Defense MEDIUM
Xiaohua Wang, Muzhao Tian, Yuqi Zeng +20 more
Reinforcement Learning from Human Feedback (RLHF) and related alignment paradigms have become central to steering large language models (LLMs) and...
Tool LOW
Shawn Zhong, Junxuan Liao +4 more
AI coding agents operate directly on users' filesystems, where they regularly corrupt data, delete files, and leak secrets. Current approaches force...
Benchmark LOW
Eun Woo Im, Dhruv Madhwal, Vivek Gupta
Vision-Language Models demonstrate remarkable capabilities but often struggle with compositional reasoning, exhibiting vulnerabilities regarding word...
Attack HIGH
Andrii Vakhnovskyi
The United States designates Food and Agriculture as one of sixteen critical infrastructure sectors, yet no mandatory cybersecurity requirements...
4 weeks ago cs.CR eess.SY
Defense MEDIUM
Sujan Ghimire, Parsa Mirfasihi, Muhtasim Alam Chowdhury +6 more
The globalization of integrated circuit (IC) design and manufacturing has increased the exposure of hardware intellectual property (IP) to untrusted...
Benchmark MEDIUM
Prajas Wadekar, Venkata Sai Pranav Bachina, Kunal Bhosikar +2 more
3D Gaussian Splatting (3DGS) has recently enabled highly photorealistic 3D reconstruction from casually captured multi-view images. However, this...
4 weeks ago cs.CV cs.CR cs.LG
Tool LOW
Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu +7 more
Logical vulnerabilities in software stem from flaws in program logic rather than memory-safety errors, and can lead to critical security failures....
4 weeks ago cs.CR cs.AI
Benchmark MEDIUM
Joel Fokou
Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise...
4 weeks ago cs.CR cs.AI
Attack HIGH
Yingying Zhao, Chengyin Hu, Qike Zhang +7 more
Vision-Language Models (VLMs) have shown remarkable performance, yet their security remains insufficiently understood. Existing adversarial studies...
Attack MEDIUM
Shaopeng Fu, Di Wang
Adversarial training (AT) is an effective defense for large language models (LLMs) against jailbreak attacks, but performing AT on LLMs is costly. To...
4 weeks ago cs.LG cs.CR stat.ML
Attack MEDIUM
Anasuya Chattopadhyay, Daniel Reti, Hans D. Schotten
Cloud networks increasingly rely on machine learning-based Network Intrusion Detection Systems to defend against evolving cyber threats. However,...
4 weeks ago cs.LG cs.CR
Attack HIGH
Jianhao Chen, Haoyang Chen, Hanjie Zhao +2 more
The rapid evolution of Vision-Language Models (VLMs) has catalyzed unprecedented capabilities in artificial intelligence; however, this continuous...
4 weeks ago cs.AI cs.MM
Attack MEDIUM
Vladimir A. Mazin, Mikhail A. Zorin, Dmitrii S. Korzh +3 more
Passwords remain a dominant authentication method, yet their security is routinely subverted by predictable user choices and large-scale...
4 weeks ago cs.CR cs.AI
Attack HIGH
Junyu Ren, Xingjian Pan, Wensheng Gan +1 more
Prompt injection has emerged as a critical security threat to large language models (LLMs), yet existing studies predominantly focus on...
Benchmark MEDIUM
Miit Daga, Swarna Priya Ramu
Organisations increasingly outsource privacy-sensitive data transformations to cloud providers, yet no practical mechanism lets the data owner verify...
4 weeks ago cs.CR cs.DB cs.LG
Attack HIGH
Ravikumar Balakrishnan, Sanket Mendapara, Ankit Garg
We study typographic prompt injection attacks on vision-language models (VLMs), where adversarial text is rendered as images to bypass safety...