AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total

2,529

Attack

969

Benchmark

729

Defense

345

Tool

272

Survey

142

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 241–260 of 294 papers

Clear filters

Benchmark LOW

Why Do Vision Language Models Struggle To Recognize Human Emotions?

Madhav Agarwal, Sotirios A. Tsaftaris, Laura Sevilla-Lara +1 more

Understanding emotions is a fundamental ability for intelligent systems to be able to interact with humans. Vision-language models (VLMs) have made...

3 weeks ago cs.CV cs.AI PDF

Other MEDIUM

Feedback-Driven Execution for LLM-Based Binary Analysis

XiangRui Zhang, Qiang Li, Haining Wang

Binary analysis increasingly relies on large language models (LLMs) to perform semantic reasoning over complex program behaviors. However, existing...

3 weeks ago cs.CR PDF

Attack HIGH

Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization

Haochun Tang, Yuliang Yan, Jiahua Lu +2 more

Cost-aware routing dynamically dispatches user queries to models of varying capability to balance performance and inference cost. However, the...

3 weeks ago cs.CR cs.AI cs.CL PDF

Attack MEDIUM

Segment-Level Coherence for Robust Harmful Intent Probing in LLMs

Xuanli He, Bilgehan Sel, Faizan Ali +3 more

Large Language Models (LLMs) are increasingly exposed to adaptive jailbreaking, particularly in high-stakes Chemical, Biological, Radiological, and...

3 weeks ago cs.CL cs.CR PDF

Other LOW

AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning

Peifeng Zhang, Zice Qiu, Donghua Yu +4 more

In continual visual question answering (VQA), existing Continual Learning (CL) methods are mostly built for symmetric, unimodal architectures....

3 weeks ago cs.CV cs.CL PDF

Attack HIGH

Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection

Meng Chen, Kun Wang, Li Lu +2 more

Modern Large audio-language models (LALMs) power intelligent voice interactions by tightly integrating audio and text. This integration, however,...

3 weeks ago cs.CR cs.AI cs.SD PDF

Attack MEDIUM

NeuroTrace: Inference Provenance-Based Detection of Adversarial Examples

Firas Ben Hmida, Philemon Hailemariam, Kashif Ali Khan +1 more

Deep neural networks (DNNs) remain largely opaque at inference time, limiting our ability to detect and diagnose malicious input manipulations such...

3 weeks ago cs.CR PDF

Attack HIGH

Robustness Analysis of Machine Learning Models for IoT Intrusion Detection Under Data Poisoning Attacks

Fortunatus Aabangbio Wulnye, Justice Owusu Agyemang, Kwame Opuni-Boachie Obour Agyekum +3 more

Ensuring the reliability of machine learning-based intrusion detection systems remains a critical challenge in Internet of Things (IoT) environments,...

3 weeks ago cs.CR cs.AI PDF

Attack MEDIUM

From Where Words Come: Efficient Regularization of Code Tokenizers Through Source Attribution

Pavel Chizhov, Egor Bogomolov, Ivan P. Yamshchikov

Efficiency and safety of Large Language Models (LLMs), among other factors, rely on the quality of tokenization. A good tokenizer not only improves...

3 weeks ago cs.CL PDF

Benchmark MEDIUM

Learned or Memorized ? Quantifying Memorization Advantage in Code LLMs

Djiré Albérick Euraste, Kaboré Abdoul Kader, Jordan Samhi +3 more

The lack of transparency about code datasets used to train large language models (LLMs) makes it difficult to detect, evaluate, and mitigate data...

3 weeks ago cs.SE PDF

Survey MEDIUM

MCPThreatHive: Automated Threat Intelligence for Model Context Protocol Ecosystems

Yi Ting Shen, Kentaroh Toyoda, Alex Leung

The rapid proliferation of Model Context Protocol (MCP)-based agentic systems has introduced a new category of security threats that existing...

3 weeks ago cs.CR cs.AI PDF

Benchmark MEDIUM

SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment

Xixun Lin, Yang Liu, Yancheng Chen +9 more

The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use,...

3 weeks ago cs.CR cs.AI PDF

Defense LOW

Golden Handcuffs make safer AI agents

Aram Ebtekar, Michael K. Cohen

Reinforcement learners can attain high reward through novel unintended strategies. We study a Bayesian mitigation for general environments: we expand...

3 weeks ago cs.LG cs.AI PDF

Defense MEDIUM

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

Xiaohua Wang, Muzhao Tian, Yuqi Zeng +20 more

Reinforcement Learning from Human Feedback (RLHF) and related alignment paradigms have become central to steering large language models (LLMs) and...

3 weeks ago cs.LG PDF

Tool LOW

Don't Let AI Agents YOLO Your Files: Shifting Information and Control to Filesystems for Agent Safety and Autonomy

Shawn, Zhong, Junxuan Liao +4 more

AI coding agents operate directly on users' filesystems, where they regularly corrupt data, delete files, and leak secrets. Current approaches force...

3 weeks ago cs.OS PDF

Benchmark LOW

Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding

Eun Woo Im, Dhruv Madhwal, Vivek Gupta

Vision-Language Models demonstrate remarkable capabilities but often struggle with compositional reasoning, exhibiting vulnerabilities regarding word...

4 weeks ago cs.LG PDF

Attack HIGH

Threat Modeling and Attack Surface Analysis of IoT-Enabled Controlled Environment Agriculture Systems

Andrii Vakhnovskyi

The United States designates Food and Agriculture as one of sixteen critical infrastructure sectors, yet no mandatory cybersecurity requirements...

4 weeks ago cs.CR eess.SY PDF

Defense MEDIUM

Can Agents Secure Hardware? Evaluating Agentic LLM-Driven Obfuscation for IP Protection

Sujan Ghimire, Parsa Mirfasihi, Muhtasim Alam Chowdhury +6 more

The globalization of integrated circuit (IC) design and manufacturing has increased the exposure of hardware intellectual property (IP) to untrusted...

4 weeks ago cs.CR PDF

Benchmark MEDIUM

PatchPoison: Poisoning Multi-View Datasets to Degrade 3D Reconstruction

Prajas Wadekar, Venkata Sai Pranav Bachina, Kunal Bhosikar +2 more

3D Gaussian Splatting (3DGS) has recently enabled highly photorealistic 3D reconstruction from casually captured multi-view images. However, this...

4 weeks ago cs.CV cs.CR cs.LG PDF

Tool LOW

LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software

Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu +7 more

Logical vulnerabilities in software stem from flaws in program logic rather than memory safety, which can lead to critical security failures....

4 weeks ago cs.CR cs.AI PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial