Attack MEDIUM
K. J. Kevin Feng, Tae Soo Kim, Rock Yuren Pang +3 more
AI agents that take actions in their environment autonomously over extended time horizons require robust governance interventions to curb their...
5 months ago cs.CY cs.AI
PDF
Attack HIGH
Haoyu Shen, Weimin Lyu, Haotian Xu +1 more
Vision-Language Models (VLMs) have achieved impressive progress in multimodal text generation, yet their rapid adoption raises increasing concerns...
5 months ago cs.CR cs.AI
PDF
Attack HIGH
Mohammad M Maheri, Xavier Cadet, Peter Chin +1 more
Approximate machine unlearning aims to efficiently remove the influence of specific data points from a trained model, offering a practical...
5 months ago cs.LG cs.AI cs.CR
PDF
Attack MEDIUM
Tong Wu, Weibin Wu, Zibin Zheng
Equipped with various tools and knowledge, GPTs, a class of customizable AI agents built on OpenAI's large language models, have demonstrated great...
5 months ago cs.CR cs.SE
PDF
Attack MEDIUM
Zeng Wang, Minghao Shao, Akashdeep Saha +4 more
Graph neural networks (GNNs) have shown promise in hardware security by learning structural motifs from netlist graphs. However, this reliance on...
5 months ago cs.CR cs.AI
PDF
Attack HIGH
Richard J. Young
Large Language Model (LLM) safety guardrail models have emerged as a primary defense mechanism against harmful content generation, yet their...
Attack HIGH
Tianyu Zhang, Zihang Xi, Jingyu Hua +1 more
In the realm of black-box jailbreak attacks on large language models (LLMs), the feasibility of constructing a narrow safety proxy, a lightweight...
5 months ago cs.CR cs.AI
PDF
Attack MEDIUM
Herman Errico, Jiquan Ngiam, Shanita Sojan
The Model Context Protocol (MCP) replaces static, developer-controlled API integrations with more dynamic, user-driven agent systems, which also...
Attack HIGH
Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley +3 more
The integration of artificial intelligence (AI) agents into web browsers introduces security challenges that go beyond traditional web application...
5 months ago cs.LG cs.AI cs.CR
PDF
Attack HIGH
Jakub Hoscilowicz, Artur Janicki
We introduce the Adversarial Confusion Attack, a new class of threats against multimodal large language models (MLLMs). Unlike jailbreaks or targeted...
Attack MEDIUM
Sidahmed Benabderrahmane, James Cheney, Talal Rahwan
Advanced Persistent Threats (APTs) pose a significant challenge in cybersecurity due to their stealthy and long-term nature. Modern supervised...
5 months ago cs.LG cs.AI cs.CR
PDF
Attack HIGH
Sen Nie, Jie Zhang, Jianxin Yan +2 more
Adversarial attacks have evolved from simply disrupting predictions on conventional task-specific models to the more complex goal of manipulating...
Attack MEDIUM
Steven Peh
Large Language Models (LLMs) remain vulnerable to prompt injection attacks, which represent the most significant security threat in production...
5 months ago cs.CR cs.AI
PDF
Attack HIGH
Yingjia Shang, Yi Liu, Huimin Wang +4 more
With the rapid advancement of retrieval-augmented vision-language models, multimodal medical retrieval-augmented generation (MMed-RAG) systems are...
5 months ago cs.CR cs.AI cs.LG
PDF
Attack HIGH
Md Akil Raihan Iftee, Syed Md. Ahnaf Hasan, Amin Ahsan Ali +3 more
Test-time personalization in federated learning enables models at clients to adjust online to local domain shifts, enhancing robustness and...
5 months ago cs.CR cs.CV
PDF
Attack HIGH
Xurui Li, Kaisong Song, Rui Zhu +2 more
Large Language Models (LLMs) have developed rapidly in web services, delivering unprecedented capabilities while amplifying societal risks. Existing...
5 months ago cs.CR cs.AI
PDF
Attack HIGH
Yixin Wu, Rui Wen, Chi Cui +2 more
Inference attacks have been widely studied and offer a systematic risk assessment of ML services; however, their implementation and the attack...
5 months ago cs.CR cs.AI
PDF
Attack HIGH
Ryan Wong, Hosea David Yu Fei Ng, Dhananjai Sharma +2 more
Large Language Models (LLMs) remain susceptible to jailbreak exploits that bypass safety filters and induce harmful or unethical behavior. This work...
5 months ago cs.CR cs.AI
PDF
Attack MEDIUM
Adarsh Kumarappan, Ayushi Mehrotra
The SmoothLLM defense provides a certification guarantee against jailbreaking attacks, but it relies on a strict "k-unstable" assumption that rarely...
5 months ago cs.LG cs.AI
PDF
Attack HIGH
Adarsh Kumarappan, Ananya Mujoo
Multi-turn conversational attacks, which leverage psychological principles like Foot-in-the-Door (FITD), where a small initial request paves the way...
5 months ago cs.LG cs.AI
PDF