AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total

2,529

Attack

969

Benchmark

729

Defense

345

Tool

272

Survey

142

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 761–780 of 867 papers

Clear filters

Tool HIGH

BlackIce: A Containerized Red Teaming Toolkit for AI Security Testing

Caelin Kaplan, Alexander Warnecke, Neil Archibald

AI models are being increasingly integrated into real-world systems, raising significant concerns about their safety and security. Consequently, AI...

7 months ago cs.CR cs.AI PDF

Tool HIGH

PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities

Zicheng Liu, Lige Huang, Jie Zhang +3 more

The increasing autonomy of Large Language Models (LLMs) necessitates a rigorous evaluation of their potential to aid in cyber offense. Existing...

7 months ago cs.CR cs.AI PDF

Attack HIGH

LLMAtKGE: Large Language Models as Explainable Attackers against Knowledge Graph Embeddings

Ting Li, Yang Yang, Yipeng Yu +3 more

Adversarial attacks on knowledge graph embeddings (KGE) aim to disrupt the model's ability of link prediction by removing or inserting triples. A...

7 months ago cs.CL cs.CR PDF

Tool HIGH

Collaborative Shadows: Distributed Backdoor Attacks in LLM-Based Multi-Agent Systems

Pengyu Zhu, Lijun Li, Yaxing Lyu +3 more

LLM-based multi-agent systems (MAS) demonstrate increasing integration into next-generation applications, but their safety in backdoor attacks...

7 months ago cs.CR PDF

Attack HIGH

Attacks by Content: Automated Fact-checking is an AI Security Issue

Michael Schlichtkrull

When AI agents retrieve and reason over external documents, adversaries can manipulate the data they receive to subvert their behaviour. Previous...

7 months ago cs.CL cs.AI PDF

Attack HIGH

RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation

Vasilije Stambolic, Aritra Dhar, Lukas Cavigelli

Retrieval-Augmented Generation (RAG) increases the reliability and trustworthiness of the LLM response and reduces hallucination by eliminating the...

7 months ago cs.CR cs.AI PDF

Tool HIGH

DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation

Hyeseon An, Shinwoo Park, Suyeon Woo +1 more

The promise of LLM watermarking rests on a core assumption that a specific watermark proves authorship by a specific model. We demonstrate that this...

7 months ago cs.CR cs.AI PDF

Attack HIGH

DropVLA: An Action-Level Backdoor Attack on Vision-Language-Action Models

Zonghuan Xu, Jiayu Li, Yunhan Zhao +3 more

Vision-Language-Action (VLA) models map multimodal perception and language instructions to executable robot actions, making them particularly...

7 months ago cs.CR cs.AI cs.RO PDF

Attack HIGH

SASER: Stego attacks on open-source LLMs

Ming Tan, Wei Li, Hu Tao +4 more

Open-source large language models (LLMs) have demonstrated considerable dominance over proprietary LLMs in resolving neural processing tasks, thanks...

7 months ago cs.CR cs.AI PDF

Attack HIGH

ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test

Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng +2 more

The integration of Large Language Models (LLMs) into computer applications has introduced transformative capabilities but also significant security...

7 months ago cs.CR cs.AI cs.CL PDF

Attack HIGH

MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation

Wentian Zhu, Zhen Xiang, Wei Niu +1 more

Unlike regular tokens derived from existing text corpora, special tokens are artificially created to annotate structured conversations during the...

7 months ago cs.CR cs.AI PDF

Attack HIGH

ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking

Yutao Wu, Xiao Liu, Yinghui Li +5 more

Knowledge poisoning poses a critical threat to Retrieval-Augmented Generation (RAG) systems by injecting adversarial content into knowledge bases,...

7 months ago cs.CL cs.AI cs.CR PDF

Attack HIGH

A Systematic Study on Generating Web Vulnerability Proof-of-Concepts Using Large Language Models

Mengyao Zhao, Kaixuan Li, Lyuye Zhang +4 more

Recent advances in Large Language Models (LLMs) have brought remarkable progress in code understanding and reasoning, creating new opportunities and...

7 months ago cs.SE PDF

Attack HIGH

Adversarial Attacks on Downstream Weather Forecasting Models: Application to Tropical Cyclone Trajectory Prediction

Yue Deng, Francisco Santos, Pang-Ning Tan +1 more

Deep learning based weather forecasting (DLWF) models leverage past weather observations to generate future forecasts, supporting a wide range of...

7 months ago cs.LG cs.CR stat.ML PDF

Attack HIGH

Text Prompt Injection of Vision Language Models

Ruizhe Zhu

The widespread application of large vision language models has significantly raised safety concerns. In this project, we investigate text prompt...

7 months ago cs.CL cs.CV PDF

Attack HIGH

Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

Mikhail Terekhov, Alexander Panfilov, Daniil Dzenhaliou +4 more

AI control protocols serve as a defense mechanism to stop untrusted LLM agents from causing harm in autonomous settings. Prior work treats this as a...

7 months ago cs.LG cs.AI cs.CR PDF

Attack HIGH

Provable Watermarking for Data Poisoning Attacks

Yifan Zhu, Lijia Yu, Xiao-Shan Gao

In recent years, data poisoning attacks have been increasingly designed to appear harmless and even beneficial, often with the intention of verifying...

7 months ago cs.CR cs.LG PDF

Tool HIGH

Exploiting Web Search Tools of AI Agents for Data Exfiltration

Dennis Rall, Bernhard Bauer, Mohit Mittal +1 more

Large language models (LLMs) are now routinely used to autonomously execute complex tasks, from natural language processing to dynamic workflows like...

7 months ago cs.CR cs.CL PDF

Attack HIGH

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections

Milad Nasr, Nicholas Carlini, Chawin Sitawarin +11 more

How should we evaluate the robustness of language model defenses? Current defenses against jailbreaks and prompt injections (which aim to prevent an...

7 months ago cs.LG cs.CR PDF

Attack HIGH

Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models

Ragib Amin Nihal, Rui Wen, Kazuhiro Nakadai +1 more

Large language models (LLMs) remain vulnerable to multi-turn jailbreaking attacks that exploit conversational context to bypass safety constraints...

7 months ago cs.CL cs.AI cs.CR PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial