Fit for Purpose? Deepfake Detection in the Real World
Guangyu Lin, Li Lin, Christina P. Walker +2 more
The rapid proliferation of AI-generated content, driven by advances in generative adversarial networks, diffusion models, and multimodal large...
Dimitris Stefanopoulos, Andreas Voskou
This report presents the winning solution for Task 1 of Colliding with Adversaries: A Challenge on Robust Learning in High Energy Physics Discovery...
David Peer, Sebastian Stabinger
Large Language Models (LLMs) have demonstrated impressive capabilities, yet their deployment in high-stakes domains is hindered by inherent...
Shuai Li, Kejiang Chen, Jun Jiang +5 more
Large Language Models (LLMs) have demonstrated remarkable capabilities, but their training requires extensive data and computational resources,...
Mohammad Abdul Rehman, Syed Imad Ali Shah, Abbas Anwar +2 more
The remarkable capabilities of Large Language Models (LLMs) in natural language understanding and generation have sparked interest in their potential...
Sarah Egler, John Schulman, Nicholas Carlini
Large Language Model (LLM) providers expose fine-tuning APIs that let end users fine-tune their frontier LLMs. Unfortunately, it has been shown that...
Yang Feng, Xudong Pan
Malicious agents pose significant threats to the reliability and decision-making capabilities of Multi-Agent Systems (MAS) powered by Large Language...
Kate Glazko, Jennifer Mankoff
Generative AI risks such as bias and lack of representation impact people who do not interact directly with GAI systems, but whose content does:...
Owais Makroo, Siva Rajesh Kasa, Sumegh Roychowdhury +4 more
Membership Inference Attacks (MIAs) pose a critical privacy threat by enabling adversaries to determine whether a specific sample was included in a...
Shiwen Ou, Yuwei Li, Lu Yu +6 more
Deep learning (DL) frameworks serve as the backbone for a wide range of artificial intelligence applications. However, bugs within DL frameworks can...
Eduard Andrei Cristea, Petter Molnes, Jingyue Li
Malicious software attacks are having an increasingly significant economic impact. Commercial malware detection software can be costly, and tools...
Yao Huang, Yitong Sun, Yichi Zhang +3 more
Despite the remarkable advances of Large Language Models (LLMs) across diverse cognitive tasks, the rapid enhancement of these capabilities also...
Yuexiao Liu, Lijun Li, Xingjun Wang +1 more
Recent advancements in Reinforcement Learning with Verifiable Rewards (RLVR) have gained significant attention due to their objective and verifiable...
Muslim Chochlov, Gul Aftab Ahmed, James Vincent Patten +4 more
Source code clones pose risks ranging from intellectual property violations to unintended vulnerabilities. Effective and efficient scalable clone...
Hanbin Hong, Shuya Feng, Nima Naderloui +6 more
Large Language Models (LLMs) have rapidly become integral to real-world applications, powering services across diverse sectors. However, their...
Shuang Liang, Zhihao Xu, Jialing Tao +2 more
Despite extensive alignment efforts, Large Vision-Language Models (LVLMs) remain vulnerable to jailbreak attacks, posing serious safety risks. To...
Luca Belli, Kate Bentley, Will Alexander +5 more
We introduce VERA-MH (Validation of Ethical and Responsible AI in Mental Health), an automated evaluation of the safety of AI chatbots used in mental...
Ahmed Aly, Essam Mansour, Amr Youssef
Advanced Persistent Threats (APTs) are stealthy cyberattacks that often evade detection in system-level audit logs. Provenance graphs model these...
Issam Seddik, Sami Souihi, Mohamed Tamaazousti +1 more
As Large Language Models (LLMs) gain traction across critical domains, ensuring secure and trustworthy training processes has become a major concern....
Deyue Zhang, Dongdong Yang, Junjie Mu +6 more
Multimodal large language models (MLLMs) exhibit remarkable capabilities but remain susceptible to jailbreak attacks exploiting cross-modal...