Attack MEDIUM
Ruoyao Wen, Hao Li, Chaowei Xiao +1 more
Indirect prompt injection threatens LLM agents by embedding malicious instructions in external content, enabling unauthorized actions and data theft...
1 month ago cs.CR cs.AI
PDF
Attack MEDIUM
Fengpeng Li, Kemou Li, Qizhou Wang +2 more
Concept erasure helps stop diffusion models (DMs) from generating harmful content, but current methods face a robustness-retention trade-off...
1 month ago cs.LG cs.AI cs.CR
PDF
Attack MEDIUM
Tao Huang, Rui Wang, Xiaofei Liu +3 more
Large vision-language models (LVLMs) have shown substantial advances in multimodal understanding and generation. However, when presented with...
Attack MEDIUM
Vishruti Kakkad, Paul Chung, Hanan Hibshi +1 more
The exponential growth of Machine Learning and its Generative AI applications brings significant security challenges, often referred to as...
1 month ago cs.CR cs.AI
PDF
Attack MEDIUM
Yike Sun, Haotong Yang, Zhouchen Lin +1 more
Tokenization is fundamental to how language models represent and process text, yet the behavior of widely used BPE tokenizers has received far less...
Attack MEDIUM
Ariel Fogel, Omer Hofman, Eilon Cohen +1 more
Open-weight language models are increasingly used in production settings, raising new security challenges. One prominent threat in this context is...
1 month ago cs.CR cs.LG
PDF
Attack MEDIUM
Leo Schwinn, Moritz Ladenburger, Tim Beyer +3 more
Automated "LLM-as-a-Judge" frameworks have become the de facto standard for scalable evaluation across natural language processing. For...
1 month ago cs.CL cs.AI
PDF
Attack MEDIUM
Youngji Roh, Hyunjin Cho, Jaehyung Kim
Large Language Models (LLMs) exhibit highly anisotropic internal representations, often characterized by massive activations, a phenomenon where a...
Attack MEDIUM
Zeming Wei, Qiaosheng Zhang, Xia Hu +1 more
Large Reasoning Models (LRMs) have achieved tremendous success with their chain-of-thought (CoT) reasoning, yet also face safety issues similar to...
1 month ago cs.LG cs.AI cs.CL
PDF
Attack MEDIUM
Andrew Draganov, Tolga H. Dur, Anandmayi Bhongade +1 more
We present a data poisoning attack, Phantom Transfer, with the property that, even if you know precisely how the poison was placed into an...
1 month ago cs.CR cs.AI
PDF
Attack MEDIUM
Matthew P. Lad, Louisa Conwill, Megan Levis Scheirer
With the rapid growth of Large Language Models (LLMs), criticism of their societal impact has also grown. Work in Responsible AI (RAI) has focused on...
Attack MEDIUM
Patrick Cooper, Alireza Nadali, Ashutosh Trivedi +1 more
Large language models (LLMs) are known to exhibit brittle behavior under adversarial prompts and jailbreak attacks, even after extensive alignment...
1 month ago cs.CL cs.AI cs.CR
PDF
Attack MEDIUM
Ching-Yun Ko, Pin-Yu Chen
Modern artificial intelligence (AI) models are deployed on inference engines to optimize runtime efficiency and resource allocation, particularly for...
1 month ago cs.LG cs.CL cs.PL
PDF
Attack MEDIUM
Poushali Sengupta, Shashi Raj Pandey, Sabita Maharjan +1 more
Large language models (LLMs) generate outputs by utilizing extensive context, which often includes redundant information from prompts, retrieved...
1 month ago cs.CL cs.AI stat.ML
PDF
Attack MEDIUM
Eliron Rahimi, Elad Hirshel, Rom Himelstein +3 more
Diffusion language models (DLMs) have recently emerged as a promising alternative to autoregressive (AR) models, offering parallel decoding and...
1 month ago cs.LG cs.AI
PDF
Attack MEDIUM
Xinyi Hou, Shenao Wang, Yifan Zhang +4 more
Agentic AI systems built around large language models (LLMs) are moving away from closed, single-model frameworks and toward open ecosystems that...
Attack MEDIUM
Manveer Singh Tamber, Hosna Oyarhoseini, Jimmy Lin
Research on adversarial robustness in language models is currently fragmented across applications and attacks, obscuring shared vulnerabilities. In...
1 month ago cs.CL cs.IR
PDF
Attack MEDIUM
Haitham S. Al-Sinani, Chris J. Mitchell
Wireless ethical hacking relies heavily on skilled practitioners manually interpreting reconnaissance results and executing complex, time-sensitive...
1 month ago cs.CR cs.AI
PDF
Attack MEDIUM
Mingqian Feng, Xiaodong Liu, Weiwei Yang +3 more
Large Language Models (LLMs) are typically evaluated for safety under single-shot or low-budget adversarial prompting, which underestimates...
Attack MEDIUM
Amirhossein Taherpour, Xiaodong Wang
Federated learning (FL) enables collaborative model training while preserving data privacy, yet both centralized and decentralized approaches face...
1 month ago cs.LG cs.CR cs.DC
PDF