AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total: 2,529
Attack: 969
Benchmark: 729
Defense: 345
Tool: 272
Survey: 142

Showing 1–20 of 232 papers

Attack MEDIUM

Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning

Ben Kereopa-Yorke, Guillermo Diaz, Holly Wright +3 more

We define Oracle Poisoning, an attack class in which an adversary corrupts a structured knowledge graph that AI agents query at runtime via tool-use...

2 days ago cs.CR cs.AI PDF

Attack MEDIUM

CALYREX: Cross-Attention LaYeR EXtended Transformers for System Prompt Anchoring

Li Lixing

Modern large language models (LLMs) rely on system prompts to establish behavioral constraints and safety rules. Standard causal self-attention...

2 days ago cs.LG PDF

Attack MEDIUM

Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches

Isaac David, Arthur Gervais

Security updates create a short but important window in which defenders and attackers can compare vulnerable and patched software. Yet in many...

5 days ago cs.CR cs.AI PDF

Attack MEDIUM

Architecture Matters: Comparing RAG Systems under Knowledge Base Poisoning

Samuel Korn

Retrieval-Augmented Generation (RAG) systems are vulnerable to knowledge base poisoning, yet existing attacks have been evaluated almost exclusively...

5 days ago cs.CR cs.CL cs.LG PDF

Attack MEDIUM

Information Theoretic Adversarial Training of Large Language Models

Yiwei Zhang, Jeremiah Birrell, Reza Ebrahimi +3 more

Large language models (LLMs) remain vulnerable to adversarial prompting despite advances in alignment and safety, often exhibiting harmful behaviors...

6 days ago cs.LG cs.AI cs.CR PDF

Attack MEDIUM

On the Hardness of Junking LLMs

Marco Rando, Samuel Vaiter

Large language models (LLMs) are known to be vulnerable to jailbreak attacks, which typically rely on carefully designed prompts containing explicit...

6 days ago cs.LG PDF

Attack MEDIUM

Gray-Box Poisoning of Continuous Malware Ingestion Pipelines

Jan Dolejš, Martin Jureček, Róbert Lórencz

Modern malware detection pipelines rely on continuous data ingestion and machine learning to counter the high volume of novel threats. This work...

6 days ago cs.CR cs.LG PDF

Attack MEDIUM

Laundering AI Authority with Adversarial Examples

Jie Zhang, Pura Peetathawatchai, Florian Tramèr +1 more

Vision-language models (VLMs) are increasingly deployed as trusted authorities -- fact-checking images on social media, comparing products, and...

1 week ago cs.CR cs.LG PDF

Attack MEDIUM

Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions

Sarthak Choudhary, Atharv Singh Patlan, Nils Palumbo +3 more

We present Sparse Backdoor, a supply-chain attack that plants a provably undetectable backdoor in pre-trained image classifiers, including...

1 week ago cs.CR cs.AI cs.LG PDF

Attack MEDIUM

The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code

Gabriel Hortea, Juan Tapiador

Malware authors have traditionally relied on polymorphic techniques to produce variants in the same malware family, complicating signature-based...

1 week ago cs.CR PDF

Attack MEDIUM

MEMSAD: Gradient-Coupled Anomaly Detection for Memory Poisoning in Retrieval-Augmented Agents

Ishrith Gowda

Persistent external memory enables LLM agents to maintain context across sessions, yet its security properties remain formally uncharacterized. We...

1 week ago cs.CR cs.AI cs.LG PDF

Attack MEDIUM

Dependency-Aware Privacy for Multi-turn Agents

Divyam Anshumaan, Sarthak Choudhary, Nils Palumbo +1 more

LLM agents release private data across multi-service interactions. Existing prompt sanitizers based on metric differential privacy treat each release...

1 week ago cs.CR PDF

Attack MEDIUM

PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization

Mingshuo Liu, Yiwei Zha, Min Chen

Browsing-enabled LLM assistants can fetch webpages and answer contact-seeking queries, creating a practical channel for scraping contact-style...

1 week ago cs.CR cs.AI cs.CL PDF

Attack MEDIUM

Tool Use as Action: Towards Agentic Control in Mobile Core Networks

Purna Sai Garigipati, Onur Ayan, Kishor Chandra Joshi +1 more

Artificial Intelligence (AI) will play an essential role in 6G. It will fundamentally reshape the network architecture itself and drive major changes...

1 week ago cs.NI eess.SY PDF

Attack MEDIUM

Fight Poison with Poison: Enhancing Robustness in Few-shot Machine-Generated Text Detection with Adversarial Training

Wenjing Duan, Qi Zhou, Yuanfan Li

Machine-generated text (MGT) detection is critical for regulating online information ecosystems, yet existing detectors often underperform in...

1 week ago cs.CR cs.CL PDF

Attack MEDIUM

Adversarial Update-Based Federated Unlearning for Poisoned Model Recovery

Wenwei Zhao, Xiaowen Li, Yao Liu +1 more

Federated learning (FL) is vulnerable to poisoning attacks, where malicious clients upload manipulated updates to degrade the performance of the...

1 week ago cs.LG cs.CR PDF

Attack MEDIUM

Disentangling Intent from Role: Adversarial Self-Play for Persona-Invariant Safety Alignment

Jiajia Li, Xiaoyu Wen, Zhongtian Ma +3 more

The growing capabilities of large language models (LLMs) have driven their widespread deployment across diverse domains, even in potentially...

1 week ago cs.AI PDF

Attack MEDIUM

CyberAId: AI-Driven Cybersecurity for Financial Service Providers

George Fatouros, Georgios Makridis, John Soldatos +18 more

European financial institutions face mounting regulatory pressure while their security operations centres remain constrained not by data or staffing...

1 week ago cs.AI cs.CR cs.IR PDF

Attack MEDIUM

MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks

Jona te Lintelo, Lichao Wu, Marina Krček +2 more

Mixture-of-Experts (MoE) architectures in Large Language Models (LLMs) have significantly reduced inference costs through sparse activation. However,...

1 week ago cs.CR PDF
