Factor(T,U): Factored Cognition Strengthens Monitoring of Untrusted AI
Aaron Sandoval, Cody Rushing
The field of AI Control seeks to develop robust control protocols: deployment safeguards for untrusted AI which may be intentionally subversive...
Haowei Fu, Bo Ni, Han Xu +3 more
Retrieval-Augmented Generation (RAG) and Supervised Finetuning (SFT) have become the predominant paradigms for equipping Large Language Models (LLMs)...
Adeela Bashir, The Anh Han, Zia Ush Shamszaman
The integration of large language models (LLMs) into healthcare IoT systems promises faster decisions and improved medical support. LLMs are also...
Omar Farooq Khan Suri, John McCrae
Large Language Models (LLMs) are increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection...
Zihao Wang, Kar Wai Fok, Vrizlynn L. L. Thing
Multi-modal large language models (MLLMs), capable of processing text, images, and audio, have been widely adopted in various AI applications....
Mintong Kang, Chong Xiang, Sanjay Kariyappa +3 more
Indirect prompt injection attacks (IPIAs), where large language models (LLMs) follow malicious instructions hidden in input data, pose a critical...
Hao Wu, Prateek Saxena
This paper explores attacks and defenses on vector databases in retrieval-augmented generation (RAG) systems. Prior work on knowledge poisoning...
K. J. Kevin Feng, Tae Soo Kim, Rock Yuren Pang +3 more
AI agents that take actions in their environment autonomously over extended time horizons require robust governance interventions to curb their...
Haoyu Shen, Weimin Lyu, Haotian Xu +1 more
Vision-Language Models (VLMs) have achieved impressive progress in multimodal text generation, yet their rapid adoption raises increasing concerns...
Mohammad M Maheri, Xavier Cadet, Peter Chin +1 more
Approximate machine unlearning aims to efficiently remove the influence of specific data points from a trained model, offering a practical...
Tong Wu, Weibin Wu, Zibin Zheng
Equipped with various tools and knowledge, GPTs, customized AI agents built on OpenAI's large language models, have demonstrated great...
Zeng Wang, Minghao Shao, Akashdeep Saha +4 more
Graph neural networks (GNNs) have shown promise in hardware security by learning structural motifs from netlist graphs. However, this reliance on...
Richard J. Young
Large Language Model (LLM) safety guardrail models have emerged as a primary defense mechanism against harmful content generation, yet their...
Tianyu Zhang, Zihang Xi, Jingyu Hua +1 more
In the realm of black-box jailbreak attacks on large language models (LLMs), the feasibility of constructing a narrow safety proxy, a lightweight...
Herman Errico, Jiquan Ngiam, Shanita Sojan
The Model Context Protocol (MCP) replaces static, developer-controlled API integrations with more dynamic, user-driven agent systems, which also...
Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley +3 more
The integration of artificial intelligence (AI) agents into web browsers introduces security challenges that go beyond traditional web application...
Jakub Hoscilowicz, Artur Janicki
We introduce the Adversarial Confusion Attack, a new class of threats against multimodal large language models (MLLMs). Unlike jailbreaks or targeted...
Sidahmed Benabderrahmane, James Cheney, Talal Rahwan
Advanced Persistent Threats (APTs) pose a significant challenge in cybersecurity due to their stealthy and long-term nature. Modern supervised...
Sen Nie, Jie Zhang, Jianxin Yan +2 more
Adversarial attacks have evolved from simply disrupting predictions on conventional task-specific models to the more complex goal of manipulating...
Steven Peh
Large Language Models (LLMs) remain vulnerable to prompt injection attacks, representing the most significant security threat in production...