AI Security Research

2,560+ academic papers on AI security, attacks, and defenses

Total: 2,560
Attack: 982
Benchmark: 736
Defense: 350
Tool: 275
Survey: 144

Showing 641–660 of 982 papers

Attack HIGH

Safe2Harm: Semantic Isomorphism Attacks for Jailbreaking Large Language Models

Fan Yang

Large Language Models (LLMs) have demonstrated exceptional performance across various tasks, but their security vulnerabilities can be exploited by...

5 months ago cs.CR cs.AI PDF

Attack MEDIUM

Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs

Jinbo Liu, Defu Cao, Yifei Wei +6 more

Graph topology is a fundamental determinant of memory leakage in multi-agent LLM systems, yet its effects remain poorly quantified. We introduce MAMA...

5 months ago cs.CR cs.AI cs.CL PDF

Attack MEDIUM

In-Context Representation Hijacking

Itay Yona, Amir Sarid, Michael Karasik +1 more

We introduce Doublespeak, a simple in-context representation hijacking attack against large language models (LLMs). The attack works by...

5 months ago cs.CL cs.AI cs.CR PDF

Attack MEDIUM

SELF: A Robust Singular Value and Eigenvalue Approach for LLM Fingerprinting

Hanxiu Zhang, Yue Zheng

The protection of Intellectual Property (IP) in Large Language Models (LLMs) represents a critical challenge in contemporary AI research. While...

5 months ago cs.CR cs.AI cs.CL PDF

Attack HIGH

From static to adaptive: immune memory-based jailbreak detection for large language models

Jun Leng, Yu Liu, Litian Zhang +3 more

Large Language Models (LLMs) serve as the backbone of modern AI systems, yet they remain susceptible to adversarial jailbreak attacks. Consequently,...

5 months ago cs.CR PDF

Attack MEDIUM

Invasive Context Engineering to Control Large Language Models

Thomas Rivasseau

Current research on operator control of Large Language Models improves model robustness against adversarial attacks and misbehavior by training on...

5 months ago cs.AI PDF

Attack HIGH

Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities

Yuan Xiong, Ziqi Miao, Lijun Li +3 more

While Multimodal Large Language Models (MLLMs) show remarkable capabilities, their safety alignments are susceptible to jailbreak attacks. Existing...

5 months ago cs.CV cs.CL cs.CR PDF

Attack HIGH

When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models

Afshin Khadangi, Hanna Marxen, Amir Sartipi +2 more

Frontier large language models (LLMs) such as ChatGPT, Grok and Gemini are increasingly used for mental-health support with anxiety, trauma and...

5 months ago cs.CY cs.AI PDF

Attack HIGH

Lost in Modality: Evaluating the Effectiveness of Text-Based Membership Inference Attacks on Large Multimodal Models

Ziyi Tong, Feifei Sun, Le Minh Nguyen

Large Multimodal Language Models (MLLMs) are emerging as one of the foundational tools in an expanding range of applications. Consequently,...

5 months ago cs.CR cs.AI PDF

Attack HIGH

LeechHijack: Covert Computational Resource Exploitation in Intelligent Agent Systems

Yuanhe Zhang, Weiliu Wang, Zhenhong Zhou +5 more

Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in reasoning, planning, and tool usage. The recently proposed Model...

5 months ago cs.CR cs.CL PDF

Attack MEDIUM

Adversarial Robustness of Traffic Classification under Resource Constraints: Input Structure Matters

Adel Chehade, Edoardo Ragusa, Paolo Gastaldo +1 more

Traffic classification (TC) plays a critical role in cybersecurity, particularly in IoT and embedded contexts, where inspection must often occur...

5 months ago cs.NI cs.CR cs.LG PDF

Attack MEDIUM

CluCERT: Certifying LLM Robustness via Clustering-Guided Denoising Smoothing

Zixia Wang, Gaojie Jin, Jia Hu +1 more

Recent advancements in Large Language Models (LLMs) have led to their widespread adoption in daily applications. Despite their impressive...

5 months ago cs.LG cs.AI PDF

Attack MEDIUM

From monoliths to modules: Decomposing transducers for efficient world modelling

Alexander Boyd, Franz Nowak, David Hyland +2 more

World models have been recently proposed as sandbox environments in which AI agents can be trained and evaluated before deployment. Although...

5 months ago cs.AI PDF

Attack MEDIUM

Factor(T,U): Factored Cognition Strengthens Monitoring of Untrusted AI

Aaron Sandoval, Cody Rushing

The field of AI Control seeks to develop robust control protocols, deployment safeguards for untrusted AI which may be intentionally subversive....

5 months ago cs.CR cs.CL PDF

Attack HIGH

Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks

Haowei Fu, Bo Ni, Han Xu +3 more

Retrieval-Augmented Generation (RAG) and Supervised Finetuning (SFT) have become the predominant paradigms for equipping Large Language Models (LLMs)...

5 months ago cs.CR cs.AI PDF

Attack MEDIUM

Many-to-One Adversarial Consensus: Exposing Multi-Agent Collusion Risks in AI-Based Healthcare

Adeela Bashir, The Anh Han, Zia Ush Shamszaman

The integration of large language models (LLMs) into healthcare IoT systems promises faster decisions and improved medical support. LLMs are also...

5 months ago cs.CR cs.LG cs.MA PDF

Attack HIGH

Securing Large Language Models (LLMs) from Prompt Injection Attacks

Omar Farooq Khan Suri, John McCrae

Large Language Models (LLMs) are increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection...

5 months ago cs.CR cs.CL cs.LG PDF

Attack HIGH

DefenSee: Dissecting Threat from Sight and Text -- A Multi-View Defensive Pipeline for Multi-modal Jailbreaks

Zihao Wang, Kar Wai Fok, Vrizlynn L. L. Thing

Multi-modal large language models (MLLMs), capable of processing text, images, and audio, have been widely adopted in various AI applications....

5 months ago cs.CR PDF

Attack HIGH

Mitigating Indirect Prompt Injection via Instruction-Following Intent Analysis

Mintong Kang, Chong Xiang, Sanjay Kariyappa +3 more

Indirect prompt injection attacks (IPIAs), where large language models (LLMs) follow malicious instructions hidden in input data, pose a critical...

5 months ago cs.CR cs.LG PDF

Attack HIGH

Bias Injection Attacks on RAG Databases and Sanitization Defenses

Hao Wu, Prateek Saxena

This paper explores attacks and defenses on vector databases in retrieval-augmented generation (RAG) systems. Prior work on knowledge poisoning...

5 months ago cs.CR cs.AI cs.DB PDF
