Attack HIGH
Berk Atil, Rebecca J. Passonneau, Fred Morstatter
Large language models (LLMs) undergo safety alignment after training and tuning, yet recent work shows that safety can be bypassed through jailbreak...
Attack MEDIUM
Kasimir Schulz, Amelia Kawasaki, Leo Ring
Large language models (LLMs) are widely deployed across various applications, often with safeguards to prevent the generation of harmful or...
6 months ago cs.CR cs.AI
Tool LOW
Dong Chen, Yanzhe Wei, Zonglin He +7 more
Large language models (LLMs) offer transformative potential for clinical decision support in spine surgery but pose significant risks through...
6 months ago cs.LG cs.AI cs.CY
Attack HIGH
Peng Ding, Jun Kuang, Wen Sun +5 more
Large language models (LLMs) remain vulnerable to jailbreaking attacks despite their impressive capabilities. Investigating these weaknesses is...
Attack HIGH
Phil Blandfort, Robert Graham
Activation probes are attractive monitors for AI systems due to low cost and latency, but their real-world robustness remains underexplored. We ask:...
6 months ago cs.LG cs.AI
Benchmark MEDIUM
Ariyan Hossain, Khondokar Mohammad Ahanaf Hannan, Rakinul Haque +4 more
Gender bias in language models has gained increasing attention in the field of natural language processing. Encoder-based transformer models, which...
Defense MEDIUM
Yifan Xia, Guorui Chen, Wenqian Yu +3 more
Large language models (LLMs) excel in diverse applications but face dual challenges: generating harmful content under jailbreak attacks and...
6 months ago cs.AI cs.CR
Defense MEDIUM
Mohammed N. Swileh, Shengli Zhang
Centralized Software-Defined Networking (cSDN) offers flexible and programmable control of networks but suffers from scalability and reliability...
6 months ago cs.CR cs.AI
Attack HIGH
Ruofan Liu, Yun Lin, Zhiyong Huang +1 more
Large language models (LLMs) are increasingly integrated into IT infrastructures, where they process user data according to predefined instructions....
6 months ago cs.CR cs.AI
Attack HIGH
Xin Yao, Haiyang Zhao, Yimin Chen +3 more
The Contrastive Language-Image Pretraining (CLIP) model has significantly advanced vision-language modeling by aligning image-text pairs from...
6 months ago cs.CV cs.CR cs.LG
Attack HIGH
Kayua Oleques Paim, Rodrigo Brandao Mansilha, Diego Kreutz +2 more
The rapid proliferation of Large Language Models (LLMs) has raised significant concerns about their security against adversarial attacks. In this...
6 months ago cs.CR cs.AI cs.LG
Attack MEDIUM
David Lüdke, Tom Wollschläger, Paul Ungermann +2 more
We introduce a novel framework that transforms the resource-intensive (adversarial) prompt optimization problem into an efficient, amortized...
6 months ago cs.LG stat.ML
Other LOW
David Farr, Lynnette Hui Xian Ng, Stephen Prochaska +2 more
Disinformation campaigns can distort public perception and destabilize institutions. Understanding how different populations respond to information...
6 months ago cs.SI cs.AI cs.CL
Defense HIGH
Md Abdul Hannan, Ronghao Ni, Chi Zhang +3 more
Large language models (LLMs) have demonstrated impressive capabilities across a wide range of coding tasks, including summarization, translation,...
6 months ago cs.SE cs.CR cs.LG
Survey MEDIUM
Kathrin Grosse, Nico Ebert
Recent improvements in large language models (LLMs) have led to everyday usage of AI-based Conversational Agents (CAs). At the same time, LLMs...
Attack MEDIUM
Chenghao Du, Quanfeng Huang, Tingxuan Tang +3 more
Large Language Models (LLMs) have transformed software development, enabling AI-powered applications known as LLM-based agents that promise to...
Benchmark MEDIUM
Heehwan Kim, Sungjune Park, Daeseon Choi
Large Language Models (LLMs) are generally equipped with guardrails to block the generation of harmful responses. However, existing defenses always...
6 months ago cs.CL cs.AI
Benchmark MEDIUM
Arnabh Borah, Md Tanvirul Alam, Nidhi Rastogi
Security applications are increasingly relying on large language models (LLMs) for cyber threat detection; however, their opaque reasoning often...
6 months ago cs.CR cs.AI
Attack HIGH
Alex Irpan, Alexander Matt Turner, Mark Kurzeja +2 more
An LLM's factuality and refusal training can be compromised by simple changes to a prompt. Models often adopt user beliefs (sycophancy) or satisfy...
6 months ago cs.LG cs.AI
Benchmark MEDIUM
Zishuo Zheng, Vidhisha Balachandran, Chan Young Park +2 more
As large language model (LLM) based systems take on high-stakes roles in real-world decision-making, they must reconcile competing instructions from...
6 months ago cs.CL cs.AI