AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total: 2,529
Attack: 969
Benchmark: 729
Defense: 345
Tool: 272
Survey: 142

Showing 121–140 of 969 papers

Attack HIGH

MirageBackdoor: A Stealthy Attack that Induces Think-Well-Answer-Wrong Reasoning

Yizhe Zeng, Wei Zhang, Yunpeng Li +3 more

While Chain-of-Thought (CoT) prompting has become a standard paradigm for eliciting complex reasoning capabilities in Large Language Models, it...

1 month ago cs.CR PDF

Attack HIGH

Can Drift-Adaptive Malware Detectors Be Made Robust? Attacks and Defenses Under White-Box and Black-Box Threats

Adrian Shuai Li, Md Ajwad Akil, Elisa Bertino

Concept drift and adversarial evasion are two major challenges for deploying machine learning-based malware detectors. While both have been studied...

1 month ago cs.CR PDF

Attack HIGH

The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

Manish Bhatt, Sarthak Munshi, Vineeth Sai Narajala +4 more

We prove that no continuous, utility-preserving wrapper defense (a function $D: X\to X$ that preprocesses inputs before the model sees them) can make...

1 month ago cs.CR cs.AI PDF

Attack MEDIUM

Exclusive Unlearning

Mutsumi Sasaki, Kouta Nakayama, Yusuke Miyao +2 more

When introducing Large Language Models (LLMs) into industrial applications, such as healthcare and education, the risk of generating harmful content...

1 month ago cs.CL PDF

Attack LOW

Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives

Changgeon Ko, Jisu Shin, Hoyun Song +3 more

Large language model (LLM) agents are increasingly acting as human delegates in multi-agent environments, where a representative agent integrates...

1 month ago cs.CL cs.AI cs.MA PDF

Attack MEDIUM

Adversarial Robustness of Time-Series Classification for Crystal Collimator Alignment

Xaver Fink, Borja Fernandez Adiego, Daniele Mirarchi +4 more

In this paper, we analyze and improve the adversarial robustness of a convolutional neural network (CNN) that assists crystal-collimator alignment at...

1 month ago cs.CR cs.LG PDF

Attack HIGH

Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models

Zonghao Ying, Haowen Dai, Lianyu Hu +5 more

Modern text-to-image (T2I) models can now render legible, paragraph-length text, enabling a fundamentally new class of misuse. We identify and...

1 month ago cs.CV PDF

Attack HIGH

Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models

Yiyang Zhang, Chaojian Yu, Ziming Hong +4 more

Multimodal pretrained models are vulnerable to backdoor attacks, yet most existing methods rely on visual or multimodal triggers, which are...

1 month ago cs.CR cs.LG PDF

Attack HIGH

Do No Harm: Exposing Hidden Vulnerabilities of LLMs via Persona-based Client Simulation Attack in Psychological Counseling

Qingyang Xu, Yaling Shen, Stephanie Fong +7 more

The increasing use of large language models (LLMs) in mental healthcare raises safety concerns in high-stakes therapeutic interactions. A key...

1 month ago cs.CL PDF

Attack MEDIUM

Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange

Vinod Vaikuntanathan, Or Zamir

AI agents are increasingly deployed to interact with other agents on behalf of users and organizations. We ask whether two such agents, operated by...

1 month ago cs.CR cs.AI cs.LG PDF

Attack MEDIUM

Semantics Over Syntax: Uncovering Pre-Authentication 5G Baseband Vulnerabilities

Qiqing Huang, Xingyu Wang, Wanda Guo +2 more

Modern 5G user equipment (UE) processes Radio Resource Control (RRC) configuration messages during early control-plane exchanges, before...

1 month ago cs.CR PDF

Attack MEDIUM

Towards Unveiling Vulnerabilities of Large Reasoning Models in Machine Unlearning

Aobo Chen, Chenxu Zhao, Chenglin Miao +1 more

Large language models (LLMs) possess strong semantic understanding, driving significant progress in data mining applications. This is further...

1 month ago cs.LG cs.CR PDF

Attack HIGH

CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

Siyuan Li, Zehao Liu, Xi Lin +6 more

As Large Language Models (LLMs) are increasingly deployed in complex applications, their vulnerability to adversarial attacks raises urgent safety...

1 month ago cs.CR cs.AI PDF

Attack MEDIUM

AEGIS: Adversarial Entropy-Guided Immune System -- Thermodynamic State Space Models for Zero-Day Network Evasion Detection

Vickson Ferrel

As TLS 1.3 encryption limits traditional Deep Packet Inspection (DPI), the security community has pivoted to Euclidean Transformer-based classifiers...

1 month ago cs.CR cs.LG PDF

Attack HIGH

SelfGrader: Stable Jailbreak Detection for Large Language Models using Token-Level Logits

Zikai Zhang, Rui Hu, Olivera Kotevska +1 more

Large Language Models (LLMs) are powerful tools for answering user queries, yet they remain highly vulnerable to jailbreak attacks. Existing...

1 month ago cs.CR cs.AI PDF

Attack HIGH

No Attacker Needed: Unintentional Cross-User Contamination in Shared-State LLM Agents

Tiankai Yang, Jiate Li, Yi Nian +5 more

LLM-based agents increasingly operate across repeated sessions, maintaining task states to ensure continuity. In many deployments, a single agent...

1 month ago cs.CL cs.AI cs.CR PDF

Attack HIGH

AgentWatcher: A Rule-based Prompt Injection Monitor

Yanting Wang, Wei Zou, Runpeng Geng +1 more

Large language models (LLMs) and their applications, such as agents, are highly vulnerable to prompt injection attacks. State-of-the-art prompt...

1 month ago cs.CR PDF

Attack HIGH

When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion

Jiaqing Li, Zhibo Zhang, Shide Zhou +3 more

Model merging has emerged as a powerful technique for combining specialized capabilities from multiple fine-tuned LLMs without additional training...

1 month ago cs.CR PDF

Attack MEDIUM

Performative Scenario Optimization

Quanyan Zhu, Zhengye Han

This paper introduces a performative scenario optimization framework for decision-dependent chance-constrained problems. Unlike classical stochastic...

1 month ago cs.GT PDF
