AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total

2,529

Attack

969

Benchmark

729

Defense

345

Tool

272

Survey

142

Type

All Attack Defense Survey Benchmark Tool

Relevance

All High Medium

Date

All time 7 days 30 days 6 months

Showing 41–60 of 625 papers

Clear filters

Attack HIGH

Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

Zihan Wang, Rui Zhang, Yu Liu +4 more

LLM agents increasingly rely on skills to encapsulate reusable capabilities via progressively disclosed instructions. High-quality skills inject...

2 weeks ago cs.CR PDF

Attack HIGH

Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

Jiali Wei, Ming Fan, Guoheng Sun +3 more

The growing application of large language models (LLMs) in safety-critical domains has raised urgent concerns about their security. Many recent...

2 weeks ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Toward Efficient Membership Inference Attacks against Federated Large Language Models: A Projection Residual Approach

Guilin Deng, Silong Chen, Yuchuan Luo +6 more

Federated Large Language Models (FedLLMs) enable multiple parties to collaboratively fine-tune LLMs without sharing raw data, addressing challenges...

2 weeks ago cs.LG PDF

Attack HIGH

Adaptive Instruction Composition for Automated LLM Red-Teaming

Jesse Zymet, Andy Luo, Swapnil Shinde +2 more

Many approaches to LLM red-teaming leverage an attacker LLM to discover jailbreaks against a target. Several of them task the attacker with...

2 weeks ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models

Yannis Belkhiter, Giulio Zizzo, Sergio Maffeis +2 more

The growth of agentic AI has drawn significant attention to function calling Large Language Models (LLMs), which are designed to extend the...

2 weeks ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Towards Certified Malware Detection: Provable Guarantees Against Evasion Attacks

Nandakrishna Giri, Asmitha K. A., Serena Nicolazzo +2 more

Machine learning-based static malware detectors remain vulnerable to adversarial evasion techniques, such as metamorphic engine mutations. To address...

2 weeks ago cs.CR cs.LG PDF

Attack HIGH

Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks

Pranav Pallerla, Wilson Naik Bhukya, Bharath Vemula +1 more

Retrieval-augmented generation (RAG) systems are increasingly deployed in sensitive domains such as healthcare and law, where they rely on private,...

2 weeks ago cs.CR cs.AI PDF

Attack HIGH

STAR-Teaming: A Strategy-Response Multiplex Network Approach to Automated LLM Red Teaming

MinJae Jung, YongTaek Lim, Chaeyun Kim +3 more

While Large Language Models (LLMs) are widely used, they remain susceptible to jailbreak prompts that can elicit harmful or inappropriate responses....

3 weeks ago cs.CL PDF

Attack HIGH

An Empirical Study of Multi-Generation Sampling for Jailbreak Detection in Large Language Models

Hanrui Luo, Shreyank N Gowda

Detecting jailbreak behaviour in large language models remains challenging, particularly when strongly aligned models produce harmful outputs only...

3 weeks ago cs.CL cs.LG PDF

Attack HIGH

Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks

Md Rysul Kabir, Zoran Tiganj

Open-weight language models can be rendered unsafe through several distinct interventions, but the resulting models may differ substantially in...

3 weeks ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection

Thamilvendhan Munirathinam

Current open-source prompt-injection detectors converge on two architectural choices: regular-expression pattern matching and fine-tuned transformer...

3 weeks ago cs.CR cs.CL PDF

Attack HIGH

Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation

Wentao Zhang, Yan Zhuang, ZhuHang Zheng +3 more

Existing jamming attacks on Retrieval-Augmented Generation (RAG) systems typically induce explicit refusals or denial-of-service behaviors, which are...

3 weeks ago cs.CR cs.AI PDF

Attack HIGH

Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks

Jin Zhao, Marta Knežević, Tanja Käser

Large Language Models (LLMs) are increasingly used in education, yet their default helpfulness often conflicts with pedagogical principles. Prior...

3 weeks ago cs.CR cs.AI PDF

Attack HIGH

Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization

Haochun Tang, Yuliang Yan, Jiahua Lu +2 more

Cost-aware routing dynamically dispatches user queries to models of varying capability to balance performance and inference cost. However, the...

3 weeks ago cs.CR cs.AI cs.CL PDF

Attack HIGH

Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection

Meng Chen, Kun Wang, Li Lu +2 more

Modern Large audio-language models (LALMs) power intelligent voice interactions by tightly integrating audio and text. This integration, however,...

3 weeks ago cs.CR cs.AI cs.SD PDF

Attack HIGH

Robustness Analysis of Machine Learning Models for IoT Intrusion Detection Under Data Poisoning Attacks

Fortunatus Aabangbio Wulnye, Justice Owusu Agyemang, Kwame Opuni-Boachie Obour Agyekum +3 more

Ensuring the reliability of machine learning-based intrusion detection systems remains a critical challenge in Internet of Things (IoT) environments,...

3 weeks ago cs.CR cs.AI PDF

Attack HIGH

Threat Modeling and Attack Surface Analysis of IoT-Enabled Controlled Environment Agriculture Systems

Andrii Vakhnovskyi

The United States designates Food and Agriculture as one of sixteen critical infrastructure sectors, yet no mandatory cybersecurity requirements...

4 weeks ago cs.CR eess.SY PDF

Attack HIGH

Challenging Vision-Language Models with Physically Deployable Multimodal Semantic Lighting Attacks

Yingying Zhao, Chengyin Hu, Qike Zhang +7 more

Vision-Language Models (VLMs) have shown remarkable performance, yet their security remains insufficiently understood. Existing adversarial studies...

4 weeks ago cs.CV PDF

Attack HIGH

Every Picture Tells a Dangerous Story: Memory-Augmented Multi-Agent Jailbreak Attacks on VLMs

Jianhao Chen, Haoyang Chen, Hanjie Zhao +2 more

The rapid evolution of Vision-Language Models (VLMs) has catalyzed unprecedented capabilities in artificial intelligence; however, this continuous...

4 weeks ago cs.AI cs.MM PDF

Attack HIGH

DeepSeek Robustness Against Semantic-Character Dual-Space Mutated Prompt Injection

Junyu Ren, Xingjian Pan, Wensheng Gan +1 more

Prompt injection has emerged as a critical security threat to large language models (LLMs), yet existing studies predominantly focus on...

4 weeks ago cs.CR PDF

Track AI security vulnerabilities in real time

Get breaking CVE alerts, compliance reports (ISO 42001, EU AI Act), and CISO risk assessments for your AI/ML stack.

Start 14-Day Free Trial