AI Security Research

2,560+ academic papers on AI security, attacks, and defenses

Total: 2,560 | Attack: 982 | Benchmark: 736 | Defense: 350 | Tool: 275 | Survey: 144

Showing 321–340 of 725 papers

Attack · HIGH

Boundary Point Jailbreaking of Black-Box LLMs

Xander Davies, Giorgi Giglemiani, Edmund Lau, and 3 more

Frontier LLMs are safeguarded against attempts to extract harmful information via adversarial prompts known as "jailbreaks". Recently, defenders have...

2 months ago · cs.LG · PDF
