Attack HIGH
Yannick Assogba, Jacopo Cortellazzi, Javier Abad +3 more
Jailbreak attacks remain a persistent threat to large language model safety. We propose Context-Conditioned Delta Steering (CC-Delta), an SAE-based...
3 months ago cs.CR cs.CL cs.LG
Other HIGH
Nate Rahn, Allison Qi, Avery Griffin +3 more
We want language model assistants to conform to a character specification, which asserts how the model should act across diverse user interactions....
Tool HIGH
Yuepeng Hu, Yuqi Jia, Mengyuan Li +2 more
In a malicious tool attack, an attacker uploads a malicious tool to a distribution platform; once a user installs the tool and the LLM agent selects...
Defense MEDIUM
Zhaoxin Wang, Jiaming Liang, Fengbin Zhu +5 more
Large language models (LLMs) and multimodal LLMs are typically safety-aligned before release to prevent harmful content generation. However, recent...
Defense MEDIUM
Yujun Zhou, Yue Huang, Han Bao +8 more
While most AI alignment research focuses on preventing models from generating explicitly harmful content, a more subtle risk is emerging:...
3 months ago cs.LG cs.CL
Survey MEDIUM
Varpu Vehomäki, Kimmo K. Kaski
Understanding cyber security is increasingly important for individuals and organizations. However, a lot of information related to cyber security can...
Defense MEDIUM
Christian Rondanini, Barbara Carminati, Elena Ferrari +2 more
The proliferation of edge devices has created an urgent need for security solutions capable of detecting malware in real time while operating under...
3 months ago cs.CR cs.AI cs.DC
Attack HIGH
Dong Yan, Jian Liang, Ran He +1 more
Recent studies have shown that large language models (LLMs) can infer private user attributes (e.g., age, location, gender) from user-generated text...
3 months ago cs.CR cs.AI cs.CL
Benchmark MEDIUM
Faouzi El Yagoubi, Ranwa Al Mallah, Godwin Badu-Marfo
Multi-agent Large Language Model (LLM) systems create privacy risks that current benchmarks cannot measure. When agents coordinate on tasks,...
Attack HIGH
Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis
Jailbreaking large language models (LLMs) has emerged as a critical security challenge with the widespread deployment of conversational AI systems....
3 months ago cs.CR cs.CL
Defense MEDIUM
Md Sazedur Rahman, Mizanur Rahman Jewel, Sanjay Madria
Mining is rapidly evolving into an AI-driven cyber-physical ecosystem where safety and operational reliability depend on robust perception,...
3 months ago cs.CR cs.DC
Benchmark MEDIUM
Aashish Kolluri, Rishi Sharma, Manuel Costa +5 more
Indirect prompt injection attacks threaten AI agents that execute consequential actions, motivating deterministic system-level defenses. Such...
3 months ago cs.CR cs.LG
Benchmark LOW
Yang Liu, Armstrong Foundjem, Xingfang Wu +2 more
Context: In the fast-paced evolution of software development, Large Language Models (LLMs) have become indispensable tools for tasks such as code...
Benchmark MEDIUM
Arpit Singh Gautam, Kailash Talreja, Saurabh Jha
Large Language Models (LLMs) frequently hallucinate plausible but incorrect assertions, a vulnerability often missed by uncertainty metrics when...
3 months ago cs.CL cs.AI
Attack MEDIUM
Abhishek Saini, Haolin Jiang, Hang Liu
The deployment of large language models (LLMs) on third-party devices requires new ways to protect model intellectual property. While Trusted...
3 months ago cs.CR cs.AR
Attack HIGH
J Alex Corll
Multi-turn prompt injection attacks distribute malicious intent across multiple conversation turns, exploiting the assumption that each turn is...
3 months ago cs.CR cs.AI
Benchmark MEDIUM
Zhenhua Zou, Sheng Guo, Qiuyang Zhan +6 more
The evolution of Large Language Models (LLMs) has shifted mobile computing from App-centric interactions to system-level autonomous agents. Current...
3 months ago cs.CR cs.AI
Benchmark MEDIUM
Xinguo Feng, Zhongkui Ma, Zihan Wang +2 more
Training and fine-tuning large-scale language models largely benefit from collaborative learning, but the approach has been proven vulnerable to...
Defense MEDIUM
Adel ElZemity, Joshua Sylvester, Budi Arief +1 more
SMS-based phishing (smishing) attacks have surged, yet training effective on-device detectors requires labelled threat data that quickly becomes...
Benchmark MEDIUM
Matteo Migliarini, Berat Ercevik, Oluwagbemike Olowe +5 more
Large Language Models (LLMs) are increasingly deployed as active participants on public social media platforms, yet their behavior in these...
3 months ago cs.SI cs.CY