Attack MEDIUM
Nicolás E. Díaz Ferreyra, Monika Swetha Gurupathi, Zadia Codabux +2 more
Generative Artificial Intelligence (GenAI) has become a central component of many development tools (e.g., GitHub Copilot) that support software...
1 month ago cs.SE cs.CR cs.HC
PDF
Survey HIGH
Yuming Xu, Mingtao Zhang, Zhuohan Ge +5 more
Retrieval-augmented generation (RAG) significantly enhances large language models (LLMs) but introduces novel security risks through external...
1 month ago cs.CR cs.AI
PDF
Defense MEDIUM
Weiwei Qi, Zefeng Wu, Tianhang Zheng +4 more
Ensuring Large Language Model (LLM) safety is crucial, yet the lack of a clear understanding about safety mechanisms hinders the development of...
Attack MEDIUM
Labani Halder, Payel Sadhukhan, Sarbani Palit
Ensuring reliability in adversarial settings necessitates treating privacy as a foundational component of data-driven systems. While differential...
1 month ago cs.CR cs.AI cs.LG
PDF
Defense LOW
Ponnampalam Pirapuraj, Tamal Mondal, Sharanya Gupta +3 more
Application Programming Interfaces (APIs) are crucial to software development, enabling integration of existing systems with new applications by...
Attack HIGH
Wenpeng Xing, Moran Fang, Guangtai Wang +2 more
While Large Language Models (LLMs) have achieved remarkable performance, they remain vulnerable to jailbreak attacks that circumvent safety...
Attack HIGH
Wenkui Yang, Chao Jin, Haisu Zhu +7 more
Existing red-teaming studies on GUI agents have important limitations. Adversarial perturbations typically require white-box access, which is...
1 month ago cs.CR cs.CL cs.CV
PDF
Defense MEDIUM
Rui Zhang, Hongwei Li, Yun Shen +6 more
The deployment of large language models (LLMs) raises significant ethical and safety concerns. While LLM alignment techniques are adopted to improve...
1 month ago cs.CR cs.CL
PDF
Attack HIGH
Cheng Liu, Xiaolei Liu, Xingyu Li +2 more
Existing jailbreak defense paradigms primarily rely on static detection of prompts, outputs, or internal states, often neglecting the dynamic...
1 month ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Aryaman Arora, Zhengxuan Wu, Jacob Steinhardt +1 more
In language model interpretability research, circuit tracing aims to identify which internal features causally contributed to a particular...
Survey MEDIUM
Mehrdad Rostamzadeh, Sidhant Narula, Nahom Birhan +2 more
The Model Context Protocol (MCP) enables large language models (LLMs) to dynamically discover and invoke third-party tools, significantly expanding...
1 month ago cs.CR cs.AI
PDF
Tool MEDIUM
Hengkai Ye, Zhechang Zhang, Jinyuan Jia +1 more
Large language models (LLMs) increasingly rely on external tools to perform time-sensitive tasks and real-world actions. While tool integration...
Benchmark MEDIUM
Yu Liang, Liangxin Liu, Longzheng Wang +5 more
Generative reward models (GRMs) have emerged as a promising approach for aligning Large Language Models (LLMs) with human preferences by offering...
1 month ago cs.AI cs.CL cs.LG
PDF
Benchmark LOW
Hanyi Liu, Zhonghao Jiu, Minghao Wang +2 more
Implicit artistic influence, although visually plausible, is often undocumented and thus poses a historically constrained attribution problem:...
Benchmark MEDIUM
Yuanhang Li
Operating LEO mega-constellations requires translating high-level operator intents ("reroute financial traffic away from polar links under 80 ms")...
1 month ago cs.CR cs.AI
PDF
Tool MEDIUM
Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang +1 more
As large language models (LLMs) evolve from static chatbots into autonomous agents, the primary vulnerability surface shifts from final outputs to...
1 month ago cs.CR cs.AI cs.CL
PDF
Other HIGH
Luat Do, Jiao Yin, Jinli Cao +1 more
Software vulnerabilities continue to pose significant threats to modern information systems, requiring a timely and accurate risk assessment. Public...
1 month ago cs.CR cs.DB
PDF
Attack HIGH
Zhiheng Li, Zongyang Ma, Yuntong Pan +8 more
Multimodal Large Language Models (MLLMs) are increasingly being deployed as automated content moderators. Within this landscape, we uncover a...
Attack MEDIUM
Simon Calderon, Niklas Johansson, Onur Günlü
Ensuring ciphertext indistinguishability is fundamental to cryptographic security, but empirically validating this property in real implementations...
1 month ago cs.CR cs.IT cs.LG
PDF