Defense MEDIUM
Zhenxiong Yu, Zhi Yang, Zhiheng Jin +19 more
As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security...
1 month ago cs.CR cs.AI
Tool MEDIUM
Guangwei Zhang, Jianing Zhu, Cheng Qian +12 more
We present Copyright Detective, the first interactive forensic system for detecting, analyzing, and visualizing potential copyright risks in LLM...
Benchmark MEDIUM
Ruixin Yang, Ethan Mendes, Arthur Wang +4 more
Vision-language models (VLMs) have demonstrated strong performance in image geolocation, a capability further sharpened by frontier multimodal large...
1 month ago cs.CR cs.AI
Attack MEDIUM
Vishruti Kakkad, Paul Chung, Hanan Hibshi +1 more
The exponential growth of Machine Learning and its Generative AI applications brings with it significant security challenges, often referred to as...
1 month ago cs.CR cs.AI
Benchmark MEDIUM
Casey Ford, Madison Van Doren, Emily Dix
Multimodal large language models (MLLMs) are increasingly deployed in real-world systems, yet their safety under adversarial prompting remains...
1 month ago cs.CL cs.AI cs.HC
Attack MEDIUM
Yike Sun, Haotong Yang, Zhouchen Lin +1 more
Tokenization is fundamental to how language models represent and process text, yet the behavior of widely used BPE tokenizers has received far less...
Attack MEDIUM
Ariel Fogel, Omer Hofman, Eilon Cohen +1 more
Open-weight language models are increasingly used in production settings, raising new security challenges. One prominent threat in this context is...
1 month ago cs.CR cs.LG
Attack MEDIUM
Leo Schwinn, Moritz Ladenburger, Tim Beyer +3 more
Automated "LLM-as-a-Judge" frameworks have become the de facto standard for scalable evaluation across natural language processing. For...
1 month ago cs.CL cs.AI
Benchmark MEDIUM
Debargha Ganguly, Sreehari Sankar, Biyao Zhang +8 more
Current approaches to LLM safety fundamentally rely on a brittle cat-and-mouse game of identifying and blocking known threats via guardrails. We...
1 month ago cs.CL cs.AI cs.DC
Defense MEDIUM
Jiacheng Liang, Yuhui Wang, Tanqiu Jiang +1 more
Mixture-of-Experts (MoE) language models introduce unique challenges for safety alignment due to their sparse routing mechanisms, which can enable...
1 month ago cs.LG cs.AI cs.CR
Tool MEDIUM
Gautam Savaliya, Robert Aufschläger, Abhishek Subedi +2 more
Artificial intelligence systems introduce complex privacy risks throughout their lifecycle, especially when processing sensitive or high-dimensional...
1 month ago cs.CR cs.AI
Attack MEDIUM
Youngji Roh, Hyunjin Cho, Jaehyung Kim
Large Language Models (LLMs) exhibit highly anisotropic internal representations, often characterized by massive activations, a phenomenon where a...
Attack MEDIUM
Zeming Wei, Qiaosheng Zhang, Xia Hu +1 more
Large Reasoning Models (LRMs) have achieved tremendous success with their chain-of-thought (CoT) reasoning, yet also face safety issues similar to...
1 month ago cs.LG cs.AI cs.CL
Defense MEDIUM
Guang Yang, Xing Hu, Xiang Chen +1 more
Large language models (LLMs) for Verilog code generation are increasingly adopted in hardware design, yet remain vulnerable to backdoor attacks where...
1 month ago cs.SE cs.CR
Attack MEDIUM
Andrew Draganov, Tolga H. Dur, Anandmayi Bhongade +1 more
We present Phantom Transfer, a data poisoning attack with the property that, even if you know precisely how the poison was placed into an...
1 month ago cs.CR cs.AI
Benchmark MEDIUM
Omar Abdelnasser, Fatemah Alharbi, Khaled Khasawneh +2 more
Safety alignment in Language Models (LMs) is fundamental for trustworthy AI. However, while different stakeholders are trying to leverage Arabic...
1 month ago cs.CL cs.AI
Attack MEDIUM
Matthew P. Lad, Louisa Conwill, Megan Levis Scheirer
With the rapid growth of Large Language Models (LLMs), criticism of their societal impact has also grown. Work in Responsible AI (RAI) has focused on...
Defense MEDIUM
Sidahmed Benabderrahmane, Petko Valtchev, James Cheney +1 more
Detecting rare and diverse anomalies in highly imbalanced datasets, such as Advanced Persistent Threats (APTs) in cybersecurity, remains a fundamental...
1 month ago cs.LG cs.AI cs.CR
Benchmark MEDIUM
Tomer Kordonsky, Maayan Yamin, Noam Benzimra +2 more
LLMs are increasingly used for code generation, but their outputs often follow recurring templates that can induce predictable vulnerabilities. We...
1 month ago cs.CR cs.AI
Defense MEDIUM
Rohan Saxena
Fine-tuning language models on narrowly harmful data causes emergent misalignment (EM): behavioral failures extending far beyond training...
1 month ago cs.CL cs.AI