Large Language Models (LLMs) have achieved remarkable success but remain highly susceptible to jailbreak attacks, in which adversarial prompts coerce...
LLM agents with tool access can discover and exploit security vulnerabilities. This is known. What is not known is which features of a system prompt...
Shouqiao Wang, Marcello Politi, Samuele Marro +1 more
As agentic systems move into real-world deployments, their decisions increasingly depend on external inputs such as retrieved content, tool outputs,...
Dimitris Mitropoulos, Nikolaos Alexopoulos, Georgios Alexopoulos +1 more
Security code reviews increasingly rely on systems integrating Large Language Models (LLMs), ranging from interactive assistants to autonomous agents...
AI-assisted code review is widely used to detect vulnerabilities before production release. Prior work shows that adversarial prompt manipulation can...
Pedro H. Barcha Correia, Ryan W. Achjian, Diego E. G. Caetano de Oliveira +5 more
The rapid advancement and widespread adoption of generative artificial intelligence (GenAI) and large language models (LLMs) have been accompanied by...
Strahinja Janjusevic, Anna Baron Garcia, Sohrob Kazerounian
Generative AI is reshaping offensive cybersecurity by enabling autonomous red team agents that can plan, execute, and adapt during penetration tests....