AI Security Research

2,560+ academic papers on AI security, attacks, and defenses

Total: 2,560
Attack: 982
Benchmark: 736
Defense: 350
Tool: 275
Survey: 144

Showing 381–400 of 1,220 papers

Benchmark MEDIUM

Detecting Cognitive Signatures in Typing Behavior for Non-Intrusive Authorship Verification

David Condrey

The proliferation of AI-generated text has intensified the need for reliable authorship verification, yet current output-based methods are...

2 months ago cs.CR cs.HC cs.LG PDF

Attack MEDIUM

Adversarial Hubness Detector: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

Idan Habler, Vineeth Sai Narajala, Stav Koren +2 more

Retrieval-Augmented Generation (RAG) systems are essential to contemporary AI applications, allowing large language models to obtain external...

2 months ago cs.CR cs.AI PDF

Attack MEDIUM

Training Agents to Self-Report Misbehavior

Bruce W. Lee, Chen Yueh-Han, Tomek Korbak

Frontier AI agents may pursue hidden goals while concealing their pursuit from oversight. Alignment training aims to prevent such behavior by...

2 months ago cs.LG cs.AI PDF

Defense MEDIUM

Secure Semantic Communications via AI Defenses: Fundamentals, Solutions, and Future Directions

Lan Zhang, Chengsi Liang, Zeming Zhuang +4 more

Semantic communication (SemCom) redefines wireless communication from reproducing symbols to transmitting task-relevant semantics. However, this...

2 months ago cs.CR eess.SY PDF

Attack MEDIUM

Manifold of Failure: Behavioral Attraction Basins in Language Models

Sarthak Munshi, Manish Bhatt, Vineeth Sai Narajala +4 more

While prior work has focused on projecting adversarial examples back onto the manifold of natural data to restore safety, we argue that a...

2 months ago cs.LG cs.AI cs.CR PDF

Tool MEDIUM

A Multi-Turn Framework for Evaluating AI Misuse in Fraud and Cybercrime Scenarios

Kimberly T. Mai, Anna Gausen, Magda Dubois +5 more

AI is increasingly being used to assist fraud and cybercrime. However, it is unclear the extent to which current large language models can provide...

2 months ago cs.CY PDF

Attack MEDIUM

Adversarial Intent is a Latent Variable: Stateful Trust Inference for Securing Multimodal Agentic RAG

Inderjeet Singh, Vikas Pahuja, Aishvariya Priya Rathina Sabapathy +8 more

Current stateless defences for multimodal agentic RAG fail to detect adversarial strategies that distribute malicious semantics across retrieval,...

2 months ago cs.CR cs.AI cs.CL PDF

Benchmark MEDIUM

Detoxifying LLMs via Representation Erasure-Based Preference Optimization

Nazanin Mohammadi Sepahvand, Eleni Triantafillou, Hugo Larochelle +3 more

Large language models (LLMs) trained on webscale data can produce toxic outputs, raising concerns for safe deployment. Prior defenses, based on...

2 months ago cs.LG PDF

Defense MEDIUM

MemoPhishAgent: Memory-Augmented Multi-Modal LLM Agent for Phishing URL Detection

Xuan Chen, Hao Liu, Tao Yuan +3 more

Traditional phishing website detection relies on static heuristics or reference lists, which lag behind rapidly evolving attacks. While recent...

2 months ago cs.CR PDF

Defense MEDIUM

Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment

Mengxuan Hu, Vivek V. Datla, Anoop Kumar +4 more

Recent advances in alignment techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and Direct...

2 months ago cs.CL cs.AI PDF

Defense MEDIUM

A Lightweight Defense Mechanism against Next Generation of Phishing Emails using Distilled Attention-Augmented BiLSTM

Morteza Eskandarian, Mahdi Rabbani, Arun Kaniyamattam +6 more

The current generation of large language models produces sophisticated social-engineering content that bypasses standard text screening systems in...

2 months ago cs.CR PDF

Benchmark MEDIUM

Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models

Guangnian Wan, Qi Li, Gongfan Fang +2 more

Multimodal Diffusion Language Models (MDLMs) have recently emerged as a competitive alternative to their autoregressive counterparts. Yet their...

2 months ago cs.CR cs.LG PDF

Survey MEDIUM

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

Yanna Jiang, Delong Li, Haiyu Deng +4 more

Agentic systems increasingly rely on reusable procedural capabilities, a.k.a. agentic skills, to execute long-horizon workflows reliably....

2 months ago cs.CR cs.AI cs.CE PDF

Benchmark MEDIUM

OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services

Longxiang Wang, Xiang Zheng, Xuhao Zhang +3 more

Multi-tenant LLM serving frameworks widely adopt shared Key-Value caches to enhance efficiency. However, this creates side-channel vulnerabilities...

2 months ago cs.CR cs.AI PDF

Attack MEDIUM

The LLMbda Calculus: AI Agents, Conversations, and Information Flow

Zac Garby, Andrew D. Gordon, David Sands

A conversation with a large language model (LLM) is a sequence of prompts and responses, with each response generated from the preceding...

2 months ago cs.PL cs.AI cs.CR PDF

Attack MEDIUM

Agents of Chaos

Natalie Shapira, Chris Wendler, Avery Yen +35 more

We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent...

2 months ago cs.AI cs.CY PDF

Attack MEDIUM

vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models

Xunzhuo Liu, Huamin Chen, Samzong Lu +27 more

As large language models (LLMs) diversify across modalities, capabilities, and cost profiles, the problem of intelligent request routing -- selecting...

2 months ago cs.NI cs.AI PDF

Tool MEDIUM

LLM-enabled Applications Require System-Level Threat Monitoring

Yedi Zhang, Haoyu Wang, Xianglin Yang +2 more

LLM-enabled applications are rapidly reshaping the software ecosystem by using large language models as core reasoning components for complex task...

2 months ago cs.CR cs.AI cs.SE PDF

Attack MEDIUM

Efficient Multi-Party Secure Comparison over Different Domains with Preprocessing Assistance

Kaiwen Wang, Xiaolin Chang, Yuehan Dong +1 more

Secure comparison is a fundamental primitive in multi-party computation, supporting privacy-preserving applications such as machine learning and data...

2 months ago cs.CR PDF

Benchmark MEDIUM

CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents

Lei Ba, Qinbin Li, Songze Li

LLM-based code interpreter agents are increasingly deployed in critical workflows, yet their robustness against risks introduced by their code...

2 months ago cs.CR PDF
