Tool HIGH
Zehao Liu, Xi Lin
Large Language Models (LLMs) have gained considerable popularity and are protected by increasingly sophisticated safety mechanisms. However, jailbreak...
4 months ago cs.CR cs.AI
PDF
Defense LOW
Yueqiao Jin, Roberto Martinez-Maldonado, Dragan Gašević +1 more
Generative AI is increasingly embedded in collaborative learning, yet little is known about how AI personas shape learner agency when AI teammates...
Benchmark MEDIUM
Wei Qian, Chenxu Zhao, Yangyi Li +1 more
The rapid advancements in artificial intelligence (AI) have primarily focused on the process of learning from data to acquire knowledgeable learning...
4 months ago cs.LG cs.CR
PDF
Benchmark MEDIUM
Wang Bin, Ao Yang, Kedan Li +5 more
In the domain of software security testing, Directed Grey-Box Fuzzing (DGF) has garnered widespread attention for its efficient target localization...
4 months ago cs.SE cs.AI
PDF
Attack MEDIUM
Tung-Ling Li, Yuhao Wu, Hongliang Liu
Reward models and LLM-as-a-Judge systems are central to modern post-training pipelines such as RLHF, DPO, and RLAIF, where they provide scalar...
4 months ago cs.LG cs.CL cs.CR
PDF
Attack MEDIUM
Yidong Chai, Yi Liu, Mohammadreza Ebrahimi +2 more
Social media platforms are plagued by harmful content such as hate speech, misinformation, and extremist rhetoric. Machine learning (ML) models are...
Tool MEDIUM
Abhivansh Gupta
As LLM-based agents grow more autonomous and multi-modal, ensuring they remain controllable, auditable, and faithful to deployer intent becomes...
4 months ago cs.MA cs.AI cs.LG
PDF
Benchmark MEDIUM
Baolei Zhang, Minghong Fang, Zhuqing Liu +5 more
Federated Learning (FL) allows multiple clients to collaboratively train a model without sharing their private data. However, FL is vulnerable to...
4 months ago cs.CR cs.DC cs.LG
PDF
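The collaborative training scheme this entry describes is typically implemented as FedAvg-style aggregation: clients train locally and the server averages their weight updates, weighted by local data size. A minimal sketch under toy assumptions (models as flat weight vectors, local training stubbed out; not the paper's actual protocol):

```python
# Minimal FedAvg sketch: the server averages client weight vectors,
# weighted by each client's local dataset size. Toy models are flat
# lists of floats; local training is assumed to have already happened.

def fedavg(client_weights, client_sizes):
    """Weighted average of client weight vectors (FedAvg aggregation)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i in range(dim):
            global_w[i] += (n / total) * w[i]
    return global_w

# Two clients with unequal data sizes: the larger client dominates
# the aggregated model, which is exactly what poisoning attacks exploit.
w = fedavg([[1.0, 0.0], [0.0, 1.0]], client_sizes=[3, 1])
```

Because no raw data leaves the clients, the server only ever sees weight vectors — which is also why a single malicious client submitting crafted weights can skew the average, the vulnerability this entry studies.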
Attack HIGH
Huixin Zhan
Genomic Foundation Models (GFMs), such as Evolutionary Scale Modeling (ESM), have demonstrated remarkable success in variant effect prediction....
4 months ago cs.CR cs.LG q-bio.QM
PDF
Attack LOW
Tomáš Souček, Pierre Fernandez, Hady Elsahar +5 more
Invisible watermarking is essential for tracing the provenance of digital content. However, training state-of-the-art models remains notoriously...
4 months ago cs.CV cs.AI cs.CR
PDF
Defense LOW
Nenad Tomašev, Matija Franklin, Julian Jacobs +2 more
AI safety and alignment research has predominantly been focused on methods for safeguarding individual AI systems, resting on the assumption of an...
Attack HIGH
Kai Hu, Abhinav Aggarwal, Mehran Khodabandeh +6 more
This paper introduces Jailbreak-Zero, a novel red teaming methodology that shifts the paradigm of Large Language Model (LLM) safety evaluation from a...
4 months ago cs.CL cs.CR cs.LG
PDF
Tool HIGH
Xiao Li, Yue Li, Hao Wu +4 more
As large language models (LLMs) are increasingly adopted for code vulnerability detection, their reliability and robustness across diverse...
4 months ago cs.CR cs.LG
PDF
Defense LOW
Himanshu Gharat, Himanshi Agrawal, Gourab K. Patro
Large Language Models (LLMs) have empowered AI agents with advanced capabilities for understanding, reasoning, and interacting across diverse tasks....
4 months ago cs.AI cs.IR
PDF
Defense MEDIUM
Hao Li, Yubing Ren, Yanan Cao +3 more
Benefiting from the superior capabilities of large language models in natural language understanding and generation, Embeddings-as-a-Service (EaaS)...
4 months ago cs.CR cs.CL
PDF
Benchmark HIGH
Safwan Shaheer, G. M. Refatul Islam, Mohammad Rafid Hamid +1 more
In the fast-evolving area of LLMs, our paper examines the significant security risk posed by prompt injection attacks. It focuses on small...
4 months ago cs.CR cs.AI
PDF
Benchmark MEDIUM
Saksham Sahai Srivastava, Haoyu He
Large Language Model (LLM) agents increasingly rely on long-term memory and Retrieval-Augmented Generation (RAG) to persist experiences and refine...
4 months ago cs.CR cs.AI cs.LG
PDF
Attack MEDIUM
Zhexi Lu, Hongliang Chi, Nathalie Baracaldo +3 more
Membership inference attacks (MIAs) pose a critical privacy threat to fine-tuned large language models (LLMs), especially when models are adapted to...
4 months ago cs.CR cs.LG
PDF
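The canonical baseline for the membership inference attacks this entry covers is a loss-threshold test: a sample is guessed to be a training member if the model's loss on it falls below a calibrated threshold, since models tend to fit training data more tightly. A sketch under toy assumptions (hypothetical loss values in place of real model queries):

```python
# Loss-threshold membership inference baseline: predict "member" when
# the model's per-sample loss is below a threshold. Losses here are
# hypothetical stand-ins for queries against a fine-tuned LLM.

def mia_loss_threshold(losses, threshold):
    """Return membership guesses: True = predicted training member."""
    return [loss < threshold for loss in losses]

# Training members typically show lower loss than held-out samples.
member_losses = [0.1, 0.3, 0.2]      # hypothetical losses on training data
nonmember_losses = [1.2, 0.9, 1.5]   # hypothetical losses on held-out data
guesses = mia_loss_threshold(member_losses + nonmember_losses, threshold=0.5)
```

Fine-tuning on small domain-specific datasets widens the loss gap between members and non-members, which is why MIAs are especially effective against adapted models.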
Attack HIGH
Hao Li, Yubing Ren, Yanan Cao +4 more
With the rapid development of cloud-based services, large language models (LLMs) have become increasingly accessible through various web platforms....
4 months ago cs.CR cs.CL
PDF
Defense LOW
Vahideh Zolfaghari
Large language models (LLMs) are increasingly consulted by parents for pediatric guidance, yet their safety under real-world adversarial pressures is...