AI Security Research

2,583+ academic papers on AI security, attacks, and defenses

Total: 2,583
Attack: 994
Benchmark: 740
Defense: 355
Tool: 275
Survey: 146

Showing 641–660 of 2,583 papers

Benchmark LOW

Continual Learning with Vision-Language Models via Semantic-Geometry Preservation

Chiyuan He, Zihuan Qiu, Fanman Meng +4 more

Continual learning of pretrained vision-language models (VLMs) is prone to catastrophic forgetting, yet current approaches adapt to new tasks without...

2 months ago cs.CV cs.LG PDF

Tool HIGH

Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI Systems

Sarbartha Banerjee, Prateek Sahu, Anjo Vahldiek-Oberwagner +2 more

Rapid progress in generative AI has given rise to Compound AI systems - pipelines comprised of multiple large language models (LLM), software tools...

2 months ago cs.CR cs.AI PDF

Benchmark MEDIUM

Understanding LLM Behavior When Encountering User-Supplied Harmful Content in Harmless Tasks

Junjie Chu, Yiting Qu, Ye Leng +4 more

Large Language Models (LLMs) are increasingly trained to align with human values, primarily focusing on task level, i.e., refusing to execute...

2 months ago cs.CR cs.AI PDF

Survey LOW

Silent Speech Interfaces in the Era of Large Language Models: A Comprehensive Taxonomy and Systematic Review

Kele Xu, Yifan Wang, Ming Feng +5 more

Human-computer interaction has traditionally relied on the acoustic channel, a dependency that introduces systemic vulnerabilities to environmental...

2 months ago eess.AS PDF

Attack HIGH

The Mirror Design Pattern: Strict Data Geometry over Model Scale for Prompt Injection Detection

J Alex Corll

Prompt injection defenses are often framed as semantic understanding problems and delegated to increasingly large neural detectors. For the first...

2 months ago cs.CR cs.AI PDF

Survey LOW

Human in the Loop for Fuzz Testing: Literature Review and the Road Ahead

Jiongchi Yu, Xiaolin Wen, Sizhe Cheng +3 more

Fuzz testing is one of the most effective techniques for detecting bugs and vulnerabilities in software. However, as the basis of fuzz testing,...

2 months ago cs.SE cs.HC PDF

Tool MEDIUM

OpenClaw PRISM: A Zero-Fork, Defense-in-Depth Runtime Security Layer for Tool-Augmented LLM Agents

Frank Li

Tool-augmented LLM agents introduce security risks that extend beyond user-input filtering, including indirect prompt injection through fetched...

2 months ago cs.CR PDF

Tool LOW

Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework

Chingkwun Lam, Jiaxin Li, Lingfei Zhang +1 more

Long-term memory has emerged as a foundational component of autonomous Large Language Model (LLM) agents, enabling continuous adaptation, lifelong...

2 months ago cs.AI PDF

Defense MEDIUM

Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats

Xinhao Deng, Yixiang Zhang, Jiaqing Wu +15 more

Autonomous Large Language Model (LLM) agents, exemplified by OpenClaw, demonstrate remarkable capabilities in executing complex, long-horizon tasks....

2 months ago cs.CR cs.AI PDF

Defense LOW

Noise-aware few-shot learning through bi-directional multi-view prompt alignment

Lu Niu, Cheng Xue

Vision-language models offer strong few-shot capability through prompt tuning but remain vulnerable to noisy labels, which can corrupt prompts and...

2 months ago cs.CV PDF

Benchmark MEDIUM

KEPo: Knowledge Evolution Poison on Graph-based Retrieval-Augmented Generation

Qizhi Chen, Chao Qi, Yihong Huang +5 more

Graph-based Retrieval-Augmented Generation (GraphRAG) constructs the Knowledge Graph (KG) from external databases to enhance the timeliness and...

2 months ago cs.LG cs.AI cs.CR PDF

Benchmark LOW

AutoVeriFix+: High-Correctness RTL Generation via Trace-Aware Causal Fix and Semantic Redundancy Pruning

Yan Tan, Xiangchen Meng, Zijun Jiang +1 more

Large language models (LLMs) have demonstrated impressive capabilities in generating software code for high-level programming languages such as...

2 months ago cs.PL cs.AR PDF

Benchmark LOW

Follow the Saliency: Supervised Saliency for Retrieval-augmented Dense Video Captioning

Seung hee Choi, MinJu Jeon, Hyunwoo Oh +2 more

Existing retrieval-augmented approaches for Dense Video Captioning (DVC) often fail to achieve accurate temporal segmentation aligned with true event...

2 months ago cs.CV PDF

Defense MEDIUM

Deactivating Refusal Triggers: Understanding and Mitigating Overrefusal in Safety Alignment

Zhiyu Xue, Zimo Qi, Guangliang Liu +2 more

Safety alignment aims to ensure that large language models (LLMs) refuse harmful requests by post-training on harmful queries paired with refusal...

2 months ago cs.AI PDF

Attack HIGH

Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover

Indranil Halder, Annesya Banerjee, Cengiz Pehlevan

Adversarial attacks can reliably steer safety-aligned large language models toward unsafe behavior. Empirically, we find that adversarial...

2 months ago cs.LG cs.AI PDF

Tool LOW

The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning

Raj Sanjay Shah, Jing Huang, Keerthiram Murugesan +2 more

Unlearning in Large Language Models (LLMs) aims to enhance safety, mitigate biases, and comply with legal mandates, such as the right to be...

2 months ago cs.AI PDF

Benchmark LOW

Security-by-Design for LLM-Based Code Generation: Leveraging Internal Representations for Concept-Driven Steering Mechanisms

Maximilian Wendlinger, Daniel Kowatsch, Konstantin Böttinger +1 more

Large Language Models (LLMs) show remarkable capabilities in understanding natural language and generating complex code. However, as practitioners...

2 months ago cs.CR cs.LG PDF

Tool HIGH

Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models

Xiangwen Wang, Ananth Balashankar, Varun Chandrasekaran

Large language models remain vulnerable to jailbreak attacks, yet we still lack a systematic understanding of how jailbreak success scales with...

2 months ago cs.LG cs.CR PDF

Benchmark MEDIUM

TOSSS: a CVE-based Software Security Benchmark for Large Language Models

Marc Damie, Murat Bilgehan Ertan, Domenico Essoussi +3 more

With their increasing capabilities, Large Language Models (LLMs) are now used across many industries. They have become useful tools for software...

2 months ago cs.LG cs.CL cs.CR PDF
