AI Security Research

2,529+ academic papers on AI security, attacks, and defenses

Total: 2,529

Attack: 969

Benchmark: 729

Defense: 345

Tool: 272

Survey: 142

Showing 161–180 of 222 papers

Defense MEDIUM

Are LLMs Good Safety Agents or a Propaganda Engine?

Neemesh Yadav, Francesco Ortu, Jiarui Liu +5 more

Large Language Models (LLMs) are trained to refuse to respond to harmful content. However, systematic analyses of whether this behavior is truly a...

5 months ago cs.CL PDF

Defense MEDIUM

Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation

Junbo Zhang, Ran Chen, Qianli Zhou +2 more

Large language models demonstrate powerful capabilities across various natural language processing tasks, yet they also harbor safety...

5 months ago cs.CR cs.CL PDF

Defense MEDIUM

EAGER: Edge-Aligned LLM Defense for Robust, Efficient, and Accurate Cybersecurity Question Answering

Onat Gungor, Roshan Sood, Jiasheng Zhou +1 more

Large Language Models (LLMs) are highly effective for cybersecurity question answering (QA) but are difficult to deploy on edge devices due to their...

5 months ago cs.CR PDF

Defense MEDIUM

Beyond Binary Classification: A Semi-supervised Approach to Generalized AI-generated Image Detection

Hong-Hanh Nguyen-Le, Van-Tuan Tran, Dinh-Thuc Nguyen +1 more

The rapid advancement of generators (e.g., StyleGAN, Midjourney, DALL-E) has produced highly realistic synthetic images, posing significant...

5 months ago cs.LG cs.AI cs.CR PDF

Defense MEDIUM

SafeCiM: Investigating Resilience of Hybrid Floating-Point Compute-in-Memory Deep Learning Accelerators

Swastik Bhattacharya, Sanjay Das, Anand Menon +3 more

Deep Neural Networks (DNNs) continue to grow in complexity with Large Language Models (LLMs) incorporating vast numbers of parameters. Handling these...

5 months ago cs.AR cs.LG PDF

Defense MEDIUM

Entropy-Based Measurement of Value Drift and Alignment Work in Large Language Models

Samih Fadli

Large language model safety is usually assessed with static benchmarks, but key failures are dynamic: value drift under distribution shift, jailbreak...

5 months ago cs.CL cs.AI cs.LG PDF

Defense MEDIUM

When Harmless Words Harm: A New Threat to LLM Safety via Conceptual Triggers

Zhaoxin Zhang, Borui Chen, Yiming Hu +3 more

Recent research on large language model (LLM) jailbreaks has primarily focused on techniques that bypass safety mechanisms to elicit overtly harmful...

5 months ago cs.CL PDF

Defense MEDIUM

N-GLARE: An Non-Generative Latent Representation-Efficient LLM Safety Evaluator

Zheyu Lin, Jirui Yang, Yukui Qiu +3 more

Evaluating the safety robustness of LLMs is critical for their deployment. However, mainstream Red Teaming methods rely on online generation and...

5 months ago cs.LG cs.CR PDF

Defense MEDIUM

Certified but Fooled! Breaking Certified Defences with Ghost Certificates

Quoc Viet Vo, Tashreque M. Haq, Paul Montague +3 more

Certified defenses promise provable robustness guarantees. We study the malicious exploitation of probabilistic certification frameworks to better...

5 months ago cs.LG cs.CR cs.CV PDF

Defense MEDIUM

SGuard-v1: Safety Guardrail for Large Language Models

JoonHo Lee, HyeonMin Cho, Jaewoong Yun +3 more

We present SGuard-v1, a lightweight safety guardrail for Large Language Models (LLMs), which comprises two specialized models to detect harmful...

5 months ago cs.CL cs.AI cs.CR PDF

Defense MEDIUM

Rethinking Deep Alignment Through The Lens Of Incomplete Learning

Thong Bach, Dung Nguyen, Thao Minh Le +1 more

Large language models exhibit systematic vulnerabilities to adversarial attacks despite extensive safety alignment. We provide a mechanistic analysis...

5 months ago cs.LG PDF

Defense MEDIUM

EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment

Ruoxi Cheng, Haoxuan Ma, Teng Ma +1 more

Large Vision-Language Models (LVLMs) exhibit powerful reasoning capabilities but suffer sophisticated jailbreak vulnerabilities. Fundamentally,...

5 months ago cs.AI PDF

Defense MEDIUM

EnchTable: Unified Safety Alignment Transfer in Fine-tuned Large Language Models

Jialin Wu, Kecen Li, Zhicong Huang +3 more

Many machine learning models are fine-tuned from large language models (LLMs) to achieve high performance in specialized domains like code...

6 months ago cs.CL cs.CR PDF

Defense MEDIUM

Slice-Aware Spoofing Detection in 5G Networks Using Lightweight Machine Learning

Daniyal Ganiuly, Nurzhau Bolatbek

The increasing virtualization of fifth generation (5G) networks expands the attack surface of the user plane, making spoofing a persistent threat to...

6 months ago cs.CR cs.NI PDF

Defense MEDIUM

HybridGuard: Enhancing Minority-Class Intrusion Detection in Dew-Enabled Edge-of-Things Networks

Binayak Kara, Ujjwal Sahua, Ciza Thomas +1 more

Securing Dew-Enabled Edge-of-Things (EoT) networks against sophisticated intrusions is a critical challenge. This paper presents HybridGuard, a...

6 months ago cs.CR cs.AI cs.LG PDF

Defense MEDIUM

A Self-Improving Architecture for Dynamic Safety in Large Language Models

Tyler Slater

Context: The integration of Large Language Models (LLMs) into core software systems is accelerating. However, existing software architecture patterns...

6 months ago cs.SE cs.AI cs.CR PDF

Defense MEDIUM

EASE: Practical and Efficient Safety Alignment for Small Language Models

Haonan Shi, Guoli Wang, Tu Ouyang +1 more

Small language models (SLMs) are increasingly deployed on edge devices, making their safety alignment crucial yet challenging. Current shallow...

6 months ago cs.CR cs.LG PDF

Defense MEDIUM

Explaining Software Vulnerabilities with Large Language Models

Oshando Johnson, Alexandra Fomina, Ranjith Krishnamurthy +3 more

The prevalence of security vulnerabilities has prompted companies to adopt static application security testing (SAST) tools for vulnerability...

6 months ago cs.SE cs.AI PDF

Defense MEDIUM

STARS: Synchronous Token Alignment for Robust Supervision in Large Language Models

Mohammad Atif Quamar, Mohammad Areeb, Mikhail Kuznetsov +2 more

Aligning large language models (LLMs) with human values is crucial for safe deployment. Inference-time techniques offer granular control over...

6 months ago cs.CL PDF

Defense MEDIUM

Reimagining Safety Alignment with An Image

Yifan Xia, Guorui Chen, Wenqian Yu +3 more

Large language models (LLMs) excel in diverse applications but face dual challenges: generating harmful content under jailbreak attacks and...

6 months ago cs.AI cs.CR PDF
