Self-Mined Hardness for Safety Fine-Tuning
Prakhar Gupta, Garv Shah, Donghua Zhang
Safety fine-tuning of language models typically requires a curated adversarial dataset. We take a different approach: score each candidate prompt's...
Javad Forough, Marios Kogias, Hamed Haddadi
Agentic AI systems, specifically LLM-driven agents that plan, invoke tools, maintain persistent memory, and delegate tasks to peer agents via...
Divyam Anshumaan, Sarthak Choudhary, Nils Palumbo +1 more
LLM agents release private data across multi-service interactions. Existing prompt sanitizers based on metric differential privacy treat each release...
Mingshuo Liu, Yiwei Zha, Min Chen
Browsing-enabled LLM assistants can fetch webpages and answer contact-seeking queries, creating a practical channel for scraping contact-style...
Kerri Prinos, Lilianne Brush, Cameron Denton +5 more
Agentic systems involved in high-stakes decision-making under adversarial pressure need formal guarantees not offered by existing approaches....
Mingming Zha, Xiaofeng Wang
Autonomous LLM agents operate as long-running processes with persistent workspaces, memory files, scheduled task state, and messaging integrations....
Purna Sai Garigipati, Onur Ayan, Kishor Chandra Joshi +1 more
Artificial Intelligence (AI) will play an essential role in 6G. It will fundamentally reshape the network architecture itself and drive major changes...
Wenjing Duan, Qi Zhou, Yuanfan Li
Machine-generated text (MGT) detection is critical for regulating online information ecosystems, yet existing detectors often underperform in...
Judith Sáinz-Pardo Díaz, Álvaro López García
The growing development of artificial intelligence-based solutions, together with privacy legislation, has driven the rise of the so-called privacy...
Karima Makhlouf, Lamiaa Basyoni, Syed Khaderi +4 more
Large language models (LLMs) are increasingly deployed in interactive and retrieval-augmented settings, raising significant privacy concerns. While...
Wenwei Zhao, Xiaowen Li, Yao Liu +1 more
Federated learning (FL) is vulnerable to poisoning attacks, where malicious clients upload manipulated updates to degrade the performance of the...
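The abstract above describes poisoning in federated learning, where a minority of malicious clients submit manipulated updates to degrade the global model. As a generic illustration only (not this paper's attack or defense), the sketch below contrasts plain averaging with a coordinate-wise median aggregator on toy update vectors; all names and numbers are illustrative.

```python
import numpy as np

def fedavg(updates):
    """Plain federated averaging: every client update gets equal weight."""
    return np.mean(updates, axis=0)

def coordinate_median(updates):
    """Coordinate-wise median: a classic robust aggregator that limits
    the influence of a small number of manipulated updates."""
    return np.median(updates, axis=0)

# Toy example: four honest clients and one poisoned update.
honest = [np.array([0.1, -0.2, 0.05]) + 0.01 * np.random.randn(3) for _ in range(4)]
poisoned = np.array([5.0, 5.0, 5.0])           # malicious client inflates its update
updates = np.stack(honest + [poisoned])

print("FedAvg :", fedavg(updates))             # dragged far off by the single attacker
print("Median :", coordinate_median(updates))  # stays close to the honest updates
```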
Debeshee Das, Julien Piet, Darya Kaviani +3 more
Memory systems enable otherwise-stateless LLM agents to persist user information across sessions, but also introduce a new attack surface. We...
Sadia Asif, Mohammad Mohammadi Amiri
Fine-tuning safety-aligned language models for downstream tasks often leads to substantial degradation of refusal behavior, making models vulnerable...
Jiajia Li, Xiaoyu Wen, Zhongtian Ma +3 more
The growing capabilities of large language models (LLMs) have driven their widespread deployment across diverse domains, even in potentially...
George Fatouros, Georgios Makridis, John Soldatos +18 more
European financial institutions face mounting regulatory pressure while their security operations centres remain constrained not by data or staffing...
Zhiyang Dai, Yansong Gao, Boyu Kuang +5 more
Contrastive learning (CL) reduces annotation cost via auto-derived supervisory signals. Since large-scale in-house CL datasets are infeasible,...
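The entry above notes that contrastive learning derives its supervisory signal automatically from the data itself. A minimal, generic InfoNCE-style loss (not this paper's setup) makes the idea concrete: two augmented views of the same example act as positives, and every other sample in the batch acts as a negative.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Minimal InfoNCE-style loss over two views of the same batch.
    Positives are the matching rows of z1 and z2; negatives are all other rows."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # (B, B) scaled cosine similarities
    targets = torch.arange(z1.size(0))      # row i's positive is column i
    return F.cross_entropy(logits, targets)

# Toy usage: embeddings of two augmented views of an 8-sample batch.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(info_nce(z1, z2).item())
```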
Huining Cui, Wei Liu
Retrieval-augmented generation (RAG) improves factual grounding by conditioning large language models on retrieved evidence, but it also opens a...
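The entry above summarizes the basic RAG mechanism: condition the model on retrieved evidence. The sketch below shows that generic retrieve-then-generate pipeline; `embed`, `vector_index`, and `llm` are hypothetical stand-ins rather than a specific library, and nothing here reflects the paper's particular attack surface.

```python
# Minimal retrieve-then-generate loop. `embed`, `vector_index.search`, and `llm`
# are hypothetical placeholders for whatever retriever and model are in use.
def answer(query, vector_index, embed, llm, k=4):
    """Condition the model on the top-k retrieved passages."""
    hits = vector_index.search(embed(query), k=k)          # retrieval step
    context = "\n\n".join(passage for passage, _score in hits)
    prompt = (
        "Answer using only the evidence below.\n\n"
        f"Evidence:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm(prompt)                                      # generation step
```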
Jona te Lintelo, Lichao Wu, Marina Krček +2 more
Mixture-of-Experts (MoE) architectures in Large Language Models (LLMs) have significantly reduced inference costs through sparse activation. However,...
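The entry above attributes MoE's inference savings to sparse activation: a gate scores every expert, but only the top-k experts actually run for each token. A toy PyTorch sketch of that routing (generic, not this paper's architecture):

```python
import torch

def topk_moe(x, experts, gate, k=2):
    """Sparse MoE layer: score all experts, but run only the top-k per token."""
    scores = torch.softmax(gate(x), dim=-1)        # (tokens, num_experts)
    weights, idx = scores.topk(k, dim=-1)          # keep the k best experts per token
    out = torch.zeros_like(x)
    for slot in range(k):
        for e in range(len(experts)):
            mask = idx[:, slot] == e               # tokens routed to expert e in this slot
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * experts[e](x[mask])
    return out

# Toy usage: 4 experts, each token routed to its 2 best experts.
d, n_exp = 16, 4
experts = [torch.nn.Linear(d, d) for _ in range(n_exp)]
gate = torch.nn.Linear(d, n_exp)
print(topk_moe(torch.randn(10, d), experts, gate).shape)   # torch.Size([10, 16])
```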
Xiaokun Luan, Yihao Zhang, Pengcheng Su +2 more
Large Language Model (LLM) watermarking is crucial for establishing the provenance of machine-generated text, but most existing methods rely on a...
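The entry above concerns LLM watermarking for provenance. As background only, the sketch below shows a simplified green-list style detector of the kind commonly studied: each position's "green" vocabulary subset is derived from the previous token and a secret key, and a one-proportion z-score tests whether green tokens are over-represented. The hashing and vocabulary split here are deliberately simplistic stand-ins, not the scheme studied in the paper.

```python
import hashlib
import math

def green_fraction(tokens, vocab_size, gamma=0.5, key="secret"):
    """Simplified green-list watermark check: the green subset at each position
    depends on the previous token and a secret key; watermarked text should be
    green far more often than the expected fraction gamma."""
    hits = 0
    for prev, tok in zip(tokens, tokens[1:]):
        seed = int.from_bytes(hashlib.sha256(f"{key}:{prev}".encode()).digest()[:8], "big")
        # Toy pseudo-random split: a seed-dependent gamma-fraction slice of the vocabulary.
        hits += (seed + tok) % vocab_size < gamma * vocab_size
    n = len(tokens) - 1
    z = (hits - gamma * n) / math.sqrt(gamma * (1 - gamma) * n)  # one-proportion z-score
    return hits / n, z

# Toy usage on integer token ids from a 50k vocabulary.
frac, z = green_fraction(list(range(0, 2000, 7)), vocab_size=50_000)
print(round(frac, 3), round(z, 2))
```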
Han Liu, Shanghao Shi, Yevgeniy Vorobeychik +2 more
Low-Rank Adaptation (LoRA), which leverages the insight that model updates typically reside in a low-dimensional space, has significantly improved...
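The entry above restates the core LoRA insight: fine-tuning updates tend to lie in a low-dimensional subspace, so a frozen weight can be adapted with a small trainable factorized update, W_eff = W + (alpha/r)·BA. A minimal PyTorch sketch of a generic LoRA linear layer (not this paper's method):

```python
import torch

class LoRALinear(torch.nn.Module):
    """Frozen base weight plus a trainable low-rank update: W_eff = W + (alpha/r) * B @ A."""
    def __init__(self, base: torch.nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # only the adapter matrices are trained
        self.A = torch.nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

# Toy usage: wrap a frozen 512x512 projection with a rank-8 adapter.
layer = LoRALinear(torch.nn.Linear(512, 512))
print(layer(torch.randn(2, 512)).shape)      # torch.Size([2, 512])
```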