LLM Reinforcement in Context
Thomas Rivasseau
Current Large Language Model alignment research mostly focuses on improving model robustness against adversarial attacks and misbehavior by training...
Rathin Chandra Shit, Sharmila Subudhi
The security of autonomous vehicle networks is facing major challenges, owing to the complexity of sensor integration, real-time performance demands,...
JoonHo Lee, HyeonMin Cho, Jaewoong Yun +3 more
We present SGuard-v1, a lightweight safety guardrail for Large Language Models (LLMs), which comprises two specialized models to detect harmful...
Onkar Shelar, Travis Desell
Large Language Models remain vulnerable to adversarial prompts that elicit toxic content even after safety alignment. We present ToxSearch, a...
Yuting Tan, Yi Huang, Zhuo Li
Backdoor attacks on large language models (LLMs) typically couple a secret trigger to an explicit malicious output. We show that this explicit...
Thong Bach, Dung Nguyen, Thao Minh Le +1 more
Large language models exhibit systematic vulnerabilities to adversarial attacks despite extensive safety alignment. We provide a mechanistic analysis...
Sajad U P
Phishing and related cyber threats are becoming more varied and technologically advanced. Among these, email-based phishing remains the most dominant...
Shaowei Guan, Yu Zhai, Zhengyu Zhang +2 more
Large Language Models (LLMs) are increasingly vulnerable to adversarial attacks that can subtly manipulate their outputs. While various defense...
Shanmin Wang, Dongdong Zhao
Knowledge Distillation (KD) is essential for compressing large models, yet relying on pre-trained "teacher" models downloaded from third-party...
Lucas Fenaux, Christopher Srinivasa, Florian Kerschbaum
Transparency and security are both central to Responsible AI, but they may conflict in adversarial settings. We investigate the strategic effect of...
Yanbo Dai, Zongjie Li, Zhenlan Ji +1 more
Large language models (LLMs) have achieved remarkable success across a wide range of natural language processing tasks, demonstrating human-level...
Ruoxi Cheng, Haoxuan Ma, Teng Ma +1 more
Large Vision-Language Models (LVLMs) exhibit powerful reasoning capabilities but suffer from sophisticated jailbreak vulnerabilities. Fundamentally,...
Farhad Abtahi, Fernando Seoane, Iván Pau +1 more
Healthcare AI systems face major vulnerabilities to data poisoning that current defenses and regulations cannot adequately address. We analyzed eight...
Zichao Wei, Jun Zeng, Ming Wen +8 more
Software vulnerabilities are increasing at an alarming rate. However, manual patching is both time-consuming and resource-intensive, while existing...
Feilong Wang, Fuqiang Liu
The integration of large language models (LLMs) into automated driving systems has opened new possibilities for reasoning and decision-making by...
Guangke Chen, Yuhui Wang, Shouling Ji +2 more
Modern text-to-speech (TTS) systems, particularly those built on Large Audio-Language Models (LALMs), generate high-fidelity speech that faithfully...
Dennis Wei, Ronny Luss, Xiaomeng Hu +6 more
Large Language Models (LLMs) have become ubiquitous in everyday life and are entering higher-stakes applications ranging from summarizing meeting...
Fred Heiding, Simon Lermen
We present an end-to-end demonstration of how attackers can exploit AI safety failures to harm vulnerable populations: from jailbreaking LLMs to...
Jialin Wu, Kecen Li, Zhicong Huang +3 more
Many machine learning models are fine-tuned from large language models (LLMs) to achieve high performance in specialized domains like code...
Catherine Xia, Manar H. Alalfi
AI programming assistants have demonstrated a tendency to generate code containing basic security vulnerabilities. While developers are ultimately...