Auto-Tuning Safety Guardrails for Black-Box Large Language Models
Perry Abdulkadir
Large language models (LLMs) are increasingly deployed behind safety guardrails such as system prompts and content filters, especially in settings...
Samruddhi Baviskar
We evaluate adversarial robustness in tabular machine learning models used in financial decision making. Using credit scoring and fraud detection...
Mohammad Mahdi Razmjoo, Mohammad Mahdi Sharifian, Saeed Bagheri Shouraki
Despite their remarkable performance, deep neural networks exhibit a critical vulnerability: small, often imperceptible, adversarial perturbations...
Li Lin, Siyuan Xin, Yang Cao +1 more
Watermarking large language models (LLMs) is vital for preventing their misuse, including the fabrication of fake news, plagiarism, and spam. It is...
Sanjay Das, Swastik Bhattacharya, Shamik Kundu +3 more
State-space models (SSMs), exemplified by the Mamba architecture, have recently emerged as state-of-the-art sequence-modeling frameworks, offering...
Luoxi Meng, Henry Feng, Ilia Shumailov +1 more
Browser-using agents (BUAs) are an emerging class of AI agents that interact with web browsers in human-like ways, including clicking, scrolling,...
Arastoo Zibaeirad, Marco Vieira
Large Language Models (LLMs) are increasingly being studied for Software Vulnerability Detection (SVD) and Repair (SVR). Individual LLMs have...
J. Alexander Curtis, Nasir U. Eisty
Penetration testing is a cornerstone of cybersecurity, traditionally driven by manual, time-intensive processes. As systems grow in complexity, there...
Dang-Khoa Nguyen, Gia-Thang Ho, Quang-Minh Pham +5 more
Software supply chain attacks targeting the npm ecosystem have become increasingly sophisticated, leveraging obfuscation and complex logic to evade...
Hua Ma, Ruoxi Sun, Minhui Xue +4 more
Accurate time-series forecasting is increasingly critical for planning and operations in low-carbon power systems. Emerging time-series large...
Padmeswari Nandiya, Ahmad Mohsin, Ahmed Ibrahim +2 more
Industry 5.0's increasing integration of IT and OT systems is transforming industrial operations but also expanding the cyber-physical attack...
Xin Yang, Omid Ardakanian
Data obfuscation is a promising technique for mitigating attribute inference attacks by semi-trusted parties with access to time-series data emitted...
Edward Lue Chee Lip, Anthony Channg, Diana Kim +2 more
As AI capabilities advance, we increasingly rely on powerful models to decompose complex tasks – but what if the decomposer itself is...
Andrew Adiletta, Kathryn Adiletta, Kemal Derya +1 more
The rapid deployment of Large Language Models (LLMs) has created an urgent need for enhanced security and privacy measures in Machine Learning (ML)....
Jamal Al-Karaki, Muhammad Al-Zafar Khan, Rand Derar Mohammad Al Athamneh
The scarcity of cyberattack data hinders the development of robust intrusion detection systems. This paper introduces PHANTOM, a novel adversarial...
Alexander K. Saeri, Sophia Lloyd George, Jess Graham +4 more
Organizations and governments that develop, deploy, use, and govern AI must coordinate on effective risk mitigation. However, the landscape of AI...
Manon Kempermann, Sai Suresh Macharla Vasu, Mahalakshmi Raveenthiran +2 more
Safety evaluations of large language models (LLMs) typically focus on universal risks like dangerous capabilities or undesirable propensities....
Neha, Tarunpreet Bhatia
Intrusion Detection Systems (IDS) are critical components in safeguarding 5G/6G networks from both internal and external cyber threats. While...
Han Yang, Shaofeng Li, Tian Dong +3 more
Deep Neural Networks (DNNs), as valuable intellectual property, face unauthorized use. Existing protections, such as digital watermarking, are...
N Mangala, Murtaza Rangwala, S Aishwarya +5 more
Healthcare has become exceptionally sophisticated, as wearables and connected medical devices are revolutionising remote patient monitoring,...