Attack HIGH
Yannick Assogba, Jacopo Cortellazzi, Javier Abad +3 more
Jailbreak attacks remain a persistent threat to large language model safety. We propose Context-Conditioned Delta Steering (CC-Delta), an SAE-based...
3 months ago cs.CR cs.CL cs.LG
Other HIGH
Nate Rahn, Allison Qi, Avery Griffin +3 more
We want language model assistants to conform to a character specification, which asserts how the model should act across diverse user interactions....
Tool HIGH
Yuepeng Hu, Yuqi Jia, Mengyuan Li +2 more
In a malicious tool attack, an attacker uploads a malicious tool to a distribution platform; once a user installs the tool and the LLM agent selects...
Defense MEDIUM
Zhaoxin Wang, Jiaming Liang, Fengbin Zhu +5 more
Large language models (LLMs) and multimodal LLMs are typically safety-aligned before release to prevent harmful content generation. However, recent...
Defense MEDIUM
Yujun Zhou, Yue Huang, Han Bao +8 more
While most AI alignment research focuses on preventing models from generating explicitly harmful content, a more subtle risk is emerging:...
3 months ago cs.LG cs.CL
Survey MEDIUM
Varpu Vehomäki, Kimmo K. Kaski
Understanding cyber security is increasingly important for individuals and organizations. However, a lot of information related to cyber security can...
Defense MEDIUM
Christian Rondanini, Barbara Carminati, Elena Ferrari +2 more
The proliferation of edge devices has created an urgent need for security solutions capable of detecting malware in real time while operating under...
3 months ago cs.CR cs.AI cs.DC
Attack HIGH
Dong Yan, Jian Liang, Ran He +1 more
Recent studies have shown that large language models (LLMs) can infer private user attributes (e.g., age, location, gender) from user-generated text...
3 months ago cs.CR cs.AI cs.CL
Benchmark MEDIUM
Faouzi El Yagoubi, Ranwa Al Mallah, Godwin Badu-Marfo
Multi-agent Large Language Model (LLM) systems create privacy risks that current benchmarks cannot measure. When agents coordinate on tasks,...
Attack HIGH
Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis
Jailbreaking large language models (LLMs) has emerged as a critical security challenge with the widespread deployment of conversational AI systems....
3 months ago cs.CR cs.CL
Defense MEDIUM
Md Sazedur Rahman, Mizanur Rahman Jewel, Sanjay Madria
Mining is rapidly evolving into an AI-driven cyber-physical ecosystem where safety and operational reliability depend on robust perception,...
3 months ago cs.CR cs.DC
Benchmark MEDIUM
Aashish Kolluri, Rishi Sharma, Manuel Costa +5 more
Indirect prompt injection attacks threaten AI agents that execute consequential actions, motivating deterministic system-level defenses. Such...
3 months ago cs.CR cs.LG
Benchmark LOW
Yang Liu, Armstrong Foundjem, Xingfang Wu +2 more
Context: In the fast-paced evolution of software development, Large Language Models (LLMs) have become indispensable tools for tasks such as code...
Benchmark MEDIUM
Arpit Singh Gautam, Kailash Talreja, Saurabh Jha
Large Language Models (LLMs) frequently hallucinate plausible but incorrect assertions, a vulnerability often missed by uncertainty metrics when...
3 months ago cs.CL cs.AI
Attack MEDIUM
Abhishek Saini, Haolin Jiang, Hang Liu
The deployment of large language models (LLMs) on third-party devices requires new ways to protect model intellectual property. While Trusted...
3 months ago cs.CR cs.AR
Attack HIGH
J Alex Corll
Multi-turn prompt injection attacks distribute malicious intent across multiple conversation turns, exploiting the assumption that each turn is...
3 months ago cs.CR cs.AI
Benchmark MEDIUM
Zhenhua Zou, Sheng Guo, Qiuyang Zhan +6 more
The evolution of Large Language Models (LLMs) has shifted mobile computing from App-centric interactions to system-level autonomous agents. Current...
3 months ago cs.CR cs.AI
Benchmark MEDIUM
Xinguo Feng, Zhongkui Ma, Zihan Wang +2 more
Training and fine-tuning large-scale language models largely benefit from collaborative learning, but the approach has been proven vulnerable to...
Defense MEDIUM
Adel ElZemity, Joshua Sylvester, Budi Arief +1 more
SMS-based phishing (smishing) attacks have surged, yet training effective on-device detectors requires labelled threat data that quickly becomes...
Benchmark MEDIUM
Matteo Migliarini, Berat Ercevik, Oluwagbemike Olowe +5 more
Large Language Models (LLMs) are increasingly deployed as active participants on public social media platforms, yet their behavior in these...
3 months ago cs.SI cs.CY