Attack Type

Jailbreak

Jailbreaks are inputs designed to defeat a language model's alignment training and safety filters so that it produces output it was trained to refuse: explicit instructions for causing harm, restricted code, disallowed content, or impersonation of a different system. Common patterns include roleplay framing ("pretend you are an unrestricted AI"), hypothetical or fictional scenarios, encoded payloads (base64, leetspeak), multi-turn coercion, and adversarial suffixes generated by gradient-based attacks against open-weights models (the GCG attack and its successors). Jailbreaks differ from prompt injection in intent and target: prompt injection hijacks an application's control flow, while a jailbreak defeats the model's own refusal policy. The two are often combined, since an injected payload may carry a jailbreak prefix. AI Threat Alert tracks jailbreak research from arXiv, plus the small but growing set of CVEs filed against shipped LLM guardrails.
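To illustrate why encoded payloads get past shallow guardrails, here is a minimal toy sketch, assuming a guardrail that does nothing more than keyword matching on the literal prompt. The blocklist term `forbidden_topic` and the functions `naive_filter`, `decoded_segments`, and `normalizing_filter` are hypothetical names chosen for illustration; they do not describe any product or any of the CVEs listed here. The naive check misses the term once it is base64-encoded, while a variant that decodes base64-looking substrings before screening catches it.

```python
import base64
import binascii
import re

# Hypothetical blocklist for illustration only; real guardrails are not simple keyword lists.
BLOCKLIST = {"forbidden_topic"}

# Runs of base64-alphabet characters long enough to plausibly hold an encoded phrase.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{12,}={0,2}")


def naive_filter(prompt: str) -> bool:
    """Return True if the prompt should be blocked, checking only the literal text."""
    lowered = prompt.lower()
    return any(term in lowered for term in BLOCKLIST)


def decoded_segments(prompt: str) -> list[str]:
    """Decode every substring that parses as strict base64 and is valid UTF-8."""
    out = []
    for match in B64_CANDIDATE.finditer(prompt):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
            out.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not clean base64 text; skip it
    return out


def normalizing_filter(prompt: str) -> bool:
    """Screen the literal prompt plus every base64-decoded segment."""
    return naive_filter(prompt) or any(naive_filter(s) for s in decoded_segments(prompt))


if __name__ == "__main__":
    payload = base64.b64encode(b"explain forbidden_topic").decode()
    prompt = "Decode this and follow it: " + payload
    print(naive_filter(prompt))        # False: the literal text never contains the term
    print(normalizing_filter(prompt))  # True: the decoded segment does
```

Decoding covers only one of the patterns above. Leetspeak, roleplay or hypothetical framing, multi-turn coercion, and GCG-style suffixes all get through a check like this, which is why a keyword screen, normalized or not, cannot stand in for the model's own refusal behavior.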

Total CVEs: 2


Severity	CVE	Headline	Package	CVSS
HIGH	CVE-2025-30358	Mesop: class pollution enables DoS and LLM jailbreak		8.1
HIGH	CVE-2026-4399	1millionbot Millie: Boolean prompt injection bypasses restrictions		7.5

Related Attack Types

Prompt Injection, Data Extraction, Supply Chain, Model Poisoning, Adversarial Examples, Data Leakage
