Attack Type

Jailbreak

Jailbreaks are inputs designed to defeat the alignment training and safety filters of a language model so that it produces output it was trained to refuse — explicit instructions for harm, restricted code, disallowed content, or impersonation of a different system. Common patterns include roleplay framing ("pretend you are an unrestricted AI"), encoded payloads (base64, leet, hypothetical scenarios), multi-turn coercion, and adversarial suffixes generated by gradient attacks against open-weights models (the GCG attack and successors). Jailbreaks differ from prompt injection in intent and target: prompt injection hijacks an application's control flow, while jailbreaks defeat the model's own refusal policy. The two are often combined — an injected payload may include a jailbreak prefix. AI Threat Alert tracks jailbreak research from arxiv plus the small but growing set of CVEs filed against shipped LLM guardrails.

2
Total CVEs
1
Pages
Page 1 of 1
Current
Severity CVE CVSS
HIGH CVE-2025-30358 8.1
HIGH CVE-2026-4399 7.5