CVE-2024-0520: MLflow: path traversal enables RCE via dataset loading

Severity: HIGH | PoC available | CISA SSVC: Track*
Published June 6, 2024
CISO Take

Any ML team running MLflow older than 2.9.0 and loading datasets from external HTTP URLs is exposed to arbitrary file write and remote code execution — no authentication required, just a crafted HTTP response. Patch to 2.9.0 immediately; if patching is blocked, restrict MLflow to internal-only dataset sources and block outbound HTTP dataset loading at the network level. Treat any MLflow host as a potential pivot point into training infrastructure, model artifacts, and credentials.
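The "internal-only dataset sources" restriction can also be enforced in code before a URL ever reaches MLflow. The following is a minimal sketch of a host allowlist check, not part of MLflow itself; the host names and the function name `is_allowed_dataset_url` are hypothetical placeholders to be replaced with your own values.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: replace with your organization's internal dataset hosts.
ALLOWED_DATASET_HOSTS = {
    "datasets.internal.example",
    "artifacts.internal.example",
}


def is_allowed_dataset_url(url: str) -> bool:
    """Return True only for HTTP(S) URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # .hostname strips any userinfo and port, so "allowed@evil" tricks fail.
    host = (parts.hostname or "").lower()
    return host in ALLOWED_DATASET_HOSTS


if __name__ == "__main__":
    for candidate in (
        "https://datasets.internal.example/train.csv",
        "http://attacker.example/train.csv",
    ):
        print(candidate, "->", is_allowed_dataset_url(candidate))
```

An exact-match allowlist is used instead of suffix or substring matching, so a host such as `datasets.internal.example.attacker.example` is rejected.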

What is the risk?

HIGH. CVSS 8.8 reflects the real-world severity: network-accessible, low complexity, no privileges needed on MLflow itself. The only friction is user interaction — a data scientist must load a dataset from an attacker-controlled URL, which is trivially achievable via social engineering (Slack message, shared notebook, poisoned dataset registry). MLflow instances are often deployed inside corporate networks with broad access to training data, model registries, and cloud credentials, making post-exploitation impact severe.

What systems are affected?

Package   Ecosystem   Vulnerable Range   Patched
MLflow    pip         < 2.9.0            2.9.0
26.6K, OpenSSF 5.6, 655 dependents, pushed 4d ago, 31% patched, ~51d to patch

Do you use MLflow older than 2.9.0? You're affected.

How severe is it?

CVSS 3.1: 8.8 / 10
EPSS: 2.4% chance of exploitation in 30 days (higher than 82% of all CVEs)
Exploitation Status: Exploit Available (Exploitation: MEDIUM)
Sophistication: Moderate
Exploitation Confidence: Medium
CISA SSVC: Public PoC
Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

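Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): Required
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High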

What should I do?

6 steps
  1. PATCH

    Upgrade MLflow to >= 2.9.0 immediately — this is the only complete fix.

  2. NETWORK CONTROLS

    If patching is delayed, restrict MLflow servers from making outbound HTTP requests to untrusted domains via egress firewall rules.

  3. RUNTIME CONTROLS

    Run MLflow under a least-privilege service account with minimal filesystem write permissions; use read-only mounts where possible.

  4. DETECTION

    Monitor for unexpected file creation in non-data directories by the MLflow process (auditd or Falco rules on the mlflow user); alert via WAF/proxy on Content-Disposition headers containing '../' or absolute paths in the HTTP responses returned to MLflow's outbound requests.

  5. AUDIT

    Check MLflow logs for dataset loads from external URLs; review recently loaded datasets for suspicious source URLs.

  6. VERIFY

    Confirm your deployed version with pip show mlflow or container image inspection.
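The VERIFY step can be scripted for use in CI or on each host. The sketch below reads the installed MLflow version with the standard library's `importlib.metadata` and compares it to 2.9.0, the fixed release named in the advisory. The helper names are illustrative, and the parser is a simplification that only looks at the leading numeric release segment, so a pre-release such as `2.9.0rc1` is treated like `2.9.0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_VERSION = (2, 9, 0)  # first patched release, per the advisory


def parse_release(version_string: str) -> tuple:
    """Parse the leading numeric release segment, e.g. '2.8.1' -> (2, 8, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    numbers = [int(part) for part in match.group(0).split(".")]
    # Pad short versions such as '2.9' so tuple comparison behaves as expected.
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers)


def is_vulnerable(version_string: str) -> bool:
    """True if the version is older than the fixed release."""
    return parse_release(version_string) < FIXED_VERSION


def installed_mlflow_status() -> str:
    """Describe the MLflow installation in the current environment."""
    try:
        installed = version("mlflow")
    except PackageNotFoundError:
        return "mlflow is not installed in this environment"
    state = "VULNERABLE (upgrade to >= 2.9.0)" if is_vulnerable(installed) else "patched"
    return f"mlflow {installed}: {state}"


if __name__ == "__main__":
    print(installed_mlflow_status())
```

Run it with the same interpreter or virtual environment that the MLflow service uses; a different environment may report a different version.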

What does CISA's SSVC say?

Decision: Track*
Exploitation: PoC
Automatable: No
Technical Impact: Partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Art. 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2.6 - AI system security
NIST AI RMF
GOVERN 6.2 - Policies and procedures are in place for AI risk management
MANAGE 2.2 - Mechanisms for sustaining AI risk management are in place
OWASP LLM Top 10
LLM05:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2024-0520?

CVE-2024-0520 is a path traversal vulnerability (CWE-22) in MLflow's HTTP dataset loading. When a dataset is loaded from an HTTP URL, the filename taken from the `Content-Disposition` header or the URL path is used to build the destination path without sanitization, so an attacker-controlled server can write files to arbitrary locations and achieve remote code execution. No authentication is required, only that a user loads a dataset from the attacker's URL. The issue is fixed in MLflow 2.9.0.

Is CVE-2024-0520 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2024-0520, which raises the risk of exploitation. CISA's SSVC assessment lists the exploitation status as PoC rather than active.

How to fix CVE-2024-0520?

1. PATCH: Upgrade MLflow to >= 2.9.0 immediately; this is the only complete fix.
2. NETWORK CONTROLS: If patching is delayed, restrict MLflow servers from making outbound HTTP requests to untrusted domains via egress firewall rules.
3. RUNTIME CONTROLS: Run MLflow under a least-privilege service account with minimal filesystem write permissions; use read-only mounts where possible.
4. DETECTION: Monitor for unexpected file creation in non-data directories by the MLflow process (auditd or Falco rules on the mlflow user); alert via WAF/proxy on Content-Disposition headers containing '../' or absolute paths in the HTTP responses returned to MLflow's outbound requests.
5. AUDIT: Check MLflow logs for dataset loads from external URLs; review recently loaded datasets for suspicious source URLs.
6. VERIFY: Confirm your deployed version with `pip show mlflow` or container image inspection.

What systems are affected by CVE-2024-0520?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, MLOps platforms, experiment tracking infrastructure, shared data science environments, model serving.

What is the CVSS score for CVE-2024-0520?

CVE-2024-0520 has a CVSS v3.1 base score of 8.8 (HIGH). The EPSS exploitation probability is 2.38%.

What is the AI security impact?

Affected AI Architectures

training pipelines, MLOps platforms, experiment tracking infrastructure, shared data science environments, model serving

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0025 Exfiltration via Cyber Means
AML.T0035 AI Artifact Collection
AML.T0049 Exploit Public-Facing Application
AML.T0072 Reverse Shell

Compliance Controls Affected

EU AI Act: Art. 15
ISO 42001: A.6.2.6
NIST AI RMF: GOVERN 6.2, MANAGE 2.2
OWASP LLM Top 10: LLM05:2025

What are the technical details?

Original Advisory

A vulnerability in mlflow/mlflow version 8.2.1 allows for remote code execution due to improper neutralization of special elements used in an OS command ('Command Injection') within the `mlflow.data.http_dataset_source.py` module. Specifically, when loading a dataset from a source URL with an HTTP scheme, the filename extracted from the `Content-Disposition` header or the URL path is used to generate the final file path without proper sanitization. This flaw enables an attacker to control the file path fully by utilizing path traversal or absolute path techniques, such as '../../tmp/poc.txt' or '/tmp/poc.txt', leading to arbitrary file write. Exploiting this vulnerability could allow a malicious user to execute commands on the vulnerable machine, potentially gaining access to data and model information. The issue is fixed in version 2.9.0.
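The weakness described above comes down to joining an untrusted filename onto a download directory. The following minimal sketch reproduces that pattern and shows one common way to contain it; it is not MLflow's actual code or its actual fix, and the function names `unsafe_destination` and `safe_destination` are hypothetical. Paths assume a POSIX system.

```python
import os
import tempfile


def unsafe_destination(download_dir: str, filename: str) -> str:
    """The vulnerable pattern: the untrusted filename is joined as-is."""
    return os.path.join(download_dir, filename)


def safe_destination(download_dir: str, filename: str) -> str:
    """Keep only the final path component, then verify containment."""
    base = os.path.basename(filename.replace("\\", "/"))
    if base in ("", ".", ".."):
        raise ValueError(f"unusable filename: {filename!r}")
    root = os.path.realpath(download_dir)
    candidate = os.path.realpath(os.path.join(root, base))
    # Belt and braces: also catches a symlink inside the directory
    # that points somewhere outside it.
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"path escapes download directory: {filename!r}")
    return candidate


if __name__ == "__main__":
    download_dir = tempfile.mkdtemp()
    for name in ("../../tmp/poc.txt", "/tmp/poc.txt", "data.csv"):
        unsafe = os.path.normpath(unsafe_destination(download_dir, name))
        print(f"{name!r}: unsafe -> {unsafe}, safe -> {safe_destination(download_dir, name)}")
```

With `os.path.join`, an absolute second argument discards the directory entirely, and `../` segments walk out of it once the path is resolved. These are the two techniques the advisory names.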

Exploitation Scenario

An adversary targets a data science team by sharing a convincing-looking dataset via a public URL (e.g., in a research forum or Slack message). The URL points to an attacker-controlled HTTP server. When a data scientist loads the dataset using MLflow, the server returns a Content-Disposition header like: `Content-Disposition: attachment; filename=../../.local/lib/python3.10/site-packages/mlflow/__init__.py`. MLflow writes the attacker's payload (a Python backdoor) to that path without validation, overwriting the MLflow package itself. On the next MLflow import or training job execution, the backdoor runs with full process privileges — establishing a reverse shell, exfiltrating AWS credentials from the instance metadata service, or poisoning model artifacts stored in S3.
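A header like the one in this scenario can be flagged before the response body is written anywhere, for example in a proxy hook or a log-analysis script. The sketch below is illustrative only: it parses the header with the standard library's `email.message.Message` and reports filenames that contain `..` segments or look like absolute paths. The function names are hypothetical.

```python
import re
from email.message import Message
from typing import Optional


def filename_from_content_disposition(header_value: str) -> Optional[str]:
    """Extract the filename parameter from a Content-Disposition header value."""
    message = Message()
    message["Content-Disposition"] = header_value
    return message.get_filename()


def is_suspicious_filename(filename: str) -> bool:
    """Flag '..' path segments and absolute paths (POSIX or Windows drive)."""
    normalized = filename.replace("\\", "/")
    has_traversal = ".." in normalized.split("/")
    is_absolute = normalized.startswith("/") or re.match(r"[A-Za-z]:", normalized) is not None
    return has_traversal or is_absolute


def header_is_suspicious(header_value: str) -> bool:
    """True if the header carries a filename that tries to leave the download directory."""
    filename = filename_from_content_disposition(header_value)
    return filename is not None and is_suspicious_filename(filename)


if __name__ == "__main__":
    header = (
        "attachment; "
        "filename=../../.local/lib/python3.10/site-packages/mlflow/__init__.py"
    )
    print(header_is_suspicious(header))
    print(header_is_suspicious('attachment; filename="train.csv"'))
```

The check covers only the `Content-Disposition` route. The advisory also names the URL path as a filename source, so the same test would need to be applied to the last segment of the request URL.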

Weaknesses (CWE)

CWE-22 — Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal'): The product uses external input to construct a pathname that is intended to identify a file or directory that is located underneath a restricted parent directory, but the product does not properly neutralize special elements within the pathname that can cause the pathname to resolve to a location that is outside of the restricted directory.

  • [Implementation] Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation.
  • [Architecture and Design] For any security checks that are performed on the client side, ensure that these checks are duplicated on the server side, in order to avoid CWE-602. Attackers can bypass the client-side checks by modifying values after the checks have been performed, or by changing the client to remove the client-side checks entirely. Then, these modified values would be submitted to the server.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
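The 8.8 base score can be reproduced from this vector. The sketch below applies the CVSS v3.1 base-score equations with the metric weights as published in the FIRST specification; the function names are illustrative, and only base metrics are handled (no temporal or environmental metrics).

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is Changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place using the spec's integer-based method."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def parse_vector(vector: str) -> dict:
    """Split 'CVSS:3.1/AV:N/...' into a {metric: value} dictionary."""
    prefix, *metrics = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix!r}")
    return dict(metric.split(":", 1) for metric in metrics)


def base_score(vector: str) -> float:
    m = parse_vector(vector)
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]] * pr * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # 8.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 2.84; their sum, roughly 8.71, rounds up to 8.8. Changing `UI:R` to `UI:N` raises exploitability to about 3.89 and the score to 9.8, which is why the user-interaction requirement is the main factor keeping this CVE out of the Critical band.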

Timeline

Published
June 6, 2024
Last Modified
October 15, 2025
First Seen
June 6, 2024

Related Vulnerabilities