MLflow: weak dataset hash allows integrity bypass — LOW (CVE-2026-10803)

CISO Take

MLflow up to 3.10.0 uses a weak cryptographic hash (CWE-327/CWE-328) in its Dataset Digest Computation component (`mlflow/data/digest_utils.py`), the mechanism MLflow relies on to track dataset versions and lineage across ML experiments. The CVSS score is low (3.6), and exploitation requires local access and high attack complexity. The concern for ML teams is that an attacker with a local foothold, or a malicious insider, could craft a poisoned dataset that produces the same digest as the legitimate one and substitute it in the artifact store without triggering any integrity alert in MLflow's tracking system. MLflow is embedded in many enterprise training pipelines, and silent dataset substitution is a practical first step toward undetected training data poisoning, with downstream consequences for model integrity. No patch had been released at CVE publication, although the project was notified through PR #22420. Monitor the MLflow GitHub repository for a fix and, in the interim, add compensating controls for dataset integrity verification.

Sources: NVD, ATLAS, GitHub Advisory

What is the risk?

Risk is LOW in isolation but higher in context for organizations that treat MLflow dataset digests as their sole or primary data integrity control. The local attack vector and high attack complexity sharply limit immediate exploitability, so opportunistic remote attackers cannot reach this flaw directly. The relevant threat actors are insiders with local access and attackers who have already compromised a developer or MLOps workstation. No patch existed at publication, an exploit has reportedly been published, and the project had not responded to the initial PR; together these suggest the exposure window may persist. Organizations that use MLflow dataset digests as audit evidence for ISO 42001 or EU AI Act compliance face an additional gap: those records cannot be treated as cryptographically reliable evidence of dataset integrity until the algorithm is hardened.

Attack Kill Chain

Local Reconnaissance

Attacker with low-privilege local access to the MLflow environment inspects digest_utils.py to identify the specific weak hash algorithm used in dataset digest computation.

AML.T0037

Collision Crafting

Adversary constructs a poisoned training dataset engineered to produce the same weak digest as the legitimate target dataset, exploiting the cryptographic weakness in the hash function.

AML.T0059

Dataset Substitution

Legitimate training dataset in the MLflow artifact store is silently replaced with the crafted poisoned version; MLflow digest tracking records no discrepancy due to the hash collision.

AML.T0020

Silent Model Compromise

Poisoned data enters the training pipeline undetected through normal MLflow experiment runs, potentially introducing model backdoors or degraded behavior while audit trails remain clean.

AML.T0018.000
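How hard the Collision Crafting step is depends on how short or weak the digest is. The advisory does not name the algorithm MLflow uses, so this toy sketch assumes a hypothetical digest made by truncating SHA-256 to a few hex characters; it is not MLflow's code. `truncated_digest` and `find_second_preimage` are illustrative names. The search appends an incrementing nonce to a poisoned payload until its truncated digest matches the legitimate dataset's digest (a second preimage).

```python
import hashlib
from typing import Optional, Tuple


def truncated_digest(data: bytes, hex_chars: int) -> str:
    """Hypothetical weak digest: SHA-256 cut down to `hex_chars` hex characters.

    Only 4 * hex_chars bits survive the truncation, and that short output,
    not SHA-256 itself, is what makes the digest easy to match.
    """
    return hashlib.sha256(data).hexdigest()[:hex_chars]


def find_second_preimage(
    target: bytes, poisoned: bytes, hex_chars: int, max_tries: int = 10_000_000
) -> Optional[Tuple[bytes, int]]:
    """Brute-force a variant of `poisoned` whose digest equals the target's.

    Appends an incrementing nonce (standing in for an innocuous padding row
    or comment) and returns (candidate, attempts), or None after max_tries.
    """
    goal = truncated_digest(target, hex_chars)
    for attempt in range(1, max_tries + 1):
        candidate = poisoned + b"\n#" + str(attempt).encode()
        if truncated_digest(candidate, hex_chars) == goal:
            return candidate, attempt
    return None


if __name__ == "__main__":
    legit = b"label,text\n0,benign sample\n1,malicious sample\n"
    poison = b"label,text\n0,benign sample\n0,malicious sample\n"  # one flipped label
    for width in (2, 4):
        result = find_second_preimage(legit, poison, width)
        if result is not None:
            forged, tries = result
            same_full = hashlib.sha256(forged).digest() == hashlib.sha256(legit).digest()
            print(f"{4 * width}-bit digest matched after {tries} tries; full SHA-256 equal: {same_full}")
```

Expected work is roughly 16**hex_chars attempts, so each extra hex character multiplies the attacker's cost by 16, while a comparison of the full SHA-256 values still tells the two datasets apart. That asymmetry is why the remediation steps recommend SHA-256 or SHA-3 for independent integrity checks.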

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
mlflow	pip	≤ 3.10.0	No patch

If you run MLflow 3.10.0 or earlier, assume you are affected; no patched release exists yet.

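Severity & Risk

CVSS 3.1: 3.6 / 10 (LOW)
EPSS: N/A
Exploitation status: No known exploitation
Sophistication: Advanced

Attack Surface

Attack Vector (AV): Local
Attack Complexity (AC): High
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): Low
Availability (A): Low

Vector: CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:L/A:L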
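The attack-surface metrics listed here correspond to the vector CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:L/A:L. As a sanity check on the 3.6 score, here is a minimal sketch of the CVSS v3.1 base-score formula (base metrics only, with weights and the Roundup function from the FIRST specification); `base_score` and `roundup` are names chosen for this example.

```python
import math

# Base-metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score from a vector like 'CVSS:3.1/AV:L/...'."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:L/A:L"))  # 3.6
```

The low score comes from both halves of the formula: impact is small because only integrity and availability are rated Low (about 2.51), and exploitability is small because a local vector with high complexity multiplies several weights well below 1 (about 1.05).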

What should I do?

5 steps

Upgrade MLflow to a version beyond 3.10.0 once a patched release is published. Until then, actively monitor the mlflow/mlflow GitHub repository and security advisories, since the project has not yet responded to the report.
Do not rely solely on MLflow dataset digests for data integrity verification; supplement them with strong cryptographic hashes (SHA-256 or SHA-3) of your training datasets, computed and stored independently of MLflow.
Audit existing MLflow experiment logs: for each dataset a run references, compute a strong hash of the artifact currently in the store and compare it with a strong hash of a trusted copy (for example, the original source or a backup) to detect any historical substitution. A matching MLflow digest alone does not rule substitution out.
Restrict local access to MLflow artifact stores and dataset directories using filesystem ACLs and the principle of least privilege to reduce insider threat exposure.
If MLflow dataset digests are used as compliance evidence for ISO 42001 or EU AI Act data governance requirements, formally document this known cryptographic weakness and implement compensating controls before the next audit cycle.
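Steps 2 and 3 can be sketched with Python's standard library alone. This is a minimal illustration that assumes datasets live as files in a local directory; `build_manifest` and `verify_manifest` are hypothetical helpers, not part of MLflow. The manifest maps each file to its SHA-256 digest and should be stored where the accounts that can write to the artifact store cannot modify it.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large datasets never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(dataset_dir: Path) -> dict:
    """Map each file under dataset_dir (POSIX-style relative path) to its SHA-256."""
    return {
        p.relative_to(dataset_dir).as_posix(): sha256_file(p)
        for p in sorted(dataset_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(dataset_dir: Path, manifest: dict) -> list:
    """Return human-readable discrepancies; an empty list means every file matches."""
    current = build_manifest(dataset_dir)
    problems = []
    for name, expected in manifest.items():
        actual = current.get(name)
        if actual is None:
            problems.append(f"missing: {name}")
        elif actual != expected:
            problems.append(f"modified: {name}")
    # Files that appeared after the manifest was recorded are also suspicious.
    problems.extend(f"unexpected: {name}" for name in sorted(current.keys() - manifest.keys()))
    return problems
```

In practice you would serialize the manifest (for example with `json.dumps(manifest, indent=2)`) into a separate repository or write-once bucket when a dataset is approved, then call `verify_manifest` at the start of every training job and during the historical audit. Because the check never consults MLflow's own digest, a collision against that digest does not help an attacker pass it.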

Classification

Supply Chain, Model Poisoning, Framework, Training Data
AML.T0010.002 - Data
AML.T0020 - Poison Training Data
AML.T0059 - Erode Dataset Integrity

Compliance Impact

This CVE is relevant to:

EU AI Act

Article 10 - Data and data governance

ISO 42001

A.6.1.4 - Data management

NIST AI RMF

MEASURE 2.5 - Data and AI system quality evaluation

OWASP LLM Top 10

LLM03 - Training Data Poisoning

Frequently Asked Questions

What is CVE-2026-10803?

CVE-2026-10803 is a weak-hash flaw (CWE-327/CWE-328) in the Dataset Digest Computation component (`mlflow/data/digest_utils.py`) of MLflow up to 3.10.0. An attacker with local access, or a malicious insider, could substitute a poisoned dataset that yields the same digest as the legitimate one, so MLflow's tracking system raises no integrity alert. CVSS 3.1 rates it 3.6 (LOW), and no patch had been released at publication.

Is CVE-2026-10803 actively exploited?

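No active exploitation of CVE-2026-10803 has been confirmed. The original advisory states that an exploit has been published, however, and no patch exists yet, so apply compensating controls proactively rather than waiting for a fix.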

How to fix CVE-2026-10803?

No patched release exists yet. Until one ships: (1) monitor the mlflow/mlflow repository and security advisories, and upgrade past 3.10.0 as soon as a fix is published; (2) hash training datasets independently with SHA-256 or SHA-3 and store those hashes outside MLflow; (3) audit historical runs by comparing strong hashes of stored datasets against strong hashes of trusted copies; (4) restrict local access to artifact stores and dataset directories with filesystem ACLs and least privilege; (5) if MLflow digests serve as ISO 42001 or EU AI Act evidence, document the weakness and add compensating controls before the next audit cycle.

What systems are affected by CVE-2026-10803?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, MLOps experiment tracking, data versioning and lineage systems, CI/CD model pipelines.

What is the CVSS score for CVE-2026-10803?

CVE-2026-10803 has a CVSS v3.1 base score of 3.6 (LOW).

AI Security Impact

Affected AI Architectures

training pipelines, MLOps experiment tracking, data versioning and lineage systems, CI/CD model pipelines

MITRE ATLAS Techniques

AML.T0010.002 Data

AML.T0020 Poison Training Data

AML.T0059 Erode Dataset Integrity

Compliance Controls Affected

EU AI Act: Article 10

ISO 42001: A.6.1.4

NIST AI RMF: MEASURE 2.5

OWASP LLM Top 10: LLM03

Technical Details

Original Advisory

A flaw has been found in MLflow up to 3.10.0. This issue affects the function mlflow.data.digest_utils of the file mlflow/data/digest_utils.py of the component Dataset Digest Computation. This manipulation causes use of weak hash. It is possible to launch the attack on the local host. The attack is considered to have high complexity. The exploitability is assessed as difficult. The exploit has been published and may be used. The project was informed of the problem early through a pull request but has not reacted yet.

Exploitation Scenario

A malicious insider with low-privilege local access to an MLflow-integrated training environment inspects the digest_utils.py implementation and identifies the weak hash algorithm in use. They craft a poisoned training dataset, for example one with subtly mislabeled security-relevant samples or an embedded backdoor trigger, that produces the same digest as the legitimate production dataset (strictly, a second preimage of that digest rather than an arbitrary collision). They then replace the legitimate dataset in the MLflow artifact store. When a scheduled training run executes, MLflow records a dataset digest that matches historical baselines, so lineage checks and any automated integrity gates that rely on that digest pass silently. The resulting model incorporates the poisoned data, potentially introducing a backdoor or degrading performance on specific inputs, while the MLflow audit trail shows no anomaly and post-hoc forensics find a valid-looking digest chain.