CVE-2026-10803: MLflow: weak dataset hash allows integrity bypass
Severity: LOW
MLflow up to 3.10.0 uses a weak cryptographic hash (CWE-327/CWE-328) in its Dataset Digest Computation component (`mlflow/data/digest_utils.py`), the mechanism MLflow relies on to track dataset versions and lineage across ML experiments. The CVSS score is low (3.6), and exploitation requires local access with high complexity. The concern for ML teams is that an attacker with a local foothold, or a malicious insider, could craft a poisoned dataset that produces the same weak digest as the legitimate one and substitute it in the artifact store without triggering any integrity alert in MLflow's tracking system. MLflow is embedded in a large number of enterprise training pipelines, and silent dataset substitution is a reliable first step toward undetected training data poisoning, with downstream consequences for model integrity. No patch has been released as of CVE publication despite responsible disclosure via PR #22420; monitor the MLflow GitHub repository for a fix and implement compensating controls on dataset integrity verification in the interim.
What is the risk?
Risk is LOW in isolation but contextually elevated for organizations that treat MLflow dataset digests as a sole or primary data integrity control. The local attack vector and high exploitation complexity substantially reduce immediate exploitability — opportunistic attackers are not a concern here. The relevant threat actor is a privileged insider or an attacker who has already compromised a developer or MLOps workstation. The absence of a patch at publication time, combined with a published exploit reference and no vendor response to the initial PR, suggests the window of exposure may persist. Organizations using MLflow dataset digests as audit evidence for ISO 42001 or EU AI Act compliance face an additional gap: those records are cryptographically untrustworthy until the algorithm is hardened.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| mlflow | pip | <= 3.10.0 | No patch |
Do you use mlflow? You're affected.
What should I do?
- Upgrade MLflow to a version beyond 3.10.0 once a patched release is published; monitor the mlflow/mlflow GitHub repository and security advisories actively, given no vendor response yet.
- Do not rely solely on MLflow dataset digests for data integrity verification; supplement with external strong cryptographic hashing (SHA-256 or SHA-3) of training datasets stored independently of MLflow.
- Audit existing MLflow experiment logs and compare dataset digests against externally computed strong hashes to detect any historical substitution.
- Restrict local access to MLflow artifact stores and dataset directories using filesystem ACLs and principle of least privilege to reduce insider threat exposure.
- If MLflow dataset digests are used as compliance evidence for ISO 42001 or EU AI Act data governance requirements, formally document this known cryptographic weakness and implement compensating controls before the next audit cycle.
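The external hashing and audit steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not part of MLflow; the function names `sha256_file`, `build_manifest`, and `verify_manifest` are hypothetical, and the layout (one manifest per dataset directory) is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(dataset_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to dataset_dir) to its SHA-256 hex digest."""
    return {
        p.relative_to(dataset_dir).as_posix(): sha256_file(p)
        for p in sorted(dataset_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(dataset_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that were added, removed, or modified."""
    current = build_manifest(dataset_dir)
    all_paths = current.keys() | manifest.keys()
    return sorted(p for p in all_paths if current.get(p) != manifest.get(p))
```

The manifest only helps if it is stored somewhere an attacker with access to the artifact store cannot also rewrite it, for example serialized as JSON in a separate, write-restricted location. A training job could call `verify_manifest` before loading data and abort on any non-empty result.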
Frequently Asked Questions
What is CVE-2026-10803?
MLflow up to 3.10.0 uses a weak cryptographic hash (CWE-327/CWE-328) in its Dataset Digest Computation component (`mlflow/data/digest_utils.py`), the mechanism MLflow relies on to track dataset versions and lineage across ML experiments. The CVSS score is low (3.6), and exploitation requires local access with high complexity. The concern for ML teams is that an attacker with a local foothold, or a malicious insider, could craft a poisoned dataset that produces the same weak digest as the legitimate one and substitute it in the artifact store without triggering any integrity alert in MLflow's tracking system. MLflow is embedded in a large number of enterprise training pipelines, and silent dataset substitution is a reliable first step toward undetected training data poisoning, with downstream consequences for model integrity. No patch has been released as of CVE publication despite responsible disclosure via PR #22420; monitor the MLflow GitHub repository for a fix and implement compensating controls on dataset integrity verification in the interim.
Is CVE-2026-10803 actively exploited?
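No confirmed active exploitation of CVE-2026-10803 has been reported, although an exploit has been published. Because no patch is available yet, organizations should apply compensating controls now and upgrade once a fixed release is published.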
How to fix CVE-2026-10803?
1. Upgrade MLflow to a version beyond 3.10.0 once a patched release is published; monitor the mlflow/mlflow GitHub repository and security advisories actively, given no vendor response yet.
2. Do not rely solely on MLflow dataset digests for data integrity verification; supplement with external strong cryptographic hashing (SHA-256 or SHA-3) of training datasets stored independently of MLflow.
3. Audit existing MLflow experiment logs and compare dataset digests against externally computed strong hashes to detect any historical substitution.
4. Restrict local access to MLflow artifact stores and dataset directories using filesystem ACLs and principle of least privilege to reduce insider threat exposure.
5. If MLflow dataset digests are used as compliance evidence for ISO 42001 or EU AI Act data governance requirements, formally document this known cryptographic weakness and implement compensating controls before the next audit cycle.
What systems are affected by CVE-2026-10803?
This vulnerability affects the following AI/ML architecture patterns: training pipelines, MLOps experiment tracking, data versioning and lineage systems, CI/CD model pipelines.
What is the CVSS score for CVE-2026-10803?
CVE-2026-10803 has a CVSS v3.1 base score of 3.6 (LOW).
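For readers who want to see where 3.6 comes from, the sketch below recomputes the base score from this advisory's vector, `CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:L/A:L`, using the metric weights and Roundup rule from the CVSS v3.1 specification. It is a simplified illustration that handles only Scope: Unchanged vectors; the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal value >= x, avoiding float drift."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the v3.1 base score for a Scope: Unchanged vector string."""
    # Skip the "CVSS:3.1" prefix and parse the remaining "KEY:VALUE" pairs.
    m = dict(part.split(":") for part in vector.split("/")[1:])
    if m["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    impact = 6.42 * iss
    exploitability = (
        8.22 * AV[m["AV"]] * AC[m["AC"]] * PR_UNCHANGED[m["PR"]] * UI[m["UI"]]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:L/A:L"))
```

For this vector the impact sub-score is about 2.51 and the exploitability sub-score about 1.05; their sum, roughly 3.56, rounds up to 3.6.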
AI Security Impact
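Affected AI Architectures
Training pipelines, MLOps experiment tracking, data versioning and lineage systems, CI/CD model pipelines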
MITRE ATLAS Techniques
AML.T0010.002 Data
AML.T0020 Poison Training Data
AML.T0059 Erode Dataset Integrity
Compliance Controls Affected
Technical Details
Original Advisory
A flaw has been found in MLflow up to 3.10.0. This issue affects the function mlflow.data.digest_utils of the file mlflow/data/digest_utils.py of the component Dataset Digest Computation. This manipulation causes use of weak hash. It is possible to launch the attack on the local host. The attack is considered to have high complexity. The exploitability is assessed as difficult. The exploit has been published and may be used. The project was informed of the problem early through a pull request but has not reacted yet.
Exploitation Scenario
A malicious insider with low-privilege local access to an MLflow-integrated training environment inspects the digest_utils.py implementation and identifies the weak hash algorithm in use. They pre-compute a hash collision by crafting a poisoned training dataset — for example, one with subtly mislabeled security-relevant samples or an embedded backdoor trigger — that produces the same digest as the legitimate production dataset. They replace the legitimate dataset in the MLflow artifact store. When a scheduled training run executes, MLflow records the run with a dataset digest that matches historical baselines, so lineage checks and any automated integrity gates pass silently. The resulting model incorporates the poisoned data, potentially introducing a backdoor or degrading performance on specific inputs, while the MLflow audit trail shows no anomaly and post-hoc forensics find a valid-looking digest chain.
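The mechanics of this scenario can be illustrated with a toy example. The advisory does not name the algorithm or digest length MLflow uses, so `toy_weak_digest` below is a deliberately tiny, hypothetical stand-in (MD5 truncated to three hex characters) and is not MLflow's implementation. The sketch only shows the general principle: when a digest is short or weak, padding a modified payload until the digest matches is cheap, while an independently computed SHA-256 still tells the two payloads apart.

```python
import hashlib
from itertools import count


def toy_weak_digest(data: bytes, hex_chars: int = 3) -> str:
    """Hypothetical weak digest: MD5 truncated to a few hex characters.

    Deliberately tiny for illustration; NOT MLflow's actual algorithm.
    """
    return hashlib.md5(data, usedforsecurity=False).hexdigest()[:hex_chars]


def strong_digest(data: bytes) -> str:
    """Full SHA-256 hex digest, as an independent integrity check would use."""
    return hashlib.sha256(data).hexdigest()


def find_substitute(original: bytes, poisoned_prefix: bytes) -> bytes:
    """Append counter padding to the poisoned data until the weak digests match."""
    target = toy_weak_digest(original)
    for n in count():
        candidate = poisoned_prefix + b"#" + str(n).encode()
        if toy_weak_digest(candidate) == target:
            return candidate


legit = b"label,text\n0,benign sample\n"
poisoned = find_substitute(legit, b"label,text\n1,benign sample\n")

# The weak digest cannot distinguish the two payloads; SHA-256 can.
print(toy_weak_digest(legit) == toy_weak_digest(poisoned))
print(strong_digest(legit) == strong_digest(poisoned))
```

With a three-character digest the search finishes after a few thousand attempts on average. Real weak digests are longer and the work grows accordingly, which is consistent with the advisory rating attack complexity as high.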
Weaknesses (CWE)
CWE-327 Use of a Broken or Risky Cryptographic Algorithm
CWE-328 Use of Weak Hash
CVSS Vector
CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:L/A:L
References
Related Vulnerabilities
CVE-2025-15379 10.0 MLflow: RCE via unsanitized model dependency specs
Same package: mlflow
CVE-2023-3765 10.0 MLflow: path traversal allows arbitrary file read
Same package: mlflow
CVE-2023-2780 9.8 MLflow: path traversal allows arbitrary file read/write
Same package: mlflow
CVE-2026-2635 9.8 mlflow: security flaw enables exploitation
Same package: mlflow
CVE-2023-1177 9.8 MLflow: path traversal allows arbitrary file read/write
Same package: mlflow