CVE-2025-0453: MLflow: GraphQL DoS disables ML tracking server

GHSA-49m6-vrr9-2cqm | HIGH | PoC available | CISA SSVC: Track*
Published March 20, 2025

CISO Take

MLflow's GraphQL endpoint allows unauthenticated attackers to exhaust all server workers via batched query flooding, taking down your entire ML experiment tracking and model registry. If MLflow is accessible beyond your internal network perimeter, treat this as high priority. Immediately restrict network access to the /graphql endpoint and audit firewall rules for MLflow deployments.

What is the risk?

CVSS 7.5 (High), with no authentication required and low attack complexity, makes this trivially exploitable by any attacker who can reach the server over the network. EPSS is low (about 0.5%), suggesting no observed mass exploitation yet, but the exploit mechanism is simple and the proof-of-concept code on huntr.com lowers the practical bar significantly. Enterprise risk depends entirely on exposure: internal-only MLflow deployments are lower risk, but MLflow instances exposed via shared cloud environments, Kubernetes ingress without an auth proxy, or developer shortcuts are directly at risk. No patch is available as of the publication date.

What systems are affected?

Package   Ecosystem   Vulnerable Range   Patched
MLflow    pip         (not listed)       No patch
MLflow    pip         <= 2.17.2          No patch

Package profile: 26.6K | OpenSSF 5.6 | 655 dependents | Pushed 5d ago | 31% patched | ~51d to patch

How severe is it?

CVSS 3.1: 7.5 / 10
EPSS: 0.5% chance of exploitation in 30 days (higher than 40% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
CISA SSVC: Public PoC
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

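Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High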

What should I do?

6 steps
  1. IMMEDIATE

    Restrict /graphql endpoint access at the network layer—MLflow should never be directly internet-exposed; enforce this via firewall rules or reverse proxy ACLs.

  2. SHORT-TERM

    Deploy an API gateway or WAF rule to rate-limit requests to /graphql per source IP.

  3. WORKAROUND

    If MLflow must be accessible, add an authenticated reverse proxy (nginx/Traefik with basic auth or SSO) in front of all MLflow endpoints.

  4. DETECTION

    Monitor for abnormal spikes in /graphql request volume or worker saturation in MLflow metrics; alert on CPU/thread exhaustion.

  5. PATCH

    No fixed version confirmed—monitor mlflow GitHub releases and huntr advisory for patch availability; upgrade immediately when released.

  6. AUDIT

    Inventory all MLflow instances across your environments, including shadow deployments by data science teams.
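
As an illustration of the rate-limiting step, the following is a minimal sketch of a per-client token bucket written as generic WSGI middleware. It is not part of MLflow and is not a vendor-provided fix: the class name `GraphQLRateLimiter`, the `rate` and `burst` values, and the choice of `REMOTE_ADDR` as the client key are all assumptions made for the example. A gateway or WAF rule, as recommended above, remains the more robust place to enforce this.

```python
import time
from collections import defaultdict


class GraphQLRateLimiter:
    """WSGI middleware (hypothetical): per-client token bucket on one path."""

    def __init__(self, app, rate=1.0, burst=5, path="/graphql",
                 clock=time.monotonic):
        self.app = app
        self.rate = rate      # tokens refilled per second (assumed value)
        self.burst = burst    # bucket capacity (assumed value)
        self.path = path
        self.clock = clock
        # client address -> (tokens remaining, time of last update)
        self.buckets = defaultdict(lambda: (float(burst), clock()))

    def _allow(self, client):
        tokens, last = self.buckets[client]
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self.buckets[client] = (tokens, now)
            return False
        self.buckets[client] = (tokens - 1.0, now)
        return True

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO", "").rstrip("/") == self.path:
            client = environ.get("REMOTE_ADDR", "unknown")
            if not self._allow(client):
                body = b"rate limit exceeded\n"
                start_response("429 Too Many Requests", [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                    ("Retry-After", "1"),
                ])
                return [body]
        # Other paths, and requests within budget, pass through unchanged.
        return self.app(environ, start_response)
```

Two caveats apply to this sketch. The bucket table lives in process memory, so each server worker keeps its own counts and the table grows with the number of distinct clients. Behind a reverse proxy, `REMOTE_ADDR` is the proxy's address, so the client key would need to come from a trusted forwarding header instead.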

What does CISA's SSVC say?

Decision Track*
Exploitation poc
Automatable No
Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act: Art.9 - Risk management system
ISO 42001: A.10.2 - AI system operational continuity
NIST AI RMF: MS-2.5 - AI system availability and resilience monitoring
OWASP LLM Top 10: LLM10 - Unbounded Consumption

Frequently Asked Questions

What is CVE-2025-0453?

CVE-2025-0453 is a denial-of-service vulnerability (CWE-400, uncontrolled resource consumption) in the /graphql endpoint of MLflow. An unauthenticated attacker can send large batches of queries that repeatedly request all runs from an experiment, tying up every server worker and taking down experiment tracking and the model registry. If MLflow is reachable from outside your internal network, treat this as high priority: restrict network access to /graphql and audit firewall rules for MLflow deployments.

Is CVE-2025-0453 actively exploited?

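Active exploitation has not been reported in the sources tracked here: CISA's SSVC assessment records exploitation as proof-of-concept rather than active. However, proof-of-concept exploit code is publicly available for CVE-2025-0453, which increases the risk of exploitation.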

How to fix CVE-2025-0453?

1. IMMEDIATE: Restrict /graphql endpoint access at the network layer. MLflow should never be directly internet-exposed; enforce this via firewall rules or reverse proxy ACLs.
2. SHORT-TERM: Deploy an API gateway or WAF rule to rate-limit requests to /graphql per source IP.
3. WORKAROUND: If MLflow must be accessible, add an authenticated reverse proxy (nginx/Traefik with basic auth or SSO) in front of all MLflow endpoints.
4. DETECTION: Monitor for abnormal spikes in /graphql request volume or worker saturation in MLflow metrics; alert on CPU/thread exhaustion.
5. PATCH: No fixed version confirmed. Monitor mlflow GitHub releases and the huntr advisory for patch availability; upgrade immediately when released.
6. AUDIT: Inventory all MLflow instances across your environments, including shadow deployments by data science teams.

What systems are affected by CVE-2025-0453?

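MLflow (pip package) versions up to and including 2.17.2 are listed as vulnerable, and no patched version is listed. The vulnerability affects the following AI/ML architecture patterns: ML experiment tracking, model registry, MLOps pipelines, training pipelines, CI/CD for ML.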

What is the CVSS score for CVE-2025-0453?

CVE-2025-0453 has a CVSS v3.1 base score of 7.5 (HIGH). The EPSS exploitation probability is 0.52%.

What is the AI security impact?

Affected AI Architectures

ML experiment tracking, model registry, MLOps pipelines, training pipelines, CI/CD for ML

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service
AML.T0034 Cost Harvesting
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art.9
ISO 42001: A.10.2
NIST AI RMF: MS-2.5
OWASP LLM Top 10: LLM10

What are the technical details?

Original Advisory

In mlflow/mlflow version 2.17.2, the `/graphql` endpoint is vulnerable to a denial of service attack. An attacker can create large batches of queries that repeatedly request all runs from a given experiment. This can tie up all the workers allocated by MLFlow, rendering the application unable to respond to other requests. This vulnerability is due to uncontrolled resource consumption.
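
Because the advisory attributes the problem to large batches of queries, one possible stopgap is a pre-filter that rejects oversized GraphQL request bodies before they reach a worker. The sketch below is hypothetical and is not MLflow code: it assumes batches arrive either as a JSON array of operations or as one very large `query` string, and every threshold (`MAX_BODY_BYTES`, `MAX_BATCH_OPERATIONS`, `MAX_OPEN_BRACES`) is an illustrative value, not a recommendation from the advisory. It does not parse GraphQL; the brace count is only a crude proxy for query size.

```python
import json

MAX_BODY_BYTES = 16 * 1024    # assumed cap on request body size
MAX_BATCH_OPERATIONS = 5      # assumed cap on operations per JSON array
MAX_OPEN_BRACES = 50          # assumed cap; rough proxy for selection count


def check_graphql_body(raw):
    """Return (allowed, reason) for a raw GraphQL POST body given as bytes."""
    if len(raw) > MAX_BODY_BYTES:
        return False, "body too large"
    try:
        payload = json.loads(raw)
    except ValueError:
        return False, "invalid JSON"
    # Treat a single operation as a batch of one.
    operations = payload if isinstance(payload, list) else [payload]
    if len(operations) > MAX_BATCH_OPERATIONS:
        return False, "too many batched operations"
    for op in operations:
        if not isinstance(op, dict) or not isinstance(op.get("query", ""), str):
            return False, "malformed operation"
        if op.get("query", "").count("{") > MAX_OPEN_BRACES:
            return False, "query too complex"
    return True, "ok"
```

A filter like this would sit in a proxy or middleware layer in front of the endpoint. It narrows the attack but does not replace authentication or rate limiting, since many small requests can still occupy workers.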

Exploitation Scenario

An attacker discovers an exposed MLflow tracking server (common in cloud environments with permissive security groups or Kubernetes LoadBalancer services). They send a burst of GraphQL batch queries to the /graphql endpoint, each requesting all runs across large experiments. MLflow's worker pool (typically 4-8 Gunicorn workers) becomes fully occupied processing these expensive database queries. Within seconds, legitimate requests queue indefinitely and time out. The attacker sustains the attack with minimal bandwidth, maintaining a small continuous stream of batch queries. Data scientists cannot log new experiments or access the model registry; automated training pipelines fail; on-call engineers scramble to diagnose what appears to be an infrastructure issue rather than an attack.
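
To make this pattern easier to tell apart from an infrastructure fault, access-log records can be scanned for bursts of /graphql requests from a single source. The following is a minimal sketch under stated assumptions: the function name `find_graphql_floods`, the `(timestamp, source_ip, path)` event shape, and the window and threshold defaults are all hypothetical, and log parsing is left out.

```python
from collections import defaultdict, deque


def find_graphql_floods(events, path="/graphql", window_s=10.0, threshold=50):
    """Return the set of source IPs with more than `threshold` hits on `path`
    inside any sliding window of `window_s` seconds.

    `events` is an iterable of (timestamp_seconds, source_ip, request_path)
    tuples in ascending timestamp order.
    """
    recent = defaultdict(deque)   # source_ip -> timestamps still in the window
    flagged = set()
    for ts, ip, req_path in events:
        # Ignore query strings and trailing slashes when matching the path.
        if req_path.split("?", 1)[0].rstrip("/") != path:
            continue
        hits = recent[ip]
        hits.append(ts)
        # Drop timestamps that have fallen out of the window.
        while hits and ts - hits[0] > window_s:
            hits.popleft()
        if len(hits) > threshold:
            flagged.add(ip)
    return flagged
```

Since each request in this scenario can carry many queries, a count of requests alone may miss a low-rate attack; pairing it with request body size or worker saturation metrics would give a fuller signal.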

Weaknesses (CWE)

CWE-400 — Uncontrolled Resource Consumption: The product does not properly control the allocation and maintenance of a limited resource.

  • [Architecture and Design] Design throttling mechanisms into the system architecture. The best protection is to limit the amount of resources that an unauthorized user can cause to be expended. A strong authentication and access control model will help prevent such attacks from occurring in the first place. The login application should be protected against DoS attacks as much as possible. Limiting the database access, perhaps by caching result sets, can help minimize the resources expended. To further limit the potential for a DoS attack, consider tracking the rate of requests received from users and blocking requests that exceed a defined rate threshold.
  • [Architecture and Design] Mitigation of resource exhaustion attacks requires that the target system either: recognizes the attack and denies that user further access for a given amount of time, or uniformly throttles all requests in order to make it more difficult to consume resources more quickly than they can again be freed. The first of these solutions is an issue in itself though, since it may allow attackers to prevent the use of the system by a particular valid user. If the attacker impersonates the valid user, they may be able to prevent the user from accessing the server in question. The second solution is simply difficult to effectively institute, and even when properly done, it does not provide a full solution. It simply makes the attack require more resources on the part of the attacker.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
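
The 7.5 base score can be reproduced from this vector with the base-score equations and metric weights published in the CVSS v3.1 specification. The sketch below is a worked example only; the function names `roundup` and `base_score` are chosen for illustration, and temporal and environmental metrics are not handled.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a CVSS:3.1 base vector string."""
    prefix, *metrics = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(item.split(":", 1) for item in metrics)
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    # Impact sub-score: here only Availability is affected, so ISS = 0.56.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))  # 7.5
```

For this vector the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.89; their sum, about 7.48, rounds up to 7.5.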

Timeline

Published: March 20, 2025
Last Modified: October 15, 2025
First Seen: March 20, 2025

Related Vulnerabilities