CVE-2024-9056: BentoML: DoS via multipart boundary exhausts server

GHSA-hw8j-hw49-752c HIGH CISA: TRACK*
Published March 20, 2025
CISO Take

BentoML model serving endpoints are vulnerable to unauthenticated DoS via crafted HTTP multipart requests, and no patch is listed for versions <= 1.4.5. Any internet-exposed BentoML deployment risks complete inference service unavailability from a single attacker with no credentials. Immediately place a WAF or reverse proxy with multipart boundary length limits and rate limiting in front of all BentoML endpoints, and keep it there until an upstream patch is released.

What is the risk?

High severity (CVSS 7.5) with low EPSS (0.00151), indicating limited active exploitation evidence. However, the zero-authentication, zero-interaction attack vector makes this trivially weaponizable once a target is identified. Absence of an available patch elevates operational risk. Organizations exposing BentoML inference APIs directly to the internet or running multi-tenant AI serving platforms face the highest exposure surface.

What systems are affected?

Package   Ecosystem   Vulnerable Range   Patched
BentoML   pip         <= 1.4.5           No patch

If you run BentoML 1.4.5 or earlier, you are affected.
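
A minimal sketch of checking one Python environment against the vulnerable range listed in the table. The helper names (`release_tuple`, `is_in_vulnerable_range`, `installed_bentoml_version`) are illustrative and not part of BentoML. The comparison is deliberately simplified: it looks only at the leading numeric release components, so a post-release such as `1.4.5.post1` is treated as `1.4.5`.

```python
import re
from importlib import metadata

# Upper bound of the vulnerable range, taken from the advisory table (<= 1.4.5).
LAST_VULNERABLE = (1, 4, 5)


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '1.3.4post1' -> (1, 3, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_in_vulnerable_range(version: str) -> bool:
    """True when the numeric release is at or below the last vulnerable release."""
    return release_tuple(version) <= LAST_VULNERABLE


def installed_bentoml_version():
    """Return the installed bentoml version string, or None if it is absent."""
    try:
        return metadata.version("bentoml")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_bentoml_version()
    if found is None:
        print("bentoml is not installed in this environment")
    elif is_in_vulnerable_range(found):
        print(f"bentoml {found} is within the vulnerable range (<= 1.4.5)")
    else:
        print(f"bentoml {found} is above the listed vulnerable range")
```

Running the script with the interpreter of each serving environment reports on that environment only, so virtualenvs and containers need to be checked one by one.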

How severe is it?

CVSS 3.1: 7.5 / 10
EPSS: 0.7% chance of exploitation in 30 days (higher than 47% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
CISA SSVC: Public PoC
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

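Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High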

What should I do?

6 steps
  1. Inventory all BentoML deployments: 'pip show bentoml' on all serving hosts.

  2. Place a reverse proxy (nginx/Caddy) or WAF in front of BentoML endpoints with strict multipart boundary length limits.

  3. Enforce rate limiting and request size caps (e.g., client_max_body_size in nginx, body size limits in API gateway).

  4. Restrict inference endpoint access via IP allowlisting where feasible.

  5. Monitor GitHub advisory GHSA-hw8j-hw49-752c for upstream patch release.

  6. Alert on anomalous CPU/memory spikes on BentoML inference servers as an indicator of active exploitation.
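
The boundary length limit from step 2 could also be enforced in-process, as a stopgap where no proxy is available. The following is a minimal sketch of a pure ASGI middleware that rejects multipart requests whose declared boundary is empty, missing, or longer than the 70 characters RFC 2046 allows. It rests on assumptions: the names `extract_boundary`, `boundary_is_acceptable`, and `BoundaryLengthGuard` are hypothetical, the check inspects only the `Content-Type` header, and it would not catch padding that appears solely in the request body, so the body size caps from step 3 are still needed. How the middleware is attached to a BentoML service depends on the BentoML version and is not shown.

```python
import re

# RFC 2046 caps a multipart boundary at 70 characters.
MAX_BOUNDARY_LENGTH = 70

_BOUNDARY_RE = re.compile(r'boundary\s*=\s*(?:"([^"]*)"|([^;\s]*))', re.IGNORECASE)


def _is_multipart(content_type: str) -> bool:
    return content_type.lower().lstrip().startswith("multipart/")


def extract_boundary(content_type: str):
    """Return the boundary parameter of a multipart Content-Type, else None."""
    if not _is_multipart(content_type):
        return None
    match = _BOUNDARY_RE.search(content_type)
    if match is None:
        return None
    return match.group(1) if match.group(1) is not None else match.group(2)


def boundary_is_acceptable(content_type: str) -> bool:
    """True for non-multipart requests and for boundaries of 1 to 70 characters."""
    if not _is_multipart(content_type):
        return True
    boundary = extract_boundary(content_type)
    return boundary is not None and 0 < len(boundary) <= MAX_BOUNDARY_LENGTH


class BoundaryLengthGuard:
    """Pure ASGI middleware: answers 400 before the wrapped app reads the body."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            # ASGI delivers headers as (name, value) byte pairs with lowercase names.
            headers = dict(scope.get("headers", []))
            content_type = headers.get(b"content-type", b"").decode("latin-1")
            if not boundary_is_acceptable(content_type):
                await send({
                    "type": "http.response.start",
                    "status": 400,
                    "headers": [(b"content-type", b"text/plain")],
                })
                await send({
                    "type": "http.response.body",
                    "body": b"invalid multipart boundary",
                })
                return
        await self.app(scope, receive, send)
```

Rejecting on the header alone keeps the check cheap, because the decision is made before any of the request body is parsed.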

What does CISA's SSVC say?

Decision Track*
Exploitation poc
Automatable Yes
Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Art. 15 - Accuracy, Robustness and Cybersecurity
ISO 42001
A.10.3 - AI System Operation and Monitoring
NIST AI RMF
MANAGE-2.4 - AI System Resilience and Incident Response
OWASP LLM Top 10
LLM10 - Unbounded Consumption

Frequently Asked Questions

What is CVE-2024-9056?

CVE-2024-9056 is an unauthenticated denial-of-service vulnerability in BentoML model serving endpoints. An attacker appends characters, such as dashes, to the end of a multipart boundary in an HTTP request, which makes the server spend excessive resources processing it and leaves the inference service unavailable. No patch is listed for versions <= 1.4.5, so the recommended interim defense is a WAF or reverse proxy with multipart boundary length limits and rate limiting in front of all BentoML endpoints.

Is CVE-2024-9056 actively exploited?

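No confirmed active exploitation of CVE-2024-9056 has been reported, although a public proof of concept exists. Because no patch is listed for versions <= 1.4.5, organizations should apply the mitigations proactively rather than wait for a fix.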

How to fix CVE-2024-9056?

  1. Inventory all BentoML deployments: 'pip show bentoml' on all serving hosts.

  2. Place a reverse proxy (nginx/Caddy) or WAF in front of BentoML endpoints with strict multipart boundary length limits.

  3. Enforce rate limiting and request size caps (e.g., client_max_body_size in nginx, body size limits in API gateway).

  4. Restrict inference endpoint access via IP allowlisting where feasible.

  5. Monitor GitHub advisory GHSA-hw8j-hw49-752c for upstream patch release.

  6. Alert on anomalous CPU/memory spikes on BentoML inference servers as an indicator of active exploitation.

What systems are affected by CVE-2024-9056?

The affected package is BentoML (pip), versions <= 1.4.5. The vulnerability affects the following AI/ML architecture patterns: model serving, inference APIs, MLOps pipelines, AI microservices.

What is the CVSS score for CVE-2024-9056?

CVE-2024-9056 has a CVSS v3.1 base score of 7.5 (HIGH). The EPSS exploitation probability is 0.66%.

What is the AI security impact?

Affected AI Architectures

model serving, inference APIs, MLOps pipelines, AI microservices

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service
AML.T0034 Cost Harvesting
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art. 15
ISO 42001: A.10.3
NIST AI RMF: MANAGE-2.4
OWASP LLM Top 10: LLM10

What are the technical details?

Original Advisory

BentoML version v1.3.4post1 is vulnerable to a Denial of Service (DoS) attack. The vulnerability can be exploited by appending characters, such as dashes (-), to the end of a multipart boundary in an HTTP request. This causes the server to continuously process each character, leading to excessive resource consumption and rendering the service unavailable. The issue is unauthenticated and does not require any user interaction, impacting all users of the service.

Exploitation Scenario

An adversary enumerates internet-facing BentoML inference APIs (e.g., via Shodan or targeted recon of an organization's AI product). Without credentials or prior access, they craft HTTP multipart POST requests with malformed boundary strings — appending hundreds or thousands of dashes to the boundary value. BentoML's file I/O descriptor processes each character sequentially, consuming CPU in proportion to boundary length. By sending concurrent malformed requests, the attacker exhausts server resources and renders the ML inference service unresponsive, effectively disabling any AI-powered application features relying on it.

Weaknesses (CWE)

CWE-400 — Uncontrolled Resource Consumption: The product does not properly control the allocation and maintenance of a limited resource.

  • [Architecture and Design] Design throttling mechanisms into the system architecture. The best protection is to limit the amount of resources that an unauthorized user can cause to be expended. A strong authentication and access control model will help prevent such attacks from occurring in the first place. The login application should be protected against DoS attacks as much as possible. Limiting the database access, perhaps by caching result sets, can help minimize the resources expended. To further limit the potential for a DoS attack, consider tracking the rate of requests received from users and blocking requests that exceed a defined rate threshold.
  • [Architecture and Design] Mitigation of resource exhaustion attacks requires that the target system either: recognizes the attack and denies that user further access for a given amount of time, or uniformly throttles all requests in order to make it more difficult to consume resources more quickly than they can again be freed. The first of these solutions is an issue in itself though, since it may allow attackers to prevent the use of the system by a particular valid user. If the attacker impersonates the valid user, they may be able to prevent the user from accessing the server in question. The second solution is simply difficult to effectively institute -- and even when properly done, it does not provide a full solution. It simply makes the attack require more resources on the part of the attacker.
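
The per-user rate tracking described in the first mitigation can be sketched as a sliding-window limiter. This is an illustrative, in-memory, single-process example: the class name `SlidingWindowLimiter`, its parameters, and the choice of client identifier are assumptions, and a production deployment would more likely rely on the rate limiting built into the reverse proxy or API gateway.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=5, window_seconds=1.0)
    decisions = [limiter.allow("203.0.113.7") for _ in range(7)]
    print(decisions)  # the first five calls are allowed, the remaining two are refused
```

Keying the limiter on the client address illustrates the trade-off the CWE text raises: an attacker who can impersonate a valid client can exhaust that client's quota.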

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
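
As a worked check, the 7.5 base score can be reproduced from this vector with the published CVSS v3.x base-score equations. The sketch below is illustrative rather than a full calculator: it handles base metrics only, the function names are made up for this example, and the rounding helper follows the Roundup definition from the v3.1 specification, which gives the same result as the v3.0 rule for this vector.

```python
import math

# Metric weights from the CVSS v3.x specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, avoiding floating-point artifacts."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a vector like 'CVSS:3.0/AV:N/.../A:H'."""
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    scope_changed = metrics["S"] == "C"

    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    pr_weights = PR_CHANGED if scope_changed else PR_UNCHANGED
    exploitability = (
        8.22 * AV[metrics["AV"]] * AC[metrics["AC"]]
        * pr_weights[metrics["PR"]] * UI[metrics["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope_changed:
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))  # 7.5
```

For this vector the impact sub-score is about 3.60 (availability only) and the exploitability sub-score is about 3.89; their sum, roughly 7.48, rounds up to 7.5.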

Timeline

Published: March 20, 2025
Last Modified: October 15, 2025
First Seen: March 20, 2025

Related Vulnerabilities