CVE-2024-9056: BentoML: DoS via multipart boundary exhausts server
GHSA-hw8j-hw49-752c HIGH CISA: TRACK*
BentoML model serving endpoints are vulnerable to unauthenticated DoS via crafted HTTP multipart requests, and no patch exists for versions <= 1.4.5. A single attacker with no credentials can make any internet-exposed BentoML deployment's inference service completely unavailable. Until an upstream patch is released, place a WAF or reverse proxy with multipart boundary length limits and rate limiting in front of all BentoML endpoints.
Risk Assessment
High severity (CVSS 7.5) with low EPSS (0.00151), indicating a low predicted likelihood of exploitation in the near term. However, the attack requires no authentication and no user interaction, which makes it trivial to weaponize once a target is identified. The absence of a patch raises operational risk. Organizations that expose BentoML inference APIs directly to the internet or run multi-tenant AI serving platforms face the greatest exposure.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| bentoml | pip | <= 1.4.5 | No patch |
If you run bentoml 1.4.5 or earlier, you are affected.
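As a minimal sketch of checking a host against the vulnerable range in the table above, the following compares the installed `bentoml` version with 1.4.5. The helper names (`release_tuple`, `is_vulnerable`, `installed_bentoml_version`) are hypothetical, and the comparison deliberately looks only at the leading numeric release segment, so suffixed builds such as `1.3.4post1` or `1.4.5.post1` are conservatively treated as their base release.

```python
import re
from importlib import metadata

# Last release listed as vulnerable in the Affected Systems table.
LAST_VULNERABLE = (1, 4, 5)


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. '1.3.4post1' -> (1, 3, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_vulnerable(version: str) -> bool:
    """True if the release segment is at or below the last vulnerable release."""
    return release_tuple(version) <= LAST_VULNERABLE


def installed_bentoml_version():
    """Return the installed bentoml version string, or None if not installed."""
    try:
        return metadata.version("bentoml")
    except metadata.PackageNotFoundError:
        return None
```

Calling `installed_bentoml_version()` on each serving host and passing a non-`None` result to `is_vulnerable` gives a quick yes/no answer for that environment.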
Recommended Action
- Inventory all BentoML deployments: 'pip show bentoml' on all serving hosts.
- Place a reverse proxy (nginx/Caddy) or WAF in front of BentoML endpoints with strict multipart boundary length limits.
- Enforce rate limiting and request size caps (e.g., client_max_body_size in nginx, body size limits in API gateway).
- Restrict inference endpoint access via IP allowlisting where feasible.
- Monitor GitHub advisory GHSA-hw8j-hw49-752c for upstream patch release.
- Alert on anomalous CPU/memory spikes on BentoML inference servers as an indicator of active exploitation.
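Where a proxy or WAF cannot enforce a boundary length limit, the same check can be sketched in application code. The example below is a generic ASGI middleware that rejects multipart requests whose declared boundary is missing or longer than 70 characters (the maximum RFC 2046 allows). The names `BoundaryLimitMiddleware`, `boundary_is_acceptable`, and `MAX_BOUNDARY_LEN` are hypothetical, and how to attach such middleware to a given BentoML service is not shown here. It assumes the padding appears in the `boundary` parameter of the `Content-Type` header; extra characters that appear only in the request body would not be caught by this check.

```python
MAX_BOUNDARY_LEN = 70  # RFC 2046 caps a multipart boundary at 70 characters


def boundary_is_acceptable(content_type: str, max_len: int = MAX_BOUNDARY_LEN) -> bool:
    """Return False for multipart content types with a missing or oversized boundary."""
    media_type, _, params = content_type.partition(";")
    if not media_type.strip().lower().startswith("multipart/"):
        return True  # not multipart, nothing to check
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "boundary":
            boundary = value.strip().strip('"')
            return 0 < len(boundary) <= max_len
    return False  # multipart without a boundary parameter


class BoundaryLimitMiddleware:
    """ASGI middleware that answers 400 before the wrapped app parses the body."""

    def __init__(self, app, max_len: int = MAX_BOUNDARY_LEN):
        self.app = app
        self.max_len = max_len

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            content_type = ""
            # ASGI delivers headers as (lowercased name, value) byte pairs.
            for name, value in scope.get("headers", []):
                if name == b"content-type":
                    content_type = value.decode("latin-1")
                    break
            if not boundary_is_acceptable(content_type, self.max_len):
                await send({
                    "type": "http.response.start",
                    "status": 400,
                    "headers": [(b"content-type", b"text/plain")],
                })
                await send({
                    "type": "http.response.body",
                    "body": b"invalid multipart boundary",
                })
                return
        await self.app(scope, receive, send)
```

Rejecting on the header alone keeps the check cheap, because the request body is never read for a request that fails it.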
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
What is CVE-2024-9056?
CVE-2024-9056 is an unauthenticated denial-of-service vulnerability in BentoML model serving endpoints, triggered by crafted HTTP multipart requests. No patch exists for versions <= 1.4.5, so any internet-exposed BentoML deployment can be made unavailable by a single attacker with no credentials.
Is CVE-2024-9056 actively exploited?
No confirmed active exploitation of CVE-2024-9056 has been reported. Because no patch is available, organizations should still apply the recommended mitigations proactively.
How to fix CVE-2024-9056?
No patched release is available, so the response consists of mitigations:
1. Inventory all BentoML deployments: 'pip show bentoml' on all serving hosts.
2. Place a reverse proxy (nginx/Caddy) or WAF in front of BentoML endpoints with strict multipart boundary length limits.
3. Enforce rate limiting and request size caps (e.g., client_max_body_size in nginx, body size limits in API gateway).
4. Restrict inference endpoint access via IP allowlisting where feasible.
5. Monitor GitHub advisory GHSA-hw8j-hw49-752c for upstream patch release.
6. Alert on anomalous CPU/memory spikes on BentoML inference servers as an indicator of active exploitation.
What systems are affected by CVE-2024-9056?
This vulnerability affects the following AI/ML architecture patterns: model serving, inference APIs, MLOps pipelines, AI microservices.
What is the CVSS score for CVE-2024-9056?
CVE-2024-9056 has a CVSS v3.0 base score of 7.5 (HIGH), matching the CVSS:3.0 vector listed under Technical Details. The EPSS exploitation probability is 0.30%.
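The 7.5 base score can be reproduced from the vector listed under CVSS Vector using the CVSS v3 base equations. The sketch below is a small illustrative calculator, not an official tool; the function names are hypothetical. It uses the simple v3.0 round-up on floating-point values, which is adequate for this vector but can misround some edge cases that v3.1 later addressed.

```python
import math

# Metric weights from the CVSS v3 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place."""
    return math.ceil(value * 10) / 10


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.0 or CVSS:3.1 base vector string."""
    prefix, *metrics = vector.split("/")
    if prefix not in ("CVSS:3.0", "CVSS:3.1"):
        raise ValueError(f"unsupported vector prefix: {prefix!r}")
    m = dict(item.split(":") for item in metrics)

    scope_changed = m["S"] == "C"
    pr_weights = PR_CHANGED if scope_changed else PR_UNCHANGED

    # Impact sub-score base: 1 - (1 - C)(1 - I)(1 - A)
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr_weights[m["PR"]] * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope_changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))  # 7.5
```

For this vector the impact term is 6.42 × 0.56 ≈ 3.60 and the exploitability term is 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.89, which sum to about 7.48 and round up to 7.5.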
Technical Details
NVD Description
BentoML version v1.3.4post1 is vulnerable to a Denial of Service (DoS) attack. The vulnerability can be exploited by appending characters, such as dashes (-), to the end of a multipart boundary in an HTTP request. This causes the server to continuously process each character, leading to excessive resource consumption and rendering the service unavailable. The issue is unauthenticated and does not require any user interaction, impacting all users of the service.
Exploitation Scenario
An adversary enumerates internet-facing BentoML inference APIs (e.g., via Shodan or targeted recon of an organization's AI product). Without credentials or prior access, they craft HTTP multipart POST requests with malformed boundary strings — appending hundreds or thousands of dashes to the boundary value. BentoML's file I/O descriptor processes each character sequentially, consuming CPU in proportion to boundary length. By sending concurrent malformed requests, the attacker exhausts server resources and renders the ML inference service unresponsive, effectively disabling any AI-powered application features relying on it.
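Because the scenario relies on many concurrent requests from a client that needs no credentials, a per-client rate limit bounds how much work any one source can force. The following is a minimal token-bucket sketch of that idea; the class names, the choice of keying on client IP, and the `rate` and `capacity` values are illustrative assumptions, and in practice this is usually enforced at the proxy or gateway rather than in application code.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """Keep one token bucket per client key (for example, the source IP)."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets = {}

    def allow(self, client_key: str) -> bool:
        bucket = self._buckets.get(client_key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self._buckets[client_key] = bucket
        return bucket.allow()
```

A request handler would call `limiter.allow(client_ip)` before doing any multipart parsing and return HTTP 429 when it yields `False`. A production version would also need to evict idle buckets so the dictionary does not grow without bound.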
CVSS Vector
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
References
- github.com/advisories/GHSA-hw8j-hw49-752c
- github.com/bentoml/BentoML/blob/a6f5f937be6ec278f3d4f3bbc6f3c8f9564820d7/src/bentoml/_internal/io_descriptors/file.py
- github.com/bentoml/BentoML/blob/v1.4.5/src/bentoml/_internal/io_descriptors/file.py
- nvd.nist.gov/vuln/detail/CVE-2024-9056
- huntr.com/bounties/a24a13c2-0300-4a95-b26a-ac7fe8f6521b
Related Vulnerabilities
- CVE-2025-54381 (9.9) BentoML: unauthenticated SSRF via file upload URLs. Same package: bentoml
- CVE-2025-32375 (9.8) BentoML: RCE via insecure deserialization in runner. Same package: bentoml
- CVE-2024-9070 (9.8) BentoML: unauthenticated RCE via runner deserialization. Same package: bentoml
- CVE-2025-27520 (9.8) BentoML: unauthenticated RCE via insecure deserialization. Same package: bentoml
- CVE-2026-35044 (8.8) BentoML: malicious bento archive RCE via Jinja2 SSTI. Same package: bentoml