GHSA-hh3j-9m59-p8vc: BentoML DoS — HIGH

Q: Is GHSA-hh3j-9m59-p8vc actively exploited?

No active exploitation of GHSA-hh3j-9m59-p8vc has been confirmed. Because no patch is available yet, organizations should still apply the network-level mitigations proactively.

Q: How to fix GHSA-hh3j-9m59-p8vc?

1. IMMEDIATE: Block or rate-limit HTTP requests to /login and all multipart endpoints at the WAF/reverse proxy layer (Nginx, Cloudflare, AWS WAF). Restrict access to trusted IP ranges.
2. NETWORK: If the Gradio UI is not required externally, bind BentoML to localhost or an internal VPC only; do not expose it publicly.
3. MONITORING: Alert on abnormal CPU/memory spikes on BentoML processes; set up health check endpoints with auto-restart on failure.
4. PATCH: Monitor https://github.com/bentoml/BentoML/releases for a fix; currently no patch is available. Pin to a patched version immediately once released.
5. WORKAROUND: Disable Gradio integration entirely if not actively used by setting `gradio_enabled=false` in BentoML configuration.

Q: What systems are affected by GHSA-hh3j-9m59-p8vc?

This vulnerability affects the following AI/ML architecture patterns: model serving, inference APIs, MLOps platforms.

Q: What is the CVSS score for GHSA-hh3j-9m59-p8vc?

GHSA-hh3j-9m59-p8vc has a CVSS v3.1 base score of 7.5 (HIGH).

CISO Take

BentoML 1.3.9's Gradio integration exposes an unauthenticated DoS vector on the /login endpoint — no credentials, no user interaction required to take down inference services. No official patch exists yet; immediately restrict public network access to BentoML instances and apply WAF rate-limiting on multipart requests. Any ML model serving infrastructure exposed to untrusted networks is at risk of complete availability loss.

What is the risk?

HIGH. CVSS 7.5 with AV:N/AC:L/PR:N/UI:N means that any unauthenticated attacker who can reach the service over the network can exploit it with low complexity and no user interaction. The lack of a patch raises operational risk, because network-level controls are the only available defense. BentoML is used in production ML serving pipelines, and the Gradio integration is a common choice for rapid model deployment. Any internet-facing BentoML 1.3.9 deployment with Gradio enabled remains vulnerable until those controls are applied.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
BentoML	pip	<= 1.3.9	No patch

If you run BentoML 1.3.9 or earlier with the Gradio integration enabled, you are affected.

How severe is it?

CVSS 3.1: 7.5 / 10

EPSS: N/A

Exploitation Status: No known exploitation

Sophistication: Trivial

What is the attack surface?

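Attack Vector (AV): Network

Attack Complexity (AC): Low

Privileges Required (PR): None

User Interaction (UI): None

Scope (S): Unchanged

Confidentiality (C): None

Integrity (I): None

Availability (A): High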
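
As a rough cross-check, the 7.5 base score can be reproduced from the metrics listed above. The sketch below follows the published CVSS v3.1 base-score equations for the Scope Unchanged case only; the function and table names are illustrative and do not come from the advisory.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope Unchanged values for PR).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(vector):
    """Compute the base score for a vector such as 'AV:N/AC:L/.../A:H' with S:U."""
    metrics = dict(part.split(":") for part in vector.split("/"))
    if metrics.get("S") != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")

    conf = WEIGHTS["CIA"][metrics["C"]]
    integ = WEIGHTS["CIA"][metrics["I"]]
    avail = WEIGHTS["CIA"][metrics["A"]]
    iss = 1 - (1 - conf) * (1 - integ) * (1 - avail)

    impact = 6.42 * iss
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * WEIGHTS["PR"][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score_scope_unchanged("AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))  # 7.5
```

The score is driven entirely by availability: with Confidentiality and Integrity at None, the impact sub-score comes only from A:High, while the exploitability sub-score is at its maximum for an unchanged scope.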

What should I do?

1. IMMEDIATE: Block or rate-limit HTTP requests to /login and all multipart endpoints at the WAF/reverse proxy layer (Nginx, Cloudflare, AWS WAF). Restrict access to trusted IP ranges.
2. NETWORK: If the Gradio UI is not required externally, bind BentoML to localhost or an internal VPC only; do not expose it publicly.
3. MONITORING: Alert on abnormal CPU/memory spikes on BentoML processes; set up health check endpoints with auto-restart on failure.
4. PATCH: Monitor https://github.com/bentoml/BentoML/releases for a fix; currently no patch is available. Pin to a patched version immediately once released.
5. WORKAROUND: Disable Gradio integration entirely if not actively used by setting gradio_enabled=false in BentoML configuration.
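
The monitoring step can be sketched as a small watchdog that probes a health endpoint and signals a restart only after several consecutive failures, so that one slow response does not trigger a restart loop. This is a minimal illustration, not a BentoML feature: the health URL, the threshold of three, and the names `probe` and `ConsecutiveFailureTracker` are assumptions, and the actual restart action is left to the process supervisor or orchestrator in use.

```python
import http.client
import urllib.request


def probe(url, timeout=2.0):
    """Return True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, and HTTP 4xx/5xx raised by urllib.
        return False


class ConsecutiveFailureTracker:
    """Signal a restart once `threshold` probes in a row have failed."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, healthy):
        """Record one probe result; return True when a restart is warranted."""
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        return self.failures >= self.threshold
```

A supervising loop would call `tracker.record(probe(health_url))` on a fixed interval, where `health_url` is whatever liveness endpoint the deployment exposes, and restart the service when `record` returns True.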

How is it classified?

Tags: DoS, Framework, Inference

AML.T0029 - Denial of AI Service

AML.T0034 - Cost Harvesting

AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This advisory is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.2.5 - AI system availability and resilience

NIST AI RMF

MANAGE 2.2 - Mechanisms to sustain AI system operation

MAP 5.1 - Likelihood and magnitude of AI risks

OWASP LLM Top 10

LLM04 - Model Denial of Service

What is the AI security impact?

Affected AI Architectures

model serving, inference APIs, MLOps platforms

MITRE ATLAS Techniques

AML.T0029 Denial of AI Service

AML.T0034 Cost Harvesting

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: A.6.2.5

NIST AI RMF: MANAGE 2.2, MAP 5.1

OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

In bentoml/bentoml version 1.3.9, the `/login` endpoint of the newly integrated Gradio app is vulnerable to a Denial of Service (DoS) attack. This vulnerability can be exploited by appending characters, such as dashes (-), to the end of a multipart boundary in an HTTP request. The server continuously processes each character, leading to excessive resource consumption and rendering the service unavailable. The issue is unauthenticated and does not require any user interaction.

Exploitation Scenario

An attacker identifies a public-facing BentoML 1.3.9 endpoint (e.g., via Shodan, GitHub repository leaks, or DNS enumeration). They send a crafted HTTP POST to /login with a Content-Type header containing a multipart boundary followed by hundreds of appended dash characters (e.g., `Content-Type: multipart/form-data; boundary=----WebKitFormBoundary----------...`). The server enters a processing loop for each appended character, consuming increasing CPU and memory. With repeated requests — trivially automated with curl or a basic Python script — the server exhausts resources and crashes or becomes unresponsive. No credentials, tokens, or prior knowledge of the system are required. The attack takes seconds to execute and requires no AI/ML domain expertise.
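
On the defensive side, a proxy or middleware in front of the service could reject requests whose multipart boundary is malformed or oversized before they reach the parser. The sketch below checks the `Content-Type` header against the boundary grammar in RFC 2046 (1 to 70 characters from a restricted set, not ending in a space). The function name is hypothetical, and the check only covers the header-based scenario described above; the advisory does not confirm that it blocks every variant of the attack.

```python
import re
from email.message import Message

# RFC 2046 boundary: up to 69 "bchars" followed by one non-space "bchar".
_BOUNDARY_RE = re.compile(r"[0-9A-Za-z'()+_,\-./:=? ]{0,69}[0-9A-Za-z'()+_,\-./:=?]")


def is_boundary_acceptable(content_type):
    """Return False for multipart requests with a missing or non-conforming boundary."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_maintype() != "multipart":
        return True  # Not a multipart request; nothing to validate here.
    boundary = msg.get_boundary()
    if boundary is None:
        return False
    return _BOUNDARY_RE.fullmatch(boundary) is not None


normal = "multipart/form-data; boundary=----WebKitFormBoundaryabc123"
padded = normal + "-" * 500
print(is_boundary_acceptable(normal))  # True
print(is_boundary_acceptable(padded))  # False
```

Rejecting a request at this point costs one header parse, which keeps the expensive multipart processing out of reach of padded boundaries.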

Weaknesses (CWE)

CWE-400 Uncontrolled Resource Consumption Primary

CWE-400 — Uncontrolled Resource Consumption: The product does not properly control the allocation and maintenance of a limited resource.

[Architecture and Design] Design throttling mechanisms into the system architecture. The best protection is to limit the amount of resources that an unauthorized user can cause to be expended. A strong authentication and access control model will help prevent such attacks from occurring in the first place. The login application should be protected against DoS attacks as much as possible. Limiting the database access, perhaps by caching result sets, can help minimize the resources expended. To further limit the potential for a DoS attack, consider tracking the rate of requests received from users and blocking requests that exceed a defined rate threshold.
[Architecture and Design] Mitigation of resource exhaustion attacks requires that the target system either: recognizes the attack and denies that user further access for a given amount of time, or uniformly throttles all requests in order to make it more difficult to consume resources more quickly than they can again be freed. The first of these solutions is an issue in itself though, since it may allow attackers to prevent the use of the system by a particular valid user. If the attacker impersonates the valid user, they may be able to prevent the user from accessing the server in question. The second solution is simply difficult to effectively institute -- and even when properly done, it does not provide a full solution. It simply makes the attack require more resources on the part of the attacker.

Source: MITRE CWE corpus.
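
The request-rate tracking described in the CWE-400 mitigations can be illustrated with a per-client sliding-window limiter. This is a minimal in-memory sketch under stated assumptions: the class name, the limits, and the choice of client identifier (for example, a source IP) are illustrative, and a real deployment would more likely enforce this at the WAF or reverse proxy.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id, now=None):
        """Return True if the request is within the limit, False if it should be blocked."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
print([limiter.allow("203.0.113.7", now=t) for t in (0.0, 1.0, 2.0, 3.0)])
# [True, True, True, False]
```

As the CWE text notes, per-user blocking can itself be abused if an attacker can impersonate a legitimate client, so the identifier used for `client_id` matters. The per-client table also grows with the number of distinct clients and would need eviction in a long-running process.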