CVE-2025-32375: BentoML: RCE via insecure deserialization in runner

GHSA-7v4r-c989-xh26 CRITICAL PoC AVAILABLE CISA: ATTEND
Published April 9, 2025
CISO Take

Any BentoML runner server exposed to the network is fully compromised by an unauthenticated attacker with a crafted POST request — no credentials, no user interaction required. Patch to 1.4.8 immediately or take runner endpoints offline until you can. With an EPSS of 0.67, weaponized exploits are either available or imminent.

Risk Assessment

Extreme. CVSS 9.8 with AV:N/AC:L/PR:N/UI:N means this is a remotely exploitable RCE that requires little attacker skill beyond sending an HTTP request. An EPSS of 0.67 is higher than 99% of all CVEs, so exploitation in the wild is considered likely. BentoML is a production-grade model serving framework used in real-time inference pipelines, so a compromise gives an attacker a foothold inside the ML inference layer, with access to models, API keys, training artifacts, and downstream data pipelines.

Affected Systems

Package   Ecosystem   Vulnerable Range        Patched
bentoml   pip         -                       No patch
bentoml   pip         >= 1.0.0a1, < 1.4.8     1.4.8

8.6K OpenSSF 6.3 22 dependents Pushed 2d ago 50% patched ~14d to patch
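
As a rough illustration of the vulnerable range listed above, the following sketch checks a version string against `>= 1.0.0a1, < 1.4.8`. The parser is deliberately simplified and is not a full PEP 440 implementation; the helper names `is_vulnerable` and `installed_bentoml_version` are hypothetical and exist only for this example.

```python
import re
from importlib import metadata
from typing import Optional

# Simplified parse: leading dotted integers, then whatever suffix follows.
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(.*)$")
# Assumption: these suffix markers denote a pre-release of the given version.
_PRERELEASE_RE = re.compile(r"^[.\-_]?(a|b|rc|alpha|beta|dev|pre)", re.IGNORECASE)

LOWER = (1, 0, 0)  # range starts at 1.0.0a1, a pre-release of 1.0.0
FIXED = (1, 4, 8)  # first patched release per the advisory


def is_vulnerable(version: str) -> bool:
    """Return True if `version` falls in >= 1.0.0a1, < 1.4.8 (simplified)."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    # Pad so that "1.4" compares like "1.4.0".
    release = release + (0,) * (3 - len(release))
    if release < LOWER or release > FIXED:
        return False
    if release == FIXED:
        # A pre-release of 1.4.8 (for example 1.4.8rc1) sorts before 1.4.8.
        return bool(_PRERELEASE_RE.match(match.group(2)))
    return True


def installed_bentoml_version() -> Optional[str]:
    """Return the installed bentoml version, or None if it is not installed."""
    try:
        return metadata.version("bentoml")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_bentoml_version()
    if found is None:
        print("bentoml is not installed in this environment")
    else:
        status = "VULNERABLE" if is_vulnerable(found) else "not in affected range"
        print(f"bentoml {found}: {status}")
```

Running the script inside each container image or virtual environment gives a quick first pass; a dependency scanner remains the more reliable source for an inventory.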

Severity & Risk

CVSS 3.1: 9.8 / 10
EPSS: 67.3% chance of exploitation in 30 days (higher than 99% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
CISA SSVC: Public PoC
Public PoC indexed (trickest/cve)
EPSS exploit prediction: 67%
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

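Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High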

Recommended Action

5 steps
  1. PATCH

    Upgrade BentoML to 1.4.8 immediately — this is the only complete fix.

  2. ISOLATE

    If patching is not immediately possible, restrict runner server endpoints to localhost or trusted internal subnets only via firewall/network policy. Do not expose runner ports externally under any circumstances.

  3. DETECT

    Review HTTP access logs for POST requests to runner endpoints with unusual or binary Content-Type headers, particularly pickle/msgpack types. Alert on unexpected outbound connections from runner processes (reverse shell indicators).

  4. AUDIT

    Identify all BentoML deployments in your environment — check container images, Kubernetes manifests, and cloud deployments. Treat all pre-1.4.8 instances as potentially compromised if they were network-accessible.

  5. ROTATE

    If compromise is suspected, rotate all credentials accessible from runner hosts (API keys, DB passwords, cloud IAM tokens).
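
The DETECT step above could be sketched as a small log filter. This is a minimal sketch that assumes access logs are available as JSON lines with `method`, `path`, and `content_type` fields; that log schema and the marker list are assumptions for illustration, not something BentoML is known to emit by default.

```python
import json
from typing import Iterable, Iterator

# Assumption: substrings in a Content-Type value that warrant a closer look,
# taken from the "pickle/msgpack types" wording in the DETECT step.
SUSPICIOUS_MARKERS = ("pickle", "msgpack")


def is_suspicious(record: dict) -> bool:
    """Flag POST requests whose Content-Type mentions a suspicious marker."""
    method = str(record.get("method", "")).upper()
    content_type = str(record.get("content_type", "")).lower()
    return method == "POST" and any(
        marker in content_type for marker in SUSPICIOUS_MARKERS
    )


def scan_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield suspicious records from an iterable of JSON-lines log entries."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(record, dict) and is_suspicious(record):
            yield record


if __name__ == "__main__":
    # Hypothetical sample entries; real logs will differ.
    sample = [
        '{"method": "POST", "path": "/predict", "content_type": "application/json"}',
        '{"method": "POST", "path": "/", "content_type": "application/vnd.example.pickled"}',
        '{"method": "GET", "path": "/healthz", "content_type": ""}',
    ]
    for hit in scan_json_lines(sample):
        print("review:", hit["path"], hit["content_type"])
```

A match is only a lead for manual review, since legitimate internal traffic between services may use the same content types.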

CISA SSVC Assessment

Decision: Attend
Exploitation: PoC
Automatable: Yes
Technical Impact: Total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
8.4 - AI system operation and monitoring
NIST AI RMF
GV-1.7 - Organizational risk policies for AI
MS-2.5 - AI system monitoring and incident response
OWASP LLM Top 10
LLM03:2025 - Supply Chain

Frequently Asked Questions

What is CVE-2025-32375?

CVE-2025-32375 is an insecure deserialization vulnerability in BentoML's runner server, affecting versions before 1.4.8. An unauthenticated attacker can fully compromise any runner server exposed to the network with a crafted POST request; no credentials or user interaction are required. Patch to 1.4.8 immediately or take runner endpoints offline until you can.

Is CVE-2025-32375 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2025-32375, increasing the risk of exploitation.

How to fix CVE-2025-32375?

1. PATCH: Upgrade BentoML to 1.4.8 immediately. This is the only complete fix.
2. ISOLATE: If patching is not immediately possible, restrict runner server endpoints to localhost or trusted internal subnets only via firewall/network policy. Do not expose runner ports externally under any circumstances.
3. DETECT: Review HTTP access logs for POST requests to runner endpoints with unusual or binary Content-Type headers, particularly pickle/msgpack types. Alert on unexpected outbound connections from runner processes (reverse shell indicators).
4. AUDIT: Identify all BentoML deployments in your environment, including container images, Kubernetes manifests, and cloud deployments. Treat all pre-1.4.8 instances as potentially compromised if they were network-accessible.
5. ROTATE: If compromise is suspected, rotate all credentials accessible from runner hosts (API keys, DB passwords, cloud IAM tokens).

What systems are affected by CVE-2025-32375?

The affected package is bentoml (pip), versions >= 1.0.0a1 and < 1.4.8. The vulnerability affects the following AI/ML architecture patterns: model serving, inference endpoints, MLOps pipelines, microservice inference architectures, containerized model deployments.

What is the CVSS score for CVE-2025-32375?

CVE-2025-32375 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 67.34%.

Technical Details

NVD Description

BentoML is a Python library for building online serving systems optimized for AI apps and model inference. Prior to 1.4.8, there was an insecure deserialization in BentoML's runner server. By setting specific headers and parameters in the POST request, it is possible to execute any unauthorized arbitrary code on the server, which will grant the attackers to have the initial access and information disclosure on the server. This vulnerability is fixed in 1.4.8.

Exploitation Scenario

An attacker scans for BentoML runner server endpoints (default port 8080/8081) using Shodan or internal network reconnaissance. They craft a POST request to a runner inference endpoint with a Content-Type header indicating pickle serialization and a malicious payload that executes arbitrary shell commands upon deserialization. The runner server deserializes the payload without validation, executing the attacker's code as the runner process user. From here, the attacker drops a reverse shell, exfiltrates model weights and environment variables (cloud credentials, API keys), and establishes persistence. In a Kubernetes environment, the service account token is harvested for cluster-wide lateral movement.
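
To illustrate why deserializing an untrusted payload "without validation" is dangerous, the sketch below uses a harmless stand-in: a pickled object that calls `os.getcwd` when loaded. It then shows the allow-list `Unpickler` pattern from the Python `pickle` documentation rejecting the same payload. This is a generic illustration of the bug class, not BentoML's code or its actual 1.4.8 fix; the names `CallOnLoad`, `RestrictedUnpickler`, and `restricted_loads` are chosen for this example.

```python
import builtins
import io
import os
import pickle

# Assumption: a small allow-list of harmless builtins, as in the Python docs.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside the allow-list."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Deserialize `data`, rejecting payloads that reference other globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class CallOnLoad:
    """Benign stand-in for a malicious object: loading it calls os.getcwd()."""

    def __reduce__(self):
        return (os.getcwd, ())


if __name__ == "__main__":
    payload = pickle.dumps(CallOnLoad())

    # Plain pickle.loads runs the embedded callable during deserialization.
    print("pickle.loads returned:", pickle.loads(payload))

    # The restricted unpickler refuses to resolve the callable at all.
    try:
        restricted_loads(payload)
    except pickle.UnpicklingError as exc:
        print("blocked:", exc)

    # Plain containers and scalars need no globals, so they still load.
    print(restricted_loads(pickle.dumps([1, 2, {"a": 3}])))
```

An allow-list narrows what pickle can do but is easy to get wrong; where possible, accepting only a data-only format such as JSON for network input avoids the problem entirely.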

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
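
For readers who want to see how the 9.8 base score follows from this vector, here is a minimal sketch of the CVSS v3.1 base score equations using the metric weights from the FIRST specification. It handles base metrics only (no temporal or environmental metrics), and the function names are chosen for this example.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a vector string."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix!r}")
    m = dict(part.split(":", 1) for part in parts)

    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    # Impact sub-score base: 1 - (1 - C)(1 - I)(1 - A)
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

For this vector the exploitability sub-score is about 3.89 and the impact sub-score about 5.87, which sum to roughly 9.76 and round up to 9.8.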

Timeline

Published
April 9, 2025
Last Modified
April 23, 2025
First Seen
April 9, 2025

Related Vulnerabilities