CVE-2025-32375: BentoML RCE — CRITICAL

Q: Is CVE-2025-32375 actively exploited?

A weaponized Metasploit module (exploit/linux/http/bentoml_runner_server_rce_cve_2025_32375) exists for CVE-2025-32375, meaning the exploit is point-and-click and the risk of opportunistic exploitation is high.

Q: How to fix CVE-2025-32375?

1. PATCH: Upgrade BentoML to 1.4.8 immediately; this is the only complete fix.
2. ISOLATE: If patching is not immediately possible, restrict runner server endpoints to localhost or trusted internal subnets only via firewall/network policy. Do not expose runner ports externally under any circumstances.
3. DETECT: Review HTTP access logs for POST requests to runner endpoints with unusual or binary Content-Type headers, particularly pickle/msgpack types. Alert on unexpected outbound connections from runner processes (reverse shell indicators).
4. AUDIT: Identify all BentoML deployments in your environment: check container images, Kubernetes manifests, and cloud deployments. Treat all pre-1.4.8 instances as potentially compromised if they were network-accessible.
5. ROTATE: If compromise is suspected, rotate all credentials accessible from runner hosts (API keys, DB passwords, cloud IAM tokens).
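
For the AUDIT step, a minimal sketch of a version check is shown here. It assumes the package is installed under the distribution name `bentoml` and uses the vulnerable range listed for this CVE (>= 1.0.0a1, < 1.4.8). The helper names are hypothetical, and the version parsing is a deliberate simplification of PEP 440, not a full implementation.

```python
import re
from importlib import metadata
from typing import Optional

FIRST_AFFECTED = (1, 0, 0)  # range starts at 1.0.0a1, i.e. release 1.0.0
FIXED = (1, 4, 8)           # first patched release

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(.*)$")
_PRERELEASE_RE = re.compile(r"^(a|b|rc|\.dev)")


def is_vulnerable(version: str) -> bool:
    """Return True if `version` falls in >= 1.0.0a1, < 1.4.8 (simplified parsing)."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    release = tuple(int(part) for part in match.group(1, 2, 3))
    suffix = match.group(4)
    # A pre-release of the fixed version (e.g. 1.4.8rc1) sorts before 1.4.8,
    # so it is conservatively treated as still vulnerable.
    if release == FIXED and _PRERELEASE_RE.match(suffix):
        return True
    return FIRST_AFFECTED <= release < FIXED


def installed_bentoml_version() -> Optional[str]:
    """Return the installed BentoML version, or None if it is not installed."""
    try:
        return metadata.version("bentoml")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_bentoml_version()
    if found is None:
        print("bentoml is not installed in this environment")
    elif is_vulnerable(found):
        print(f"bentoml {found} is in the vulnerable range; upgrade to 1.4.8 or later")
    else:
        print(f"bentoml {found} is outside the vulnerable range")
```

Running this inside each container image or virtual environment gives a quick per-environment answer; it does not replace scanning manifests and lockfiles for pinned versions.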

Q: What systems are affected by CVE-2025-32375?

This vulnerability affects the following AI/ML architecture patterns: model serving, inference endpoints, MLOps pipelines, microservice inference architectures, containerized model deployments.

Q: What is the CVSS score for CVE-2025-32375?

CVE-2025-32375 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 43.81%.

CISO Take

Any BentoML runner server exposed to the network can be fully compromised by an unauthenticated attacker with a single crafted POST request: no credentials and no user interaction are required. Patch to 1.4.8 immediately or take runner endpoints offline until you can. With an EPSS of 43.8% and a weaponized Metasploit module publicly available, opportunistic exploitation should be expected.

What is the risk?

Extreme. CVSS 9.8 with AV:N/AC:L/PR:N/UI:N means this is a remote code execution flaw that requires no attacker skill beyond sending an HTTP request. An EPSS of 43.8% is higher than 99% of all CVEs. BentoML is a production-grade model serving framework used in real-time inference pipelines, so a compromise gives an attacker a persistent foothold inside the ML inference layer, with access to models, API keys, training artifacts, and downstream data pipelines.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
BentoML	pip	—	No patch
BentoML	pip	>= 1.0.0a1, < 1.4.8	`1.4.8`

How severe is it?

CVSS 3.1: 9.8 / 10

EPSS: 43.8% chance of exploitation in 30 days (higher than 99% of all CVEs)

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication: Trivial

Exploitation Confidence: medium

○ CISA SSVC: Public PoC

✓ Weaponized Metasploit module (exploit/linux/http/bentoml_runner_server_rce_cve_2025_32375)

○ EPSS exploit prediction: 44%

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Network

AC Low

PR None

UI None

S Unchanged

C High

I High

A High
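
The 9.8 base score follows directly from these metric values. The sketch below works through the CVSS v3.1 base-score arithmetic for the Scope: Unchanged case, using the metric weights and rounding rule from the CVSS v3.1 specification. The function names are illustrative, and Scope: Changed is intentionally not handled.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged CVSS v3.1 vector."""
    metrics = dict(part.split(":", 1) for part in vector.split("/"))
    if metrics.get("S") != "U":
        raise NotImplementedError("this sketch only covers Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}

    # Impact sub-score: 6.42 * (1 - (1 - C)(1 - I)(1 - A))
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    # Exploitability sub-score: 8.22 * AV * AC * PR * UI
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]

    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    # Impact is about 5.87 and exploitability about 3.89; the sum 9.76 rounds up to 9.8.
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```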

What does CISA's SSVC say?

Decision Attend

Exploitation poc

Automatable Yes

Technical Impact total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Code Execution, Data Extraction, Supply Chain, Framework, Inference

AML.T0010.001 - AI Software
AML.T0025 - Exfiltration via Cyber Means
AML.T0049 - Exploit Public-Facing Application
AML.T0050 - Command and Scripting Interpreter
AML.T0072 - Reverse Shell

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

8.4 - AI system operation and monitoring

NIST AI RMF

GV-1.7 - Organizational risk policies for AI
MS-2.5 - AI system monitoring and incident response

OWASP LLM Top 10

LLM03:2025 - Supply Chain

What is the AI security impact?

Affected AI Architectures

model serving, inference endpoints, MLOps pipelines, microservice inference architectures, containerized model deployments

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0025 Exfiltration via Cyber Means

AML.T0049 Exploit Public-Facing Application

AML.T0050 Command and Scripting Interpreter

AML.T0072 Reverse Shell

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: 8.4

NIST AI RMF: GV-1.7, MS-2.5

OWASP LLM Top 10: LLM03:2025

What are the technical details?

Original Advisory

BentoML is a Python library for building online serving systems optimized for AI apps and model inference. Prior to 1.4.8, there was an insecure deserialization vulnerability in BentoML's runner server. By setting specific headers and parameters in a POST request, an attacker can execute arbitrary code on the server, gaining initial access and disclosing information held on the server. This vulnerability is fixed in 1.4.8.

Exploitation Scenario

An attacker scans for BentoML runner server endpoints (default port 8080/8081) using Shodan or internal network reconnaissance. They craft a POST request to a runner inference endpoint with a Content-Type header indicating pickle serialization and a malicious payload that executes arbitrary shell commands upon deserialization. The runner server deserializes the payload without validation, executing the attacker's code as the runner process user. From here, the attacker drops a reverse shell, exfiltrates model weights and environment variables (cloud credentials, API keys), and establishes persistence. In a Kubernetes environment, the service account token is harvested for cluster-wide lateral movement.
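
The request pattern in this scenario (a POST carrying a pickle-style content type) is what log review would look for. The following is a rough sketch of such a filter, assuming access logs are exported as JSON lines with hypothetical `method`, `path`, and `content_type` fields. Standard access-log formats often omit Content-Type, so the field names and the substring markers are assumptions to adapt, not a description of BentoML's own logging.

```python
import json
from typing import Iterable, Iterator

# Substrings treated as suspicious in a Content-Type value (assumed heuristics).
SUSPICIOUS_MARKERS = ("pickle", "msgpack", "octet-stream")


def flag_suspicious(lines: Iterable[str]) -> Iterator[dict]:
    """Yield POST log records whose content type matches a suspicious marker."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        if str(record.get("method", "")).upper() != "POST":
            continue
        content_type = str(record.get("content_type", "")).lower()
        if any(marker in content_type for marker in SUSPICIOUS_MARKERS):
            yield record


if __name__ == "__main__":
    sample = [
        '{"method": "POST", "path": "/predict", "content_type": "application/json"}',
        '{"method": "POST", "path": "/", "content_type": "application/x-pickle"}',
        '{"method": "GET", "path": "/healthz", "content_type": ""}',
    ]
    for hit in flag_suspicious(sample):
        print(hit["path"], hit["content_type"])
```

A match is only a lead for investigation: legitimate internal traffic between services may use the same content types, so results should be correlated with source address and outbound connection alerts.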

Weaknesses (CWE)

CWE-502 Deserialization of Untrusted Data (Primary)

CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

[Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
[Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.
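
As an illustration of the signing mitigation above, here is a generic sketch that seals a serialized payload with an HMAC and refuses to deserialize anything whose tag does not verify. It is not BentoML's actual fix; the `seal` and `unseal` names are hypothetical, and it assumes sender and receiver share a secret key that an attacker cannot read.

```python
import hashlib
import hmac
import pickle

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(obj: object, key: bytes) -> bytes:
    """Serialize `obj` and prepend an HMAC-SHA256 tag over the serialized bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def unseal(blob: bytes, key: bytes) -> object:
    """Verify the tag first; deserialize only if it matches."""
    if len(blob) < TAG_LEN:
        raise ValueError("blob too short to contain an HMAC tag")
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC verification failed; refusing to deserialize")
    return pickle.loads(payload)


if __name__ == "__main__":
    key = b"replace-with-a-random-32-byte-key"
    blob = seal({"inputs": [1, 2, 3]}, key)
    print(unseal(blob, key))
```

The check must happen before `pickle.loads` is reached, since unpickling is where attacker-controlled code would run. Where the data is plain values, a data-only format such as JSON avoids the problem entirely.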

Source: MITRE CWE corpus.