CVE-2024-9070: BentoML: unauthenticated RCE via runner deserialization

GHSA-9g44-gwvm-hc44 CRITICAL CISA: ATTEND
Published March 20, 2025
CISO Take

BentoML's runner server deserializes untrusted data without validation when the args-number parameter exceeds 1, enabling unauthenticated remote code execution by any network-accessible attacker. Any ML serving infrastructure running BentoML ≤1.3.4.post1 is at critical risk of full server compromise, including exfiltration of model weights, credentials, and training data. Immediately isolate runner server endpoints from untrusted networks and prioritize patching to a confirmed fixed version.

What is the risk?

A CVSS score of 9.8 with a fully unauthenticated, network-accessible, zero-interaction attack path makes this trivially exploitable by any actor with network reachability to the runner server. The EPSS score was low (0.25%) and no active exploitation had been confirmed at time of publication, but the trigger is simple: a single parameter manipulation combined with a crafted payload. Weaponization is therefore low-effort, and the exploitation signals on this page already record a public PoC. ML inference infrastructure is a high-value target for credential and IP theft, and runner servers are frequently left reachable from broader cloud or Kubernetes networks through misconfiguration.

What systems are affected?

Package    Ecosystem    Vulnerable Range    Patched
BentoML    pip          <= 1.4.5            No patch

Do you use BentoML? You're affected.
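
A quick way to answer that question per environment is to ask the Python packaging metadata whether BentoML is installed. The sketch below is a minimal illustration, not an official detection tool: the helper name installed_version is hypothetical, and it assumes the package is distributed under the name "bentoml". Because the table above lists no patched release, it only reports the installed version for manual review instead of comparing it against a fixed version.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "bentoml") -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("bentoml is not installed in this environment")
    else:
        # The advisory names no patched release, so the version is only
        # reported here; it is not compared against a "fixed" version.
        print(f"bentoml {version} is installed; review against the advisory")
```

Run it inside each virtual environment or container image that serves models, since a system-wide check will not see packages installed elsewhere.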

How severe is it?

CVSS 3.1: 9.8 / 10
EPSS: 0.8% chance of exploitation in 30 days (higher than 53% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
CISA SSVC: Public PoC
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Network
AC Low
PR None
UI None
S Unchanged
C High
I High
A High

What should I do?

5 steps
  1. IMMEDIATE

    Restrict all BentoML runner server ports to trusted internal services only, using firewall rules or network policies; do not expose them to the public internet or untrusted segments.

  2. PATCH

    Upgrade BentoML to the latest available release and verify that it includes the fix by reviewing the runner_app.py changes in the v1.4.5+ GitHub history. No patched version is listed in the current data; check https://github.com/bentoml/BentoML/releases for official guidance.

  3. DETECT

    Review HTTP access logs on runner servers for requests with args-number > 1 originating from unexpected sources; alert on anomalous child process spawning from runner processes.

  4. AUDIT

    Enumerate all BentoML deployments in your environment including Kubernetes pods, Docker Compose stacks, and cloud VMs; check network exposure for each.

  5. COMPENSATING CONTROL

    If patching is blocked, deploy a reverse proxy or API gateway in front of runner endpoints with strict request validation rejecting args-number > 1.
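
The request validation in steps 3 and 5 can be sketched as a single predicate that a gateway plugin or a log-review script could call. This is an illustration under stated assumptions, not BentoML's fix: it assumes args-number travels as an HTTP header of that name, and the function name should_reject is hypothetical. It is deliberately stricter than "reject args-number > 1": it allows only the exact value 1 and fails closed on missing or malformed values, which will also block legitimate multi-argument runner calls.

```python
from typing import Mapping

# Assumption: the argument count is carried in an HTTP header with this name.
# Adjust if your deployment transports it differently.
ARGS_NUMBER_HEADER = "args-number"


def should_reject(headers: Mapping[str, str]) -> bool:
    """Return True when a request should be blocked or flagged."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get(ARGS_NUMBER_HEADER)
    if raw is None:
        # Fail closed: a missing value is treated as suspicious.
        return True
    # Allow only the literal value "1"; everything else, including values
    # greater than 1 and non-numeric strings, is rejected.
    return raw.strip() != "1"


if __name__ == "__main__":
    print(should_reject({"Args-Number": "2"}))  # blocked
    print(should_reject({"args-number": "1"}))  # allowed
```

For detection, the same predicate only helps if access logs actually record this header; many default HTTP log formats do not, so logging may need to be extended first.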

What does CISA's SSVC say?

Decision Attend
Exploitation poc
Automatable Yes
Technical Impact total

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.1.2 - AI risk assessment
A.9.3 - AI system security
NIST AI RMF
MANAGE 2.2 - Treatments, responses, and recovery actions are taken
OWASP LLM Top 10
LLM05:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2024-9070?

CVE-2024-9070 is a deserialization vulnerability (CWE-502) in BentoML's runner server. When the args-number parameter exceeds 1, the server deserializes untrusted data without validation, which lets any attacker with network access execute code on the server without authenticating.

Is CVE-2024-9070 actively exploited?

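No confirmed active exploitation of CVE-2024-9070 has been reported. CISA's SSVC data does record a public proof of concept, so organizations should restrict network access now and patch as soon as a fixed release is confirmed.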

How to fix CVE-2024-9070?

1. IMMEDIATE: Restrict all BentoML runner server ports to trusted internal services only, using firewall rules or network policies; do not expose them to the public internet or untrusted segments.
2. PATCH: Upgrade BentoML to the latest available release and verify that it includes the fix by reviewing the runner_app.py changes in the v1.4.5+ GitHub history. No patched version is listed in the current data; check https://github.com/bentoml/BentoML/releases for official guidance.
3. DETECT: Review HTTP access logs on runner servers for requests with args-number > 1 originating from unexpected sources; alert on anomalous child process spawning from runner processes.
4. AUDIT: Enumerate all BentoML deployments in your environment including Kubernetes pods, Docker Compose stacks, and cloud VMs; check network exposure for each.
5. COMPENSATING CONTROL: If patching is blocked, deploy a reverse proxy or API gateway in front of runner endpoints with strict request validation rejecting args-number > 1.

What systems are affected by CVE-2024-9070?

This vulnerability affects the following AI/ML architecture patterns: model serving, ML inference pipelines, MLOps platforms, containerized AI workloads.

What is the CVSS score for CVE-2024-9070?

CVE-2024-9070 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 0.85%.

What is the AI security impact?

Affected AI Architectures

model serving
ML inference pipelines
MLOps platforms
containerized AI workloads

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0049 Exploit Public-Facing Application
AML.T0072 Reverse Shell

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.6.1.2, A.9.3
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM05:2025

What are the technical details?

Original Advisory

A deserialization vulnerability exists in BentoML's runner server in bentoml/bentoml versions <=1.3.4.post1. By setting specific parameters, an attacker can execute unauthorized arbitrary code on the server, causing severe harm. The vulnerability is triggered when the args-number parameter is greater than 1, leading to automatic deserialization and arbitrary code execution.

Exploitation Scenario

An adversary scans internal ML infrastructure or a misconfigured cloud environment and identifies an exposed BentoML runner server HTTP endpoint. They craft a POST request to the runner API, setting args-number to 2 and embedding a malicious Python pickle payload in the request body. The server automatically deserializes the payload without any validation and executes the adversary's code, for example dropping a reverse shell, exfiltrating model weights and API keys from the serving environment, or establishing persistence via a cron job or systemd unit. No credentials, authentication tokens, AI/ML domain knowledge, or special tooling are required. The full attack chain executes in seconds and is indistinguishable from legitimate runner traffic in standard HTTP logs.
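
The reason a pickle payload amounts to code execution can be shown with a harmless example. The sketch below is not BentoML code and not an exploit: the class names Demo and NoGlobalsUnpickler are hypothetical, and the embedded call is the builtin len. It shows that pickle.loads replays whatever call the byte stream describes, and that a restricted unpickler (the "restricting globals" pattern from the Python pickle documentation) refuses the same stream before anything runs.

```python
import io
import pickle


class Demo:
    """Object whose pickle stream asks the loader to call a function."""

    def __reduce__(self):
        # pickle records "call len('abc')" and replays that call on load.
        # A malicious stream would name a dangerous callable instead of len.
        return (len, ("abc",))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global, so no callable can run."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


payload = pickle.dumps(Demo())

# Plain pickle.loads executes the embedded call and returns its result.
result = pickle.loads(payload)

# The restricted loader rejects the same bytes before anything is called.
try:
    NoGlobalsUnpickler(io.BytesIO(payload)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True

if __name__ == "__main__":
    print(result, blocked)
```

Blocking every global also breaks most legitimate pickled objects, so in practice this pattern is paired with a short allow-list, or pickle is replaced by a data-only format.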

Weaknesses (CWE)

CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

  • [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
  • [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.
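
The HMAC mitigation in the first bullet can be sketched in a few lines. This is a minimal illustration of the general technique, not BentoML's implementation: the function names sign and verify_and_load are hypothetical, and it assumes sender and receiver already share a secret key over a trusted channel. The receiver checks the tag before the bytes ever reach pickle.loads.

```python
import hashlib
import hmac
import pickle

TAG_LENGTH = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign(payload: bytes, key: bytes) -> bytes:
    """Prefix the serialized payload with an HMAC-SHA256 tag."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify_and_load(blob: bytes, key: bytes):
    """Deserialize only if the tag matches; otherwise raise ValueError."""
    tag, payload = blob[:TAG_LENGTH], blob[TAG_LENGTH:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to deserialize")
    return pickle.loads(payload)


if __name__ == "__main__":
    key = b"example-shared-secret"  # placeholder; load real keys from a secret store
    blob = sign(pickle.dumps({"args": [1, 2]}), key)
    print(verify_and_load(blob, key))
```

Signing only proves the bytes came from a key holder; it does not make pickle safe if the key leaks or if a trusted sender is itself compromised.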

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
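
How this vector produces the 9.8 score can be checked with the published CVSS v3 base score equations. The sketch below is an illustrative calculator covering base metrics only; the function names are hypothetical. The vector is labelled CVSS:3.0 while the severity block says CVSS 3.1; the base metric weights are the same in both versions, and the rounding helper used here follows the v3.1 definition, which gives the same result for this vector.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is Changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in CVSS v3.1."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score from a CVSS v3.x base vector string."""
    # Skip the "CVSS:3.x" prefix and map metric names to their values.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][metrics["AV"]] * WEIGHTS["AC"][metrics["AC"]]
        * pr * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

For this vector the impact term is about 5.87 and the exploitability term about 3.89; their sum of roughly 9.76 rounds up to 9.8.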

Timeline

Published: March 20, 2025
Last Modified: October 15, 2025
First Seen: March 20, 2025
