CVE-2020-13092: scikit-learn: RCE via malicious joblib model deserialization

CRITICAL PoC AVAILABLE
Published May 15, 2020
CISO Take

Any ML pipeline loading scikit-learn model files from untrusted sources (shared storage, S3 buckets, user uploads, third-party model registries) is exposed to full remote code execution — one malicious .pkl file is all it takes. Audit every call to joblib.load() in your stack and enforce cryptographic verification (hash or signature) of model artifacts before loading. This is not a bug scikit-learn will patch; it is a design constraint you must architect around.

Risk Assessment

CVSS 9.8 reflects worst-case exposure: no authentication, no user interaction, network-reachable if a model-serving endpoint loads user-supplied or externally fetched model files. In practice, exploitability depends on whether attacker-controlled files reach joblib.load() — which is common in MLOps pipelines that pull models from shared registries or accept model uploads. The 'by design' disclaimer does not reduce operational risk; ML teams routinely treat joblib.load() as a safe I/O operation without understanding the underlying pickle execution model.

Affected Systems

Package: scikit-learn
Ecosystem: pip
Vulnerable Range: through 0.23.0 (disputed; the unsafe load behavior persists in later versions)
Patched: No patch

Do you use scikit-learn? You're affected.

Severity & Risk

CVSS 3.1
9.8 / 10
EPSS
0.9%
chance of exploitation in 30 days
Higher than 76% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Trivial
Exploitation Confidence
Medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

AV (Attack Vector): Network
AC (Attack Complexity): Low
PR (Privileges Required): None
UI (User Interaction): None
S (Scope): Unchanged
C (Confidentiality): High
I (Integrity): High
A (Availability): High

Recommended Action

6 steps
  1. Treat upgrading as insufficient: no functional fix exists in any scikit-learn release, because joblib.load() is built on pickle by design. Mitigation is architectural.

  2. Never call joblib.load() on files from untrusted or unverified sources.

  3. Enforce cryptographic integrity checks (SHA-256 hash or digital signature) on all model artifacts before loading.

  4. Run model loading processes in sandboxed environments (gVisor, seccomp profiles, read-only mounts) to limit blast radius.

  5. Implement model provenance tracking in your MLOps pipeline — only load models whose chain of custody is auditable.

  6. For detection: monitor for unexpected os.system(), subprocess, or network calls spawned from Python processes running inference workloads.
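Step 3 can be enforced with a small loading wrapper. A minimal sketch, assuming the trusted SHA-256 digest is distributed out of band (the helper name load_verified_model is illustrative, not a scikit-learn or joblib API):

```python
import hashlib


def load_verified_model(path, expected_sha256):
    """Deserialize a joblib model artifact only if its SHA-256 digest
    matches a trusted, out-of-band value."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected_sha256:
        # Refuse to deserialize: a mismatch means the artifact was
        # tampered with or is not the artifact you pinned.
        raise ValueError(f"model artifact hash mismatch: got {actual}")
    import joblib  # imported here so nothing is deserialized before verification
    return joblib.load(path)
```

The point of the ordering is that the file's bytes are hashed and compared before any deserialization code runs; a tampered artifact is rejected without ever reaching pickle.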

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
Article 9 - Risk management system
ISO 42001
A.6.2.6 - AI system lifecycle — data and model integrity
A.9.4 - AI supply chain — third-party AI components
NIST AI RMF
GOVERN-1.7 - Processes and procedures are in place for decommissioning and phase-out
MS-2.5 - Measure — AI risk monitoring and evaluation
OWASP LLM Top 10
LLM03:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2020-13092?

CVE-2020-13092 is a deserialization vulnerability in scikit-learn (through 0.23.0): joblib.load() can unserialize and execute commands embedded in an untrusted model file via pickle's __reduce__ mechanism (for example an os.system call), giving an attacker remote code execution. Any ML pipeline that loads model files from untrusted sources (shared storage, S3 buckets, user uploads, third-party model registries) is exposed; one malicious .pkl file is all it takes.

Is CVE-2020-13092 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2020-13092, increasing the risk of exploitation.

How to fix CVE-2020-13092?

1. Treat upgrading as insufficient: no functional fix exists in any scikit-learn release, because joblib.load() is built on pickle by design; mitigation is architectural. 2. Never call joblib.load() on files from untrusted or unverified sources. 3. Enforce cryptographic integrity checks (SHA-256 hash or digital signature) on all model artifacts before loading. 4. Run model loading processes in sandboxed environments (gVisor, seccomp profiles, read-only mounts) to limit blast radius. 5. Implement model provenance tracking in your MLOps pipeline; only load models whose chain of custody is auditable. 6. For detection, monitor for unexpected os.system(), subprocess, or network calls spawned from Python processes running inference workloads.

What systems are affected by CVE-2020-13092?

This vulnerability affects the following AI/ML architecture patterns: model serving, training pipelines, MLOps/CI-CD pipelines, model registries, data science notebooks.

What is the CVSS score for CVE-2020-13092?

CVE-2020-13092 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 0.88%.

Technical Details

NVD Description

scikit-learn (aka sklearn) through 0.23.0 can unserialize and execute commands from an untrusted file that is passed to the joblib.load() function, if __reduce__ makes an os.system call. NOTE: third parties dispute this issue because the joblib.load() function is documented as unsafe and it is the user's responsibility to use the function in a secure manner.
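The mechanism is plain Python pickle semantics, not a scikit-learn-specific bug. A minimal sketch of why merely loading a file executes attacker-chosen code, using a harmless str.upper call where a real payload would use os.system:

```python
import pickle


class NotAModel:
    def __reduce__(self):
        # pickle records: "to rebuild this object, call str.upper('pwned')".
        # A malicious file substitutes os.system and a shell command here;
        # joblib.load() uses pickle underneath, so it runs the call the same way.
        return (str.upper, ("pwned",))


payload = pickle.dumps(NotAModel())  # what an attacker writes to disk as a ".pkl model"
result = pickle.loads(payload)       # "loading" executes the recorded call
```

Here `result` is the string `"PWNED"`: the deserializer called the embedded function with the embedded arguments, which is exactly the primitive the exploit relies on.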

Exploitation Scenario

An adversary identifies an ML inference API that accepts a model file path or fetches models from a configurable S3 bucket. They craft a malicious scikit-learn model file using Python's pickle __reduce__ mechanism to embed an os.system() reverse shell call. The file is uploaded to a writable storage location or injected into a model registry via a compromised CI credential. When the inference service calls joblib.load() on the next model refresh cycle, the payload executes with the service account's privileges — giving the attacker a foothold inside the ML infrastructure to pivot to training data, API keys, or downstream systems.
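For the detection and containment side of this scenario, Python 3.8+ audit hooks can flag or abort process-spawning calls fired while a model is being deserialized. A minimal sketch; the event list is illustrative (not exhaustive), and note that an audit hook, once installed, is global to the interpreter and cannot be removed:

```python
import sys


def block_spawn_during_load(event, args):
    # Audit events such as "os.system" and "subprocess.Popen" fire before the
    # underlying call executes; raising here aborts the call, so a pickle
    # payload embedded in a model file fails instead of spawning a shell.
    if event in ("os.system", "os.exec", "subprocess.Popen"):
        raise RuntimeError(f"blocked during model load: {event} {args!r}")


sys.addaudithook(block_spawn_during_load)
```

With the hook installed, unpickling a payload whose __reduce__ invokes os.system raises RuntimeError out of the load call rather than executing the command; in production you would typically log the event to your SIEM instead of (or in addition to) raising.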

Weaknesses (CWE)

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

Timeline

Published
May 15, 2020
Last Modified
November 21, 2024
First Seen
May 15, 2020

Related Vulnerabilities