CVE-2020-13092: scikit-learn: RCE via malicious joblib model deserialization

CRITICAL PoC AVAILABLE
Published May 15, 2020
CISO Take

Any ML pipeline loading scikit-learn model files from untrusted sources (shared storage, S3 buckets, user uploads, third-party model registries) is exposed to full remote code execution — one malicious .pkl file is all it takes. Audit every call to joblib.load() in your stack and enforce cryptographic verification (hash or signature) of model artifacts before loading. This is not a bug scikit-learn will patch; it is a design constraint you must architect around.
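As a starting point for that audit, the sketch below uses Python's standard `ast` module to list the lines in a source file that call `joblib.load()`. The function name `find_joblib_load_calls` is hypothetical, and the matching is a heuristic: it only follows `import joblib`, `import joblib as ...`, and `from joblib import load`, so indirect calls or direct `pickle.load()` use would need separate checks.

```python
import ast

def find_joblib_load_calls(source, filename="<string>"):
    """Return sorted line numbers of calls that resolve to joblib.load (heuristic)."""
    tree = ast.parse(source, filename=filename)
    module_aliases = set()  # names bound to the joblib module
    func_aliases = set()    # names bound directly to joblib.load

    # First pass: collect the names under which joblib or joblib.load is imported.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "joblib":
                    module_aliases.add(alias.asname or "joblib")
        elif isinstance(node, ast.ImportFrom) and node.module == "joblib":
            for alias in node.names:
                if alias.name == "load":
                    func_aliases.add(alias.asname or "load")

    # Second pass: report calls made through those names.
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr == "load"
                and isinstance(func.value, ast.Name)
                and func.value.id in module_aliases):
            hits.append(node.lineno)
        elif isinstance(func, ast.Name) and func.id in func_aliases:
            hits.append(node.lineno)
    return sorted(hits)
```

Running this over each `.py` file in a repository gives a first list of call sites to review; every hit still needs a manual check of where the loaded file comes from.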

What is the risk?

CVSS 9.8 reflects worst-case exposure: no authentication, no user interaction, network-reachable if a model-serving endpoint loads user-supplied or externally fetched model files. In practice, exploitability depends on whether attacker-controlled files reach joblib.load() — which is common in MLOps pipelines that pull models from shared registries or accept model uploads. The 'by design' disclaimer does not reduce operational risk; ML teams routinely treat joblib.load() as a safe I/O operation without understanding the underlying pickle execution model.

What systems are affected?

Package        Ecosystem   Vulnerable Range   Patched
scikit-learn   pip         (not listed)       No patch

66.4K | OpenSSF 9.4 | 29.2K dependents | Pushed 3d ago | 0% patched

If you use scikit-learn and load model files that you did not produce or verify yourself, you are affected.

How severe is it?

CVSS 3.1
9.8 / 10
EPSS
2.6%
chance of exploitation in 30 days
Higher than 84% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Trivial
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Network
AC Low
PR None
UI None
S Unchanged
C High
I High
A High

What should I do?

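  1. Upgrade scikit-learn to >= 0.24.0 as general hygiene, but do not treat the upgrade as a fix: no functional fix exists, joblib.load() remains unsafe on untrusted files, and mitigation is architectural.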

  2. Never call joblib.load() on files from untrusted or unverified sources.

  3. Enforce cryptographic integrity checks (SHA-256 hash or digital signature) on all model artifacts before loading.

  4. Run model loading processes in sandboxed environments (gVisor, seccomp profiles, read-only mounts) to limit blast radius.

  5. Implement model provenance tracking in your MLOps pipeline — only load models whose chain of custody is auditable.

  6. For detection: monitor for unexpected os.system(), subprocess, or network calls spawned from Python processes running inference workloads.
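Step 3 can be sketched as follows, assuming the expected SHA-256 digest comes from a trusted source such as a signed manifest. The helper name `load_verified` and the `loader` parameter are hypothetical. The file is read once and the verified bytes are handed to the loader through an in-memory file object, so the bytes that were hashed are the bytes that get deserialized.

```python
import hashlib
import hmac
import io

def load_verified(path, expected_sha256, loader):
    """Load a model artifact only if its SHA-256 digest matches the pinned value.

    `loader` is any callable that accepts a binary file object
    (assumption: for scikit-learn models this would be joblib.load).
    """
    with open(path, "rb") as fh:
        data = fh.read()

    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"digest mismatch for {path}: got {actual}")

    # Deserialize exactly the bytes that were verified.
    return loader(io.BytesIO(data))
```

A call would look like `load_verified("model.joblib", PINNED_DIGEST, joblib.load)`, where `PINNED_DIGEST` is a placeholder for a value recorded when the model was produced. A digest only helps if it is stored somewhere the attacker cannot write; a hash kept next to the model in the same bucket offers little protection.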

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
Article 9 - Risk management system
ISO 42001
A.6.2.6 - AI system lifecycle: data and model integrity
A.9.4 - AI supply chain: third-party AI components
NIST AI RMF
GOVERN-1.7 - Processes and procedures are in place for decommissioning and phase-out
MS-2.5 - Measure: AI risk monitoring and evaluation
OWASP LLM Top 10
LLM03:2025 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2020-13092?

CVE-2020-13092 describes how scikit-learn (through 0.23.0) can deserialize and execute commands from an untrusted file passed to joblib.load(), for example when a crafted object's __reduce__ method makes an os.system call. Third parties dispute the issue because joblib.load() is documented as unsafe. In practice, any ML pipeline that loads scikit-learn model files from untrusted sources (shared storage, S3 buckets, user uploads, third-party model registries) is exposed to remote code execution, and scikit-learn will not patch this; it is a design constraint you must architect around.

Is CVE-2020-13092 actively exploited?

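Proof-of-concept exploit code for CVE-2020-13092 is publicly available (indexed in trickest/cve), which raises the risk of exploitation. The composite exploitation signal is rated medium, with an EPSS of 2.6%.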

How to fix CVE-2020-13092?

  1. Upgrade scikit-learn to >= 0.24.0 as general hygiene, but do not treat the upgrade as a fix: no functional fix exists, and mitigation is architectural.
  2. Never call joblib.load() on files from untrusted or unverified sources.
  3. Enforce cryptographic integrity checks (SHA-256 hash or digital signature) on all model artifacts before loading.
  4. Run model loading processes in sandboxed environments (gVisor, seccomp profiles, read-only mounts) to limit blast radius.
  5. Implement model provenance tracking in your MLOps pipeline, and only load models whose chain of custody is auditable.
  6. For detection: monitor for unexpected os.system(), subprocess, or network calls spawned from Python processes running inference workloads.

What systems are affected by CVE-2020-13092?

This vulnerability affects the following AI/ML architecture patterns: model serving, training pipelines, MLOps/CI-CD pipelines, model registries, data science notebooks.

What is the CVSS score for CVE-2020-13092?

CVE-2020-13092 has a CVSS v3.1 base score of 9.8 (CRITICAL). The EPSS exploitation probability is 2.65%.

What is the AI security impact?

Affected AI Architectures

model serving, training pipelines, MLOps/CI-CD pipelines, model registries, data science notebooks

MITRE ATLAS Techniques

AML.T0010.003 Model
AML.T0011.000 Unsafe AI Artifacts
AML.T0018.002 Embed Malware
AML.T0049 Exploit Public-Facing Application
AML.T0058 Publish Poisoned Models

Compliance Controls Affected

EU AI Act: Article 15, Article 9
ISO 42001: A.6.2.6, A.9.4
NIST AI RMF: GOVERN-1.7, MS-2.5
OWASP LLM Top 10: LLM03:2025

What are the technical details?

Original Advisory

scikit-learn (aka sklearn) through 0.23.0 can unserialize and execute commands from an untrusted file that is passed to the joblib.load() function, if __reduce__ makes an os.system call. NOTE: third parties dispute this issue because the joblib.load() function is documented as unsafe and it is the user's responsibility to use the function in a secure manner.
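The mechanism in the advisory can be illustrated with the standard `pickle` module, on which joblib's persistence is built. The sketch below is deliberately harmless: the hypothetical `Demo` class returns the built-in `len` from `__reduce__`, where the advisory's case would return `os.system`. It shows that loading calls whatever callable the file names, instead of rebuilding the original object.

```python
import pickle

class Demo:
    """Stand-in for a crafted object; the callable used here is harmless."""

    def __reduce__(self):
        # pickle records "call len('abc')" in place of the Demo instance.
        return (len, ("abc",))

blob = pickle.dumps(Demo())

# Deserializing invokes len("abc"); no Demo object is ever reconstructed.
result = pickle.loads(blob)
```

Here `result` is the return value of the call, which is why inspecting the loaded object after the fact cannot make loading safe: the callable has already run by the time the load returns.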

Exploitation Scenario

An adversary identifies an ML inference API that accepts a model file path or fetches models from a configurable S3 bucket. They craft a malicious scikit-learn model file using Python's pickle __reduce__ mechanism to embed an os.system() reverse shell call. The file is uploaded to a writable storage location or injected into a model registry via a compromised CI credential. When the inference service calls joblib.load() on the next model refresh cycle, the payload executes with the service account's privileges — giving the attacker a foothold inside the ML infrastructure to pivot to training data, API keys, or downstream systems.
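The payload in this scenario shows up as a process-spawning or network call made while a model is being loaded. One way to surface such calls from inside the process, assuming Python 3.8 or later, is an audit hook registered with `sys.addaudithook`. The names `watch_model_load` and `WATCHED_EVENTS` below are hypothetical. This is a detection signal only, not a sandbox: code that is already executing in the process may be able to evade in-process monitoring.

```python
import sys
from contextlib import contextmanager

# Audit event names raised by CPython's standard library (Python 3.8+).
WATCHED_EVENTS = {"os.system", "subprocess.Popen", "socket.connect"}

_observed = []
_watching = False

def _audit_hook(event, args):
    # Audit hooks cannot be removed once added, so recording is gated on a flag.
    if _watching and event in WATCHED_EVENTS:
        _observed.append((event, args))

sys.addaudithook(_audit_hook)

@contextmanager
def watch_model_load():
    """Record watched audit events raised inside the with-block."""
    global _watching
    _observed.clear()
    _watching = True
    try:
        yield _observed
    finally:
        _watching = False
```

Wrapping a model load in `with watch_model_load() as seen:` and alerting when `seen` is non-empty gives a cheap tripwire. It complements, and does not replace, process-level monitoring outside the interpreter.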

Weaknesses (CWE)

CWE-502 — Deserialization of Untrusted Data: The product deserializes untrusted data without sufficiently ensuring that the resulting data will be valid.

  • [Architecture and Design, Implementation] If available, use the signing/sealing features of the programming language to assure that deserialized data has not been tainted. For example, a hash-based message authentication code (HMAC) could be used to ensure that data has not been modified.
  • [Implementation] When deserializing data, populate a new object rather than just deserializing. The result is that the data flows through safe input validation and that the functions are safe.
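The HMAC suggestion in the first mitigation can be sketched with Python's standard `hmac` module. The helper names `sign_artifact` and `verify_artifact` are hypothetical, and the sketch assumes the secret key is held only by the training and serving systems (for example in a secrets manager), not stored alongside the model files.

```python
import hashlib
import hmac

def sign_artifact(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the serialized model bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def verify_artifact(data: bytes, key: bytes, tag: str) -> bool:
    """Check the tag in constant time before the bytes are deserialized."""
    return hmac.compare_digest(sign_artifact(data, key), tag)
```

Unlike a bare hash, the tag cannot be recomputed by someone who can only write to model storage, because producing it requires the key. Verification has to happen before the bytes reach `joblib.load()`; checking afterwards is too late.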

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

Timeline

Published
May 15, 2020
Last Modified
November 21, 2024
First Seen
May 15, 2020

Related Vulnerabilities