CVE-2020-28975: scikit-learn: DoS via crafted SVM model deserialization

Severity: HIGH | PoC available | CISA SSVC: Track*
Published: November 21, 2020

CISO Take

If your ML pipelines load scikit-learn SVM models from untrusted sources (user uploads, shared registries, third-party model repos), an attacker can crash your inference service with a malicious pickle or JSON model file. Upgrade scikit-learn to 1.0 or later and enforce strict model provenance controls: only load models from signed, internal registries. No exploitation in the wild has been reported, but the attack primitive is trivially reproducible.

What is the risk?

Real-world risk is context-dependent. The CVSS score of 7.5 assumes network reachability to model-loading code, which is accurate for model-as-a-service deployments or pipelines that accept external model uploads. Organizations with air-gapped model registries and no external model ingestion have minimal exposure. The vendor's dispute note (the behavior requires API misuse) understates the risk on multi-tenant or collaborative ML platforms where users can supply model artifacts. There is no CISA KEV listing and no known active exploitation, but the attack is trivially reproducible from published PoC code.

What systems are affected?

Package        Ecosystem    Vulnerable Range    Patched
scikit-learn   pip          -                   No patch

Package stats: 66.4K, OpenSSF 9.4, 29.2K dependents, 0% patched

Do you use scikit-learn? You're affected.

How severe is it?

CVSS 3.1
7.5 / 10
EPSS
3.4%
chance of exploitation in 30 days
Higher than 87% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Trivial
Exploitation Confidence
medium
CISA SSVC: Public PoC
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Network
AC Low
PR None
UI None
S Unchanged
C None
I None
A High

What should I do?

5 steps
  1. PATCH

    Upgrade scikit-learn to ≥1.0; the upstream fix validates _n_support bounds before prediction.

  2. MODEL PROVENANCE

    Enforce cryptographic signing of all model artifacts (e.g., sigstore, custom HMAC). Reject unsigned or externally sourced models.

  3. ISOLATION

    Run model loading in sandboxed subprocesses or containers with resource limits (ulimit, cgroups) so a segfault does not cascade to the host service.

  4. AVOID PICKLE FROM UNTRUSTED SOURCES

    For models crossing trust boundaries, replace pickle with a safer serialization format such as ONNX. joblib files are pickle-based, so load them only after an integrity check against a trusted signature.

  5. DETECT

    Alert on unexpected process crashes in inference workers; repeated crashes from model load events indicate active exploitation attempts.
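
As a rough illustration of step 2, the sketch below checks an HMAC-SHA256 tag over the serialized model bytes before unpickling. The function names `sign_artifact` and `load_verified_model`, and the idea of storing the tag alongside the artifact, are assumptions made for this example and not part of scikit-learn or any registry product. Key distribution and rotation are out of scope here.

```python
import hashlib
import hmac
import pickle


def sign_artifact(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the serialized model bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def load_verified_model(data: bytes, tag: str, key: bytes):
    """Unpickle the artifact only if its tag matches; raise ValueError otherwise."""
    expected = sign_artifact(data, key)
    # compare_digest performs a constant-time comparison of the two tags.
    if not hmac.compare_digest(expected, tag):
        raise ValueError("model artifact failed integrity check; refusing to load")
    # Only reached for artifacts signed with the shared key.
    return pickle.loads(data)


if __name__ == "__main__":
    key = b"replace-with-a-secret-from-your-key-store"  # hypothetical key
    blob = pickle.dumps({"kind": "placeholder for a fitted model"})
    tag = sign_artifact(blob, key)
    print(load_verified_model(blob, tag, key))
```

A check like this only establishes that the artifact came from a holder of the key. It does not make unpickling attacker-controlled bytes safe, so the tag must be verified before `pickle.loads` is ever called.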

What does CISA's SSVC say?

Decision Track*
Exploitation poc
Automatable No
Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Art.15 - Accuracy, robustness and cybersecurity for high-risk AI systems
ISO 42001
8.4 - AI System Risk Management: Data and Model Integrity
9.1 - Monitoring and Measurement of AI System Performance
NIST AI RMF
GOVERN-6.1 - Policies for third-party AI risks and dependencies
MANAGE-2.4 - Residual risks are managed and documented
OWASP LLM Top 10
LLM05:2025 - Insecure Output Handling / Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2020-28975?

CVE-2020-28975 is a denial-of-service vulnerability in svm_predict_values in libsvm's svm.cpp, as used in scikit-learn 0.23.2 and other products. A crafted SVM model with a large value in the _n_support array, introduced via pickle, JSON, or another model persistence format, causes a segmentation fault. The scikit-learn maintainers dispute the report, stating that the behavior only occurs when an application changes a private attribute in violation of the library's API.

Is CVE-2020-28975 actively exploited?

No active exploitation is known, and CVE-2020-28975 is not listed in CISA KEV. Proof-of-concept exploit code is publicly available, however, which increases the risk of exploitation.

How to fix CVE-2020-28975?

1. PATCH: Upgrade scikit-learn to ≥1.0; the upstream fix validates _n_support bounds before prediction.
2. MODEL PROVENANCE: Enforce cryptographic signing of all model artifacts (e.g., sigstore, custom HMAC). Reject unsigned or externally sourced models.
3. ISOLATION: Run model loading in sandboxed subprocesses or containers with resource limits (ulimit, cgroups) so a segfault does not cascade to the host service.
4. AVOID PICKLE FROM UNTRUSTED SOURCES: For models crossing trust boundaries, replace pickle with a safer serialization format such as ONNX. joblib files are pickle-based, so load them only after an integrity check against a trusted signature.
5. DETECT: Alert on unexpected process crashes in inference workers; repeated crashes from model load events indicate active exploitation attempts.
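
The isolation and detection steps can be combined in a small way: run the risky work in a child interpreter and treat death-by-signal as an alertable event. The sketch below is an assumption-laden illustration, not a hardened sandbox. `run_isolated` and `crashed_by_signal` are hypothetical helper names, the negative-return-code convention applies to POSIX systems, and resource limits (ulimit, cgroups) would still need to be applied around the child.

```python
import subprocess
import sys


def run_isolated(code: str, timeout_s: float = 30.0) -> int:
    """Run Python source in a fresh interpreter and return its exit code."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        timeout=timeout_s,
        capture_output=True,  # keep child output away from the parent's streams
    )
    return proc.returncode


def crashed_by_signal(returncode: int) -> bool:
    """On POSIX, subprocess reports a child killed by signal N as return code -N."""
    return returncode < 0


if __name__ == "__main__":
    # In a real worker the child code would load the model and call predict();
    # here a trivial script stands in for that step.
    rc = run_isolated("print('prediction would run here')")
    if crashed_by_signal(rc):
        print(f"ALERT: inference child killed by signal {-rc}")
    else:
        print(f"child exited with code {rc}")
```

Because the segfault happens in the child, the parent service survives and can count crashes per submitted model, which is the signal step 5 asks you to alert on.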

What systems are affected by CVE-2020-28975?

This vulnerability affects the following AI/ML architecture patterns: model serving, training pipelines, ML model registries.

What is the CVSS score for CVE-2020-28975?

CVE-2020-28975 has a CVSS v3.1 base score of 7.5 (HIGH). The EPSS exploitation probability is 3.43%.

What is the AI security impact?

Affected AI Architectures

model serving, training pipelines, ML model registries

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0011.000 Unsafe AI Artifacts
AML.T0018 Manipulate AI Model
AML.T0029 Denial of AI Service
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art.15
ISO 42001: 8.4, 9.1
NIST AI RMF: GOVERN-6.1, MANAGE-2.4
OWASP LLM Top 10: LLM05:2025

What are the technical details?

Original Advisory

svm_predict_values in svm.cpp in Libsvm v324, as used in scikit-learn 0.23.2 and other products, allows attackers to cause a denial of service (segmentation fault) via a crafted model SVM (introduced via pickle, json, or any other model permanence standard) with a large value in the _n_support array. NOTE: the scikit-learn vendor's position is that the behavior can only occur if the library's API is violated by an application that changes a private attribute.

Exploitation Scenario

An adversary targeting an ML platform that accepts user-submitted scikit-learn models crafts a malicious SVM model by loading a legitimate model via pickle, then programmatically setting `model._n_support` to an array with an extremely large integer value, and re-serializing. They submit this model to the platform's model evaluation endpoint. When the backend calls `model.predict()`, libsvm's svm_predict_values dereferences memory beyond allocated bounds in the support vector arrays, producing a segfault that kills the inference worker process. On a shared inference platform, this denies service to all tenants. Repeating submissions prevents recovery and constitutes a sustained DoS against the ML serving infrastructure.
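
A defensive counterpart to this scenario is a sanity check on the loaded object before `predict()` is called. The sketch below is not the upstream fix. It assumes that, for a fitted SVC-style classifier, the per-class counts in `_n_support` are non-negative and sum to the number of rows in `support_vectors_`; other estimator types may not satisfy this. The helper name `n_support_is_consistent` is hypothetical, and stand-in objects are used so the example runs without scikit-learn installed.

```python
from types import SimpleNamespace


def n_support_is_consistent(model) -> bool:
    """Return True if _n_support is non-negative and sums to the support-vector count."""
    counts = [int(c) for c in model._n_support]
    n_vectors = len(model.support_vectors_)
    return all(c >= 0 for c in counts) and sum(counts) == n_vectors


if __name__ == "__main__":
    # Stand-ins for a fitted two-class model with three support vectors.
    honest = SimpleNamespace(
        _n_support=[2, 1],
        support_vectors_=[[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
    )
    # Same model with one count inflated, as in the scenario above.
    tampered = SimpleNamespace(
        _n_support=[2, 1_000_000_000],
        support_vectors_=honest.support_vectors_,
    )
    print(n_support_is_consistent(honest))
    print(n_support_is_consistent(tampered))
```

A check like this runs only after deserialization, so it addresses the crafted `_n_support` value specifically and does nothing about the broader risk of unpickling untrusted data.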

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
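
To show how this vector yields the 7.5 base score, here is a small sketch of the CVSS v3.1 base-score arithmetic. It is restricted to Scope: Unchanged vectors, which is an assumption of this example, and the helper names are illustrative. The metric weights and the round-up rule follow the published CVSS v3.1 specification.

```python
import math

# CVSS v3.1 metric weights (PR values are those for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a CVSS:3.1 vector with Scope: Unchanged."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))  # prints 7.5
```

With only Availability impacted, the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is about 3.89, which sum to roughly 7.48 and round up to 7.5.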

Timeline

Published
November 21, 2020
Last Modified
November 21, 2024
First Seen
November 21, 2020

Related Vulnerabilities