CVE-2020-28975: scikit-learn: DoS via crafted SVM model deserialization

HIGH PoC AVAILABLE CISA: TRACK*
Published November 21, 2020
CISO Take

If your ML pipelines load scikit-learn SVM models from untrusted sources (user uploads, shared registries, third-party model repos), an attacker can crash your inference service with a malicious pickle or JSON model file. Upgrade scikit-learn to 1.0 or later and enforce strict model provenance controls: load models only from signed, internal registries. No exploitation in the wild is known, but the attack is trivially reproducible from public PoC code.

Risk Assessment

Real-world risk is context-dependent. CVSS 7.5 assumes network reachability to model loading code, which is accurate for model-as-a-service deployments or pipelines accepting external model uploads. Organizations with air-gapped model registries and no external model ingestion have minimal exposure. The vendor disputes the CVE on the grounds that the crash requires API misuse (an application changing a private attribute); that position understates the risk in multi-tenant or collaborative ML platforms where users can supply model artifacts. There is no CISA KEV listing and no known active exploitation, but the attack is trivially reproducible from published PoC code.

Affected Systems

Package: scikit-learn
Ecosystem: pip
Vulnerable Range: not listed (the NVD description names scikit-learn 0.23.2)
Patched: No patch

Severity & Risk

CVSS 3.1: 7.5 / 10
EPSS: 0.3% chance of exploitation in 30 days (higher than 48% of all CVEs)
Exploitation Status: Exploit Available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation Confidence: medium
CISA SSVC: Public PoC
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

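Attack Vector (AV): Network
Attack Complexity (AC): Low
Privileges Required (PR): None
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High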

Recommended Action

  1. PATCH

    Upgrade scikit-learn to ≥1.0; the upstream fix validates _n_support bounds before prediction.

  2. MODEL PROVENANCE

    Enforce cryptographic signing of all model artifacts (e.g., sigstore, custom HMAC). Reject unsigned or externally sourced models.

  3. ISOLATION

    Run model loading in sandboxed subprocesses or containers with resource limits (ulimit, cgroups) so a segfault does not cascade to the host service.

  4. AVOID PICKLE FROM UNTRUSTED SOURCES

    For models crossing trust boundaries, replace pickle with a safer serialization format such as ONNX. joblib is pickle-based, so use it only together with integrity checks.

  5. DETECT

    Alert on unexpected process crashes in inference workers; repeated crashes tied to model load events may indicate active exploitation attempts.
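
The provenance check in step 2 could look roughly like the following. This is a minimal sketch using an HMAC-SHA256 tag over the serialized model bytes; the helper names (`sign_artifact`, `verify_artifact`, `load_if_trusted`) are hypothetical, and key storage, rotation, and the choice between HMAC and a signing service such as sigstore are left out.

```python
import hashlib
import hmac


def sign_artifact(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the serialized model bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_artifact(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_artifact(data, key), expected_tag)


def load_if_trusted(data: bytes, key: bytes, expected_tag: str, loader):
    """Call loader(data) only after the tag verifies; raise otherwise.

    `loader` is whatever deserializer the pipeline uses (hypothetical here).
    """
    if not verify_artifact(data, key, expected_tag):
        raise ValueError("model artifact failed integrity check; refusing to load")
    return loader(data)
```

The point of the ordering is that the bytes are verified before any deserializer touches them, so an unsigned or modified artifact is rejected without being loaded.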

CISA SSVC Assessment

Decision Track*
Exploitation poc
Automatable No
Technical Impact partial

Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Art.15 - Accuracy, robustness and cybersecurity for high-risk AI systems
ISO 42001
8.4 - AI System Risk Management — Data and Model Integrity
9.1 - Monitoring and Measurement of AI System Performance
NIST AI RMF
GOVERN-6.1 - Policies for third-party AI risks and dependencies
MANAGE-2.4 - Residual risks are managed and documented
OWASP LLM Top 10
LLM03:2025 - Supply Chain

Frequently Asked Questions

What is CVE-2020-28975?

CVE-2020-28975 is a denial-of-service vulnerability in svm_predict_values in svm.cpp of Libsvm v324, as used in scikit-learn 0.23.2 and other products. A crafted SVM model (introduced via pickle, JSON, or another model persistence format) with a large value in the _n_support array causes a segmentation fault. If your ML pipelines load scikit-learn SVM models from untrusted sources (user uploads, shared registries, third-party model repos), an attacker can use such a file to crash your inference service.

Is CVE-2020-28975 actively exploited?

No active exploitation is known, and CVE-2020-28975 is not listed in CISA KEV. Proof-of-concept exploit code is publicly available, however, which increases the risk of exploitation.

How to fix CVE-2020-28975?

1. PATCH: Upgrade scikit-learn to ≥1.0; the upstream fix validates _n_support bounds before prediction.
2. MODEL PROVENANCE: Enforce cryptographic signing of all model artifacts (e.g., sigstore, custom HMAC). Reject unsigned or externally sourced models.
3. ISOLATION: Run model loading in sandboxed subprocesses or containers with resource limits (ulimit, cgroups) so a segfault does not cascade to the host service.
4. AVOID PICKLE FROM UNTRUSTED SOURCES: For models crossing trust boundaries, replace pickle with a safer serialization format such as ONNX. joblib is pickle-based, so use it only together with integrity checks.
5. DETECT: Alert on unexpected process crashes in inference workers; repeated crashes tied to model load events may indicate active exploitation attempts.
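
Step 3 (isolation) and the crash signal used in step 5 can be illustrated together. The sketch below runs a snippet of Python in a child interpreter and reports whether the child was killed by a signal; `run_isolated` is a hypothetical helper, the negative-return-code convention applies to POSIX systems, and resource limits (ulimit, cgroups) are not shown.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 30.0) -> dict:
    """Run Python source in a child interpreter and report how it exited.

    In a real service the child would load the model and run prediction;
    here `code` is any Python source string.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        timeout=timeout,
    )
    # On POSIX, a negative return code means the child was terminated by a
    # signal (for example -11 for SIGSEGV), i.e. it crashed rather than exited.
    return {
        "returncode": proc.returncode,
        "crashed_by_signal": proc.returncode < 0,
        "stdout": proc.stdout,
    }
```

A parent process that sees `crashed_by_signal` set for a given model artifact can reject that artifact and raise an alert, while its own process stays alive.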

What systems are affected by CVE-2020-28975?

This vulnerability affects the following AI/ML architecture patterns: model serving, training pipelines, ML model registries.

What is the CVSS score for CVE-2020-28975?

CVE-2020-28975 has a CVSS v3.1 base score of 7.5 (HIGH). The EPSS exploitation probability is 0.25%.

Technical Details

NVD Description

svm_predict_values in svm.cpp in Libsvm v324, as used in scikit-learn 0.23.2 and other products, allows attackers to cause a denial of service (segmentation fault) via a crafted model SVM (introduced via pickle, json, or any other model permanence standard) with a large value in the _n_support array. NOTE: the scikit-learn vendor's position is that the behavior can only occur if the library's API is violated by an application that changes a private attribute.

Exploitation Scenario

An adversary targeting an ML platform that accepts user-submitted scikit-learn models crafts a malicious SVM model by loading a legitimate model via pickle, then programmatically setting `model._n_support` to an array with an extremely large integer value, and re-serializing. They submit this model to the platform's model evaluation endpoint. When the backend calls `model.predict()`, libsvm's svm_predict_values dereferences memory beyond allocated bounds in the support vector arrays, producing a segfault that kills the inference worker process. On a shared inference platform, this denies service to all tenants. Repeating submissions prevents recovery and constitutes a sustained DoS against the ML serving infrastructure.
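
A defensive counterpart to this scenario is a consistency check on the per-class support vector counts before `predict()` is ever called. The sketch below is illustrative only and is not the upstream fix; `n_support_is_consistent` is a hypothetical helper, and it assumes the classifier case, where the array has one non-negative integer per class and the entries sum to the number of stored support vectors.

```python
import operator
from typing import Sequence


def n_support_is_consistent(
    n_support: Sequence[int],
    n_support_vectors: int,
    n_classes: int,
) -> bool:
    """Return True if per-class support counts match the stored vectors."""
    try:
        # operator.index accepts Python and NumPy integers, rejects floats.
        counts = [operator.index(c) for c in n_support]
    except TypeError:
        return False
    if len(counts) != n_classes:
        return False
    if any(c < 0 for c in counts):
        return False
    # An inflated entry makes the sum exceed the vectors actually stored.
    return sum(counts) == n_support_vectors
```

For a fitted `SVC` named `model`, a call might look like `n_support_is_consistent(model._n_support, model.support_vectors_.shape[0], len(model.classes_))`; regressors and one-class models may lay out the array differently. The check runs only after the model has been deserialized, so it does not make loading an untrusted pickle safe.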

CVSS Vector

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
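
As a worked check, the 7.5 base score can be recomputed from this vector with the CVSS v3.1 base-score equations. The sketch below handles only Scope: Unchanged vectors, which is all this CVE needs; the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(x: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 spec."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged CVSS v3.1 vector."""
    metrics = dict(p.split(":") for p in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    impact = 6.42 * iss
    exploitability = (
        8.22 * AV[metrics["AV"]] * AC[metrics["AC"]]
        * PR[metrics["PR"]] * UI[metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))  # 7.5
```

Here the impact sub-score is 6.42 × 0.56 ≈ 3.60 (availability only) and the exploitability sub-score is about 3.89; their sum, 7.48, rounds up to 7.5.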

Timeline

Published: November 21, 2020
Last Modified: November 21, 2024
First Seen: November 21, 2020

Related Vulnerabilities