CVE-2021-41220: TensorFlow: use-after-free in async collective ops
HIGH | PoC AVAILABLE
TensorFlow's async implementation of CollectiveReduceV2, a distributed training operation, contains a use-after-free and a memory leak, exploitable locally with low privileges (CVSS 7.8). Any organization running multi-GPU or multi-node TensorFlow training workloads on TF 2.6.0 should upgrade immediately to 2.6.1 or 2.7.0. Training infrastructure is a high-value target: a compromised training node can enable model poisoning, data exfiltration, or lateral movement within ML pipelines.
Risk Assessment
The severity is high (7.8), but the local attack vector limits exposure to environments where untrusted users share TensorFlow compute resources, such as multi-tenant GPU clusters, JupyterHub environments, or shared ML training infrastructure. Use-after-free vulnerabilities can often be turned into arbitrary code execution by skilled attackers, and the low attack complexity and absence of any user-interaction requirement amplify the risk once local access exists. Organizations with shared ML compute (academic clusters, cloud ML notebooks with multi-tenancy) face the highest exposure.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| tensorflow | pip | 2.6.0 | 2.6.1, 2.7.0 |
Recommended Action
1. Patch: upgrade to TensorFlow 2.7.0 or 2.6.1 (backport available). Verify with `pip show tensorflow | grep Version`.
2. Isolate training workloads: enforce one-job-per-node policies on shared compute; avoid multi-tenant GPU clusters until patched.
3. Detect: monitor for anomalous process crashes or memory faults in TF training jobs (SIGABRT, SIGSEGV from the TF runtime).
4. Audit exposure: identify all internal services running TF 2.6.0 in distributed mode. Check CI/CD pipelines, MLOps platforms (Kubeflow, Vertex, SageMaker custom containers), and Jupyter environments.
5. Enforce image pinning in container-based ML pipelines to prevent accidental rollback to vulnerable versions.
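The version check in step 1 can also be scripted. The following is a minimal sketch, assuming that 2.6.0 is the only affected release, as the advisory text states. The helper names `parse_release`, `is_affected`, and `installed_tensorflow_version` are hypothetical, and pre-release builds of 2.6.0 are conservatively treated as affected, which is an assumption rather than something the advisory confirms.

```python
import re
from importlib import metadata

# Assumption: only 2.6.0 is affected, per the advisory text.
AFFECTED_RELEASES = {(2, 6, 0)}


def parse_release(version: str):
    """Return the leading (major, minor, patch) tuple, or None if unparsable."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """True when the version string maps to an affected release."""
    return parse_release(version) in AFFECTED_RELEASES


def installed_tensorflow_version():
    """Return the installed tensorflow version string, or None if absent."""
    try:
        return metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_tensorflow_version()
    if found is None:
        print("tensorflow is not installed in this environment")
    elif is_affected(found):
        print(f"tensorflow {found} is affected; upgrade to 2.6.1 or 2.7.0")
    else:
        print(f"tensorflow {found} is not in the affected range")
```

The script only inspects the `tensorflow` distribution name; environments that install a differently named build would need that name added.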
Frequently Asked Questions
What is CVE-2021-41220?
CVE-2021-41220 is a use-after-free and memory leak in the async implementation of TensorFlow's CollectiveReduceV2 operation, exploitable locally with low privileges (CVSS 7.8). It affects TF 2.6.0 and is fixed in 2.6.1 and 2.7.0.
Is CVE-2021-41220 actively exploited?
Proof-of-concept exploit code is publicly available for CVE-2021-41220, increasing the risk of exploitation.
How to fix CVE-2021-41220?
1. Patch: upgrade to TensorFlow 2.7.0 or 2.6.1 (backport available). Verify with `pip show tensorflow | grep Version`.
2. Isolate training workloads: enforce one-job-per-node policies on shared compute; avoid multi-tenant GPU clusters until patched.
3. Detect: monitor for anomalous process crashes or memory faults in TF training jobs (SIGABRT, SIGSEGV from the TF runtime).
4. Audit exposure: identify all internal services running TF 2.6.0 in distributed mode. Check CI/CD pipelines, MLOps platforms (Kubeflow, Vertex, SageMaker custom containers), and Jupyter environments.
5. Enforce image pinning in container-based ML pipelines to prevent accidental rollback to vulnerable versions.
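For the detection step, a job wrapper can flag training processes that ended with SIGSEGV or SIGABRT. The sketch below assumes a POSIX host, where Python's `subprocess` reports death by signal as a negative return code and shells report it as 128 plus the signal number. The function name `fatal_signal` and the watched set are hypothetical choices for illustration.

```python
import signal

# Signals named in the detection step above.
WATCHED = {"SIGSEGV", "SIGABRT"}


def fatal_signal(returncode: int):
    """Map a process return code to a watched signal name, else None."""
    if returncode < 0:
        # subprocess convention on POSIX: -N means "killed by signal N".
        signum = -returncode
    elif returncode > 128:
        # Shell convention: exit status 128 + N for signal N.
        signum = returncode - 128
    else:
        return None
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Not a valid signal number on this platform.
        return None
    return name if name in WATCHED else None
```

A wrapper would typically call `fatal_signal(result.returncode)` on the object returned by `subprocess.run(...)` and raise an alert when the result is not `None`. A crash alone does not prove exploitation; it is only a signal worth investigating.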
What systems are affected by CVE-2021-41220?
This vulnerability affects the following AI/ML architecture patterns: distributed training pipelines, multi-GPU training infrastructure, MLOps platforms (Kubeflow, Vertex AI, SageMaker custom containers), shared Jupyter/notebook environments, model training pipelines.
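When auditing these environments, one quick check is to scan dependency files for an exact pin on the affected release. This is a minimal sketch, assuming pip-style `name==version` requirement lines. The function name `find_vulnerable_pins` is hypothetical, and matching the `tensorflow-cpu` and `tensorflow-gpu` package names is an assumption, since the table above lists only `tensorflow`.

```python
import re

# Assumption: the -cpu and -gpu package variants are treated like tensorflow.
PIN_PATTERN = re.compile(
    r"^\s*tensorflow(?:-gpu|-cpu)?\s*==\s*2\.6\.0(?![\w.])",
    re.IGNORECASE,
)


def find_vulnerable_pins(requirements_text: str):
    """Return 1-based line numbers that pin tensorflow to exactly 2.6.0."""
    hits = []
    for number, line in enumerate(requirements_text.splitlines(), start=1):
        if PIN_PATTERN.match(line):
            hits.append(number)
    return hits


if __name__ == "__main__":
    sample = "tensorflow==2.6.0\nnumpy==1.19.5\ntensorflow==2.6.1\n"
    print(find_vulnerable_pins(sample))
```

This only catches exact pins. Version ranges, unpinned dependencies, and container base images need a separate check against the installed version.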
What is the CVSS score for CVE-2021-41220?
CVE-2021-41220 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.02%.
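The 7.8 base score can be reproduced from the vector CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H. The sketch below follows the base-score equations and metric weights of the CVSS v3.1 specification; the function names are illustrative, and temporal and environmental metrics are not handled.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required weights depend on whether scope changes.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place as defined in CVSS v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score from a base vector string."""
    # Skip the leading "CVSS:3.1" element and read the metric pairs.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]

    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    exploitability = (
        8.22
        * WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * pr
        * WEIGHTS["UI"][metrics["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 7.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 1.83; their sum of roughly 7.71 rounds up to 7.8. The local attack vector (weight 0.55) is the main reason the score stays below the 9.8 that the same impact would earn with a network vector and no privileges required.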
Technical Details
NVD Description
TensorFlow is an open source platform for machine learning. In affected versions the async implementation of `CollectiveReduceV2` suffers from a memory leak and a use after free. This occurs due to the asynchronous computation and the fact that objects that have been `std::move()`d from are still accessed. The fix will be included in TensorFlow 2.7.0. We will also cherrypick this commit on TensorFlow 2.6.1, as this version is the only one that is also affected.
Exploitation Scenario
An attacker with low-privileged access to a shared GPU training cluster (e.g., a compromised ML engineer account or a rogue training job submitted via an MLOps pipeline) launches a specially crafted distributed training job that triggers the async CollectiveReduceV2 code path. The std::move() misuse causes the runtime to access freed memory, which the attacker may be able to control through heap shaping in order to redirect execution. With code execution on the training node, the attacker could inject malicious gradient updates to poison the model under training, exfiltrate proprietary training data in flight, or install persistence on the ML infrastructure. In Kubernetes-based ML platforms (Kubeflow, Argo Workflows), this could extend to escaping the training pod boundary, depending on cluster configuration.
CVSS Vector
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
References
- github.com/tensorflow/tensorflow/commit/ca38dab9d3ee66c5de06f11af9a4b1200da5ef75 Patch 3rd Party
- github.com/tensorflow/tensorflow/security/advisories/GHSA-gpfh-jvf9-7wg5 Exploit Patch 3rd Party
Related Vulnerabilities
- CVE-2020-15196 (9.9) TensorFlow: heap OOB read in sparse/ragged count ops. Same package: tensorflow
- CVE-2020-15205 (9.8) TensorFlow: heap overflow in StringNGrams, ASLR bypass. Same package: tensorflow
- CVE-2020-15208 (9.8) TFLite: OOB read/write via tensor dimension mismatch. Same package: tensorflow
- CVE-2019-16778 (9.8) TensorFlow: heap overflow in UnsortedSegmentSum op. Same package: tensorflow
- CVE-2022-23587 (9.8) TensorFlow: integer overflow in Grappler enables RCE. Same package: tensorflow