CVE-2021-37692: TensorFlow: string tensor GC segfault causes process DoS

GHSA-cmgw-8vpc-rc59 MEDIUM
Published August 12, 2021

CISO Take

TensorFlow's Go bindings crash via segfault when garbage collection fires on a string tensor whose encoding failed due to mismatched dimensions. Attack vector is local with low privileges—no remote exposure, no data loss. Patch to TensorFlow 2.5.1 or 2.6.0 immediately if running Go-based TF code; validate tensor dimensions before encoding in any custom Go TF operators.

What is the risk?

Low-to-medium operational risk. The AV:L/PR:L CVSS vector tightly constrains exposure—exploitation requires local code execution in the TF process context. EPSS of 0.00032 reflects no meaningful exploitation activity observed in the wild. Impact is confined to availability (process crash); confidentiality and integrity are not affected. Risk increases in shared or multi-tenant ML compute environments where Go-based TF code processes user-controlled tensor shapes.

What systems are affected?

Package      Ecosystem   Vulnerable Range        Patched
TensorFlow   pip         -                       No patch
TensorFlow   pip         >= 2.5.0rc0, < 2.5.1    2.5.1

How severe is it?

CVSS 3.1: 5.5 / 10
EPSS: 0.2% chance of exploitation in 30 days (higher than 7% of all CVEs)
Exploitation Status: No known exploitation
Sophistication: Moderate

What is the attack surface?

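Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High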

What should I do?

  1. Patch: upgrade tensorflow, tensorflow-cpu, or tensorflow-gpu to 2.5.1 (cherry-pick backport) or 2.6.0+.

  2. Audit Go codepaths: identify any string tensor construction sites where dimensions are derived from external or user-controlled input.

  3. Add input validation: enforce dimension consistency checks before calling NewTensor in Go code.

  4. Add process supervision: use systemd Restart=always or Kubernetes restartPolicy: Always to auto-recover crashed TF Go processes.

  5. Detection: alert on unexpected exits of TF worker processes (termination by SIGSEGV, signal 11, which shells typically report as exit code 139).
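
Step 3 can be sketched in plain Go. The helper below is a hypothetical pre-check, not part of the TensorFlow Go API: it walks a nested slice with reflection and rejects ragged input before the value would be handed to NewTensor. The names ValidateShape and rectShape are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// rectShape returns the shape of a nested slice/array value, or an error
// if sibling elements at any level have different shapes (ragged input).
// Non-slice values (for example a string) are treated as scalar leaves.
func rectShape(v reflect.Value) ([]int64, error) {
	if v.Kind() != reflect.Slice && v.Kind() != reflect.Array {
		return nil, nil
	}
	n := v.Len()
	shape := []int64{int64(n)}
	if n == 0 {
		return shape, nil
	}
	first, err := rectShape(v.Index(0))
	if err != nil {
		return nil, err
	}
	for i := 1; i < n; i++ {
		sub, err := rectShape(v.Index(i))
		if err != nil {
			return nil, err
		}
		if !reflect.DeepEqual(first, sub) {
			return nil, fmt.Errorf("ragged input: element 0 has shape %v, element %d has shape %v", first, i, sub)
		}
	}
	return append(shape, first...), nil
}

// ValidateShape is a hypothetical guard to call before constructing a
// string tensor from externally supplied data.
func ValidateShape(value interface{}) ([]int64, error) {
	return rectShape(reflect.ValueOf(value))
}

func main() {
	good := [][]string{{"a", "b"}, {"c", "d"}}
	bad := [][]string{{"a", "b"}, {"c"}}

	if shape, err := ValidateShape(good); err == nil {
		fmt.Println("ok, shape:", shape)
	}
	if _, err := ValidateShape(bad); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

A caller would construct the tensor only when ValidateShape returns a nil error. Upgrading remains the actual fix; a check like this only reduces the chance of reaching the failing encode path on unpatched builds.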

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 9 - Risk management system
ISO 42001
A.6.2.6 - AI system vulnerability management
NIST AI RMF
MANAGE 2.2 - Mechanisms are in place to sustain the value of deployed AI systems
OWASP LLM Top 10
LLM04:2023 - Model Denial of Service

Frequently Asked Questions

What is CVE-2021-37692?

TensorFlow's Go bindings crash via segfault when garbage collection fires on a string tensor whose encoding failed due to mismatched dimensions. Attack vector is local with low privileges—no remote exposure, no data loss. Patch to TensorFlow 2.5.1 or 2.6.0 immediately if running Go-based TF code; validate tensor dimensions before encoding in any custom Go TF operators.

Is CVE-2021-37692 actively exploited?

No confirmed active exploitation of CVE-2021-37692 has been reported, but organizations should still patch proactively.

How to fix CVE-2021-37692?

  1. Patch: upgrade tensorflow, tensorflow-cpu, or tensorflow-gpu to 2.5.1 (cherry-pick backport) or 2.6.0+.

  2. Audit Go codepaths: identify any string tensor construction sites where dimensions are derived from external or user-controlled input.

  3. Add input validation: enforce dimension consistency checks before calling NewTensor in Go code.

  4. Add process supervision: use systemd Restart=always or Kubernetes restartPolicy: Always to auto-recover crashed TF Go processes.

  5. Detection: alert on unexpected exits of TF worker processes (termination by SIGSEGV, signal 11, which shells typically report as exit code 139).

What systems are affected by CVE-2021-37692?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, data preprocessing.

What is the CVSS score for CVE-2021-37692?

CVE-2021-37692 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.17%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, data preprocessing

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0029 Denial of AI Service

Compliance Controls Affected

EU AI Act: Article 9
ISO 42001: A.6.2.6
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM04:2023

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. In affected versions under certain conditions, Go code can trigger a segfault in string deallocation. For string tensors, `C.TF_TString_Dealloc` is called during garbage collection within a finalizer function. However, tensor structure isn't checked until encoding to avoid a performance penalty. The current method for dealloc assumes that encoding succeeded, but segfaults when a string tensor is garbage collected whose encoding failed (e.g., due to mismatched dimensions). To fix this, the call to set the finalizer function is deferred until `NewTensor` returns and, if encoding failed for a string tensor, deallocs are determined based on bytes written. We have patched the issue in GitHub commit 8721ba96e5760c229217b594f6d2ba332beedf22. The fix will be included in TensorFlow 2.6.0. We will also cherrypick this commit on TensorFlow 2.5.1, which is the other affected version.
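
The ordering described in the advisory can be illustrated without cgo. The sketch below is not the TensorFlow source. It uses a hypothetical buffer type in place of a C-allocated string tensor to show the two ideas in the fix: the finalizer is registered only after construction succeeds, and cleanup on failure covers only what was actually written.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// buffer is a hypothetical stand-in for a C-allocated string tensor.
type buffer struct {
	slots   []string
	written int // number of slots successfully initialized
}

// release frees only the slots that were actually written and reports
// how many that was.
func (b *buffer) release() int {
	freed := b.written
	b.slots, b.written = nil, 0
	return freed
}

// newBuffer encodes vals into a buffer that must hold exactly want slots.
func newBuffer(vals []string, want int) (*buffer, error) {
	b := &buffer{slots: make([]string, want)}
	for i, v := range vals {
		if i >= want {
			b.release() // clean up based on what was written so far
			return nil, errors.New("mismatched dimensions: more values than slots")
		}
		b.slots[i] = v
		b.written++
	}
	if b.written != want {
		n := b.written
		b.release()
		return nil, fmt.Errorf("mismatched dimensions: wrote %d of %d slots", n, want)
	}
	// Only a fully constructed value gets a finalizer, so the garbage
	// collector never runs cleanup on a half-initialized object.
	runtime.SetFinalizer(b, func(b *buffer) { b.release() })
	return b, nil
}

func main() {
	if _, err := newBuffer([]string{"a"}, 2); err != nil {
		fmt.Println("construction failed safely:", err)
	}
	if b, err := newBuffer([]string{"a", "b"}, 2); err == nil {
		fmt.Println("constructed with", b.written, "slots")
	}
}
```

In the vulnerable ordering, the equivalent of SetFinalizer ran before encoding, so a failed encode still left an object whose finalizer assumed every slot was valid.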

Exploitation Scenario

An attacker with local code execution in a shared ML training cluster submits a job that constructs a TensorFlow string tensor with intentionally mismatched dimensions via the Go binding. Encoding fails inside NewTensor, which returns an error, but the finalizer has already been registered on the partially encoded tensor. When Go's garbage collector later runs the finalizer, C.TF_TString_Dealloc is called on string slots that were never initialized, triggering a segfault that crashes the entire TF process. In a multi-tenant GPU cluster, this disrupts co-located training jobs and wastes expensive compute time, effectively functioning as a targeted denial of service against competing workloads.

Weaknesses (CWE)

CWE-20 — Improper Input Validation: The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.

  • [Architecture and Design] Consider using language-theoretic security (LangSec) techniques that characterize inputs using a formal language and build "recognizers" for that language. This effectively requires parsing to be a distinct layer that effectively enforces a boundary between raw input and internal data representations, instead of allowing parser code to be scattered throughout the program, where it could be subject to errors or inconsistencies that create weaknesses. [REF-1109] [REF-1110] [REF-1111]
  • [Architecture and Design] Use an input validation framework such as Struts or the OWASP ESAPI Validation API. Note that using a framework does not automatically address all input validation problems; be mindful of weaknesses that could arise from misusing the framework itself (CWE-1173).

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Timeline

Published: August 12, 2021
Last Modified: November 21, 2024
First Seen: August 12, 2021

Related Vulnerabilities