CVE-2021-37658: TensorFlow null pointer dereference in MatrixSetDiagV* operations

CISO Take

Upgrade TensorFlow to 2.6.0, or to a patched maintenance release (2.5.1, 2.4.3, or 2.3.4), immediately. Any local user or process that can invoke TensorFlow ops can crash training jobs or inference servers by passing an empty tensor as the k parameter to MatrixSetDiagV* operations. In shared ML infrastructure, this is a practical denial-of-service vector that wastes compute and disrupts availability.

What is the risk?

CVSS 7.8 (High) with local attack vector and low privilege requirement. While not directly remotely exploitable, multi-tenant training clusters and inference environments with user-submitted computation graphs materially elevate risk. CWE-824 (uninitialized pointer access) creates undefined behavior that, in theory, could be escalated beyond DoS toward memory corruption with sufficient expertise, though no public exploit demonstrates this.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	Releases before 2.3.4; 2.4.x before 2.4.3; 2.5.0	2.3.4, 2.4.3, 2.5.1, 2.6.0

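If you run a TensorFlow release earlier than 2.6.0 that is not one of the patched maintenance releases (2.5.1, 2.4.3, 2.3.4), you are affected.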

How severe is it?

CVSS 3.1: 7.8 / 10 (High)

EPSS: 0.2% chance of exploitation in 30 days (higher than 6% of all CVEs). Source: EPSS v3, FIRST.org

Exploitation Status: No known exploitation

Sophistication: Trivial

What is the attack surface?

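Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High

Vector: CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H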
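
The 7.8 base score can be reproduced from the metric values above. The following is a minimal sketch of the CVSS v3.1 base-score arithmetic for the Scope: Unchanged case only; the function and table names are illustrative, and the weights are the ones published in the CVSS v3.1 specification.

```python
import math

# Numeric weights from the CVSS v3.1 specification (Scope: Unchanged only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}  # PR weights differ when scope changes
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest number with one decimal that is >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for a vector with Scope: Unchanged."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# AV:L / AC:L / PR:L / UI:N / S:U / C:H / I:H / A:H
score = base_score_unchanged("L", "L", "L", "N", "H", "H", "H")
print(score)  # 7.8
```

Impact comes to roughly 5.87 and exploitability to roughly 1.83; their sum, about 7.71, rounds up to 7.8.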

What should I do?

5 steps

1) Upgrade TensorFlow to 2.6.0 or later. This is the definitive fix.
2) If that upgrade is blocked, move to patched maintenance release 2.5.1, 2.4.3, or 2.3.4, each of which includes cherry-picked commit ff8894044dfae5568ecbf2ed514c1a37dc394f1b.
3) Audit any code that passes user-controlled inputs to MatrixSetDiagV* ops and add tensor shape validation (assert that k is non-empty).
4) In shared ML platforms, enforce input validation at the job submission boundary, before graphs are executed.
5) Monitor for unexpected TensorFlow process crashes (SIGABRT/SIGSEGV) in training and serving infrastructure as a detection signal.
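
As a rough aid for the upgrade steps, an inventory script can flag installed versions that predate the fix. This is a sketch only: the helper names are hypothetical, pre-release suffixes such as rc1 are ignored, and branches older than 2.3 are treated as unpatched on the assumption that they received no cherry-pick.

```python
import re

# First fixed patch release on each maintained branch, per the advisory text.
FIXED_PATCH = {(2, 3): 4, (2, 4): 3, (2, 5): 1}


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from strings like '2.4.1' or '2.5.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version: str) -> bool:
    """Return True if the version is 2.6.0+ or a patched maintenance release."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= (2, 6):
        return True
    fixed = FIXED_PATCH.get((major, minor))
    # Assumption: branches without an entry (older than 2.3) are unpatched.
    return fixed is not None and patch >= fixed


for candidate in ("2.6.0", "2.5.0", "2.4.3", "2.2.0"):
    print(candidate, "patched" if is_patched(candidate) else "VULNERABLE")
```

In a live environment the input would typically be tf.__version__ or the version reported by the package manager.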

How is it classified?

Tags: DoS, Code Execution, Framework, Inference
MITRE ATLAS: AML.T0010.001 (AI Software), AML.T0029 (Denial of AI Service), AML.T0049 (Exploit Public-Facing Application)

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity

ISO 42001

8.1 - Operational planning and control

NIST AI RMF

MANAGE 2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems

OWASP LLM Top 10

LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

What is CVE-2021-37658?

CVE-2021-37658 is a vulnerability in TensorFlow's MatrixSetDiagV* operations. The kernel checks that the k parameter is a scalar or a vector but not that it contains at least one element, so an empty k causes the code to access the first element of an empty tensor and bind a reference to a null pointer. This is undefined behavior and typically crashes the process. The issue is fixed in TensorFlow 2.6.0 and in maintenance releases 2.5.1, 2.4.3, and 2.3.4.

Is CVE-2021-37658 actively exploited?

No confirmed active exploitation of CVE-2021-37658 has been reported, but organizations should still patch proactively.

How to fix CVE-2021-37658?

Upgrade to TensorFlow 2.6.0 or later. If that upgrade is blocked, move to patched maintenance release 2.5.1, 2.4.3, or 2.3.4, each of which includes commit ff8894044dfae5568ecbf2ed514c1a37dc394f1b. In addition, validate that k is non-empty wherever user-controlled input reaches MatrixSetDiagV* ops, enforce that check at the job submission boundary on shared ML platforms, and monitor for unexpected TensorFlow process crashes (SIGABRT/SIGSEGV) in training and serving infrastructure.

What systems are affected by CVE-2021-37658?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, and ML infrastructure.

What is the CVSS score for CVE-2021-37658?

CVE-2021-37658 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.17%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, ML infrastructure

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0029 Denial of AI Service

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15

ISO 42001: 8.1

NIST AI RMF: MANAGE 2.2

OWASP LLM Top 10: LLM05

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. In affected versions an attacker can cause undefined behavior via binding a reference to a null pointer in all operations of type `tf.raw_ops.MatrixSetDiagV*`. The [implementation](https://github.com/tensorflow/tensorflow/blob/84d053187cb80d975ef2b9684d4b61981bca0c41/tensorflow/core/kernels/linalg/matrix_diag_op.cc) has incomplete validation that the value of `k` is a valid tensor. We have a check that this value is either a scalar or a vector, but there is no check for the number of elements. If this is an empty tensor, then code that accesses the first element of the tensor is wrong. We have patched the issue in GitHub commit ff8894044dfae5568ecbf2ed514c1a37dc394f1b. The fix will be included in TensorFlow 2.6.0. We will also cherrypick this commit on TensorFlow 2.5.1, TensorFlow 2.4.3, and TensorFlow 2.3.4, as these are also affected and still in supported range.
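
For callers that cannot upgrade right away, the missing element-count check can be approximated before k ever reaches the op. The sketch below is a plain-Python guard, not the code from the upstream commit. The name validate_k is hypothetical, and the accepted shapes (a single integer, or one or two integers with the first no larger than the second) follow the documented semantics of k for the set-diag operations as understood here.

```python
from typing import Sequence, Tuple, Union

KArg = Union[int, Sequence[int]]


def validate_k(k: KArg) -> Tuple[int, int]:
    """Return (lower, upper) diagonal offsets, rejecting an empty or malformed k."""
    if isinstance(k, bool):
        raise TypeError("k must be an integer or a sequence of integers")
    if isinstance(k, int):
        return k, k
    values = list(k)
    # This is the check the advisory describes as missing: at least one element.
    if len(values) == 0:
        raise ValueError("k must contain at least one element")
    if len(values) > 2:
        raise ValueError("k must be a scalar or a pair [lower, upper]")
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        raise TypeError("k must contain only integers")
    lower, upper = values[0], values[-1]
    if lower > upper:
        raise ValueError("lower diagonal offset must not exceed upper offset")
    return lower, upper


print(validate_k(0))        # (0, 0)
print(validate_k([-1, 1]))  # (-1, 1)
try:
    validate_k([])
except ValueError as err:
    print("rejected:", err)
```

A guard like this fits at the boundary where untrusted values are turned into op arguments; it does not replace the upgrade, since graphs built elsewhere can still carry an empty k.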

Exploitation Scenario

An adversary with access to a shared ML training cluster or self-hosted TensorFlow inference endpoint submits a computation graph where the k parameter of a MatrixSetDiagV2 operation is an empty tensor (shape [0]). When TF executes this op, the code accesses element [0] of an empty buffer, triggering undefined behavior — typically a process crash (SIGABRT or segfault). In a multi-tenant training environment this disrupts co-tenant jobs and wastes GPU-hours. Against a model serving endpoint that accepts custom op graphs or user-defined layers, this takes down the inference server until manually restarted, producing a repeatable DoS.
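
Because the scenario ends in a signal-induced crash, a job supervisor can treat those exits as a detection signal. The following is a minimal sketch, assuming the supervisor launches TensorFlow workers through Python's subprocess module on a POSIX host, where death by signal N is reported as return code -N. The function names are illustrative.

```python
import signal
import subprocess
import sys
from typing import Optional, Sequence

# Signals named in the scenario as typical outcomes of the undefined behavior.
CRASH_SIGNALS = {signal.SIGSEGV, signal.SIGABRT}


def crash_signal(returncode: int) -> Optional[str]:
    """Map a subprocess return code to a crash signal name, or None."""
    if returncode >= 0:
        return None  # normal exit, including non-zero error codes
    try:
        sig = signal.Signals(-returncode)
    except ValueError:
        return None  # not a signal number known on this platform
    return sig.name if sig in CRASH_SIGNALS else None


def run_and_classify(argv: Sequence[str]) -> Optional[str]:
    """Run a worker command and report whether it died from a crash signal."""
    completed = subprocess.run(argv, check=False)
    return crash_signal(completed.returncode)


print(crash_signal(-int(signal.SIGSEGV)))  # SIGSEGV
print(run_and_classify([sys.executable, "-c", "pass"]))  # None
```

Container runtimes and shells usually report the same event as exit code 128 + N (for example 134 or 139), which this sketch does not handle. Repeated crash exits tied to one tenant or one submitted graph are the pattern worth alerting on.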