CVE-2021-37643: TensorFlow: null deref in MatrixDiagPartOp, DoS risk

HIGH
Published August 12, 2021
CISO Take

If your ML infrastructure runs TensorFlow below 2.6.0 without a backported fix (2.5.1, 2.4.3, or 2.3.4), a low-privileged local user can crash training jobs or silently corrupt matrix diagonal computations by passing an invalid padding value. Patch to the fixed versions immediately. In automated training pipelines, the silent data-corruption path, where invalid outputs are produced without a crash, is more dangerous than the DoS. Shared ML compute clusters (multi-tenant GPU nodes, CI/CD-driven training) are the highest-priority remediation targets.

Risk Assessment

Moderate-high risk in shared compute environments. CVSS 7.1 (Local/Low complexity/Low privileges) understates the real risk in organizations running multi-tenant ML infrastructure. Exploiting this requires only local shell access—common on shared training clusters, compromised CI/CD runners, or containerized notebook environments. The silent corruption variant (invalid results with no crash) is operationally more dangerous than the null-pointer crash because it can produce subtly incorrect model weights that propagate undetected through downstream pipelines.

Affected Systems

| Package | Ecosystem | Vulnerable Range | Patched |
| --- | --- | --- | --- |
| tensorflow | pip | < 2.3.4; 2.4.0 to 2.4.2; 2.5.0 | 2.3.4, 2.4.3, 2.5.1, 2.6.0 |

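As a rough aid for triage, the version ranges named in this advisory (fixed in 2.6.0, with backports in 2.5.1, 2.4.3, and 2.3.4) can be turned into a small check. This is a minimal sketch, not an official tool: the helper names `parse_version` and `is_vulnerable` are hypothetical, pre-release suffixes such as `rc1` are ignored, and releases older than 2.3 are treated as unpatched because the advisory says they receive no fix.

```python
import re
from typing import Tuple

# Minimum patch level per release line that contains the backported fix,
# taken from the advisory text (2.3.4, 2.4.3, 2.5.1). 2.6.0+ ships the fix.
PATCHED_MINIMUMS = {(2, 3): 4, (2, 4): 3, (2, 5): 1}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '2.5.0'.

    Assumption: anything after the numeric prefix (e.g. 'rc1') is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_vulnerable(version: str) -> bool:
    """Return True if the version falls in the range this advisory lists."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= (2, 6):
        return False
    fixed_patch = PATCHED_MINIMUMS.get((major, minor))
    if fixed_patch is None:
        # Release lines older than 2.3 are out of support and get no backport.
        return True
    return patch < fixed_patch
```

In practice the installed version string could come from `importlib.metadata.version("tensorflow")` or from a dependency manifest; how it is collected across a fleet is left open here.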

Severity & Risk

CVSS 3.1
7.1 / 10
EPSS
0.0%
chance of exploitation in 30 days
Higher than 2% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Trivial

Attack Surface

AV Local
AC Low
PR Low
UI None
S Unchanged
C None
I High
A High

Recommended Action

5 steps
  1. Patch

    Upgrade to TensorFlow 2.6.0+, or move to a release with the backported fix: 2.5.1, 2.4.3, or 2.3.4 (commit 482da92). Versions older than 2.3 are outside the supported range, receive no patch, and should be treated as EOL.

  2. Input validation

    Add upstream validation to reject empty tensors or invalid padding values before they reach MatrixDiagPartOp.

  3. Multi-tenant hardening

    Audit who can submit arbitrary TF jobs on shared clusters; apply job-submission RBAC if not already in place.

  4. Detection

    Enable process crash monitoring on training nodes, watching in particular for segfault (SIGSEGV) terminations, which is how a null pointer dereference typically surfaces; anomalous job terminations may indicate exploitation.

  5. Pipeline integrity

    For affected pipeline runs between publication (2021-08-12) and patch deployment, consider revalidating model outputs produced during that window if inputs came from untrusted sources.
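Step 2 above could be approximated with a pre-flight shape check that runs before the op is called. The sketch below is illustrative only: `validation_error` is a hypothetical helper, it works on plain shape tuples rather than TensorFlow tensors so it has no dependencies, and the rule that the padding value must hold exactly one element is an assumption drawn from the NVD description (empty padding triggers the null dereference; extra values are silently ignored).

```python
from math import prod
from typing import Optional, Sequence


def validation_error(
    input_shape: Sequence[int], padding_shape: Sequence[int]
) -> Optional[str]:
    """Return a reason to reject the call, or None if the shapes look safe.

    Shapes are plain tuples of ints, e.g. () for a scalar or (3, 3) for a
    3x3 matrix. With TensorFlow they would typically be read from the
    tensors' `shape` attribute before calling the raw op (assumed to be
    exposed as tf.raw_ops.MatrixDiagPartV2 / V3 in the 2.x API).
    """
    if len(input_shape) < 2:
        return "input must have rank 2 or higher"
    if prod(input_shape) == 0:
        # The advisory recommends rejecting empty tensors outright.
        return "input tensor is empty"
    if prod(padding_shape) != 1:
        # prod(()) == 1, so a true scalar passes; (0,) and (2,) do not.
        return "padding_value must hold exactly one element"
    return None
```

A caller would raise or skip the job when the function returns a message, for example `validation_error((3, 3), (0,))` for an empty padding tensor. This is defence in depth only and does not replace upgrading to a patched release.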

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
Article 9 - Risk management system
ISO 42001
A.6.1.3 - AI system risk management — technical robustness
NIST AI RMF
MANAGE-2.2 - Treatments for identified AI risks
MAP-5.1 - Likelihood and magnitude of AI risks — impact assessment

Frequently Asked Questions

What is CVE-2021-37643?

CVE-2021-37643 is a flaw in TensorFlow's matrix-diagonal extraction kernel (`tf.raw_ops.MatrixDiagPartOp` in the NVD description). When the caller supplies an invalid padding value, the kernel either dereferences a null pointer (if the input is empty) and crashes the process, or silently ignores every value after the first and returns invalid results. It affects TensorFlow releases below 2.6.0 that lack the backported fix (2.5.1, 2.4.3, 2.3.4). Shared ML compute clusters (multi-tenant GPU nodes, CI/CD-driven training) are the highest-priority remediation targets.

Is CVE-2021-37643 actively exploited?

No confirmed active exploitation of CVE-2021-37643 has been reported, but organizations should still patch proactively.

How to fix CVE-2021-37643?

1. **Patch**: Upgrade to TensorFlow 2.6.0+, or move to a release with the backported fix: 2.5.1, 2.4.3, or 2.3.4 (commit 482da92). Versions older than 2.3 are outside the supported range, receive no patch, and should be treated as EOL.
2. **Input validation**: Add upstream validation to reject empty tensors or invalid padding values before they reach MatrixDiagPartOp.
3. **Multi-tenant hardening**: Audit who can submit arbitrary TF jobs on shared clusters; apply job-submission RBAC if not already in place.
4. **Detection**: Enable process crash monitoring on training nodes, watching in particular for segfault (SIGSEGV) terminations; anomalous job terminations may indicate exploitation.
5. **Pipeline integrity**: For affected pipeline runs between publication (2021-08-12) and patch deployment, consider revalidating model outputs produced during that window if inputs came from untrusted sources.

What systems are affected by CVE-2021-37643?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, data preprocessing, MLOps/CI-CD pipelines.

What is the CVSS score for CVE-2021-37643?

CVE-2021-37643 has a CVSS v3.1 base score of 7.1 (HIGH). The EPSS exploitation probability is 0.01%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. If a user does not provide a valid padding value to `tf.raw_ops.MatrixDiagPartOp`, then the code triggers a null pointer dereference (if input is empty) or produces invalid behavior, ignoring all values after the first. The [implementation](https://github.com/tensorflow/tensorflow/blob/8d72537c6abf5a44103b57b9c2e22c14f5f49698/tensorflow/core/kernels/linalg/matrix_diag_op.cc#L89) reads the first value from a tensor buffer without first checking that the tensor has values to read from. We have patched the issue in GitHub commit 482da92095c4d48f8784b1f00dda4f81c28d2988. The fix will be included in TensorFlow 2.6.0. We will also cherrypick this commit on TensorFlow 2.5.1, TensorFlow 2.4.3, and TensorFlow 2.3.4, as these are also affected and still in supported range.

Exploitation Scenario

An attacker with a data-scientist account on a shared GPU training cluster (compromised credentials, insider, or rogue contractor) submits a TensorFlow training script that passes an empty tensor to `tf.raw_ops.MatrixDiagPartOp` with an invalid padding value. In the crash path, the null-pointer dereference kills the TF process mid-training, causing denial of service to co-located jobs sharing the same node. In the more insidious corruption path, the attacker crafts a tensor where MatrixDiagPartOp silently discards valid diagonal entries, subtly degrading model accuracy. The corrupted weights pass automated test thresholds (accuracy within noise), get promoted through the MLOps pipeline, and are deployed to production—eroding model integrity without triggering alerts.
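The crash path in this scenario would normally show up as a process killed by SIGSEGV. A job launcher could label such exits for alerting along the following lines. This is a sketch under assumptions: `describe_exit` and `run_and_classify` are hypothetical names, the return-code conventions are the POSIX ones used by Python's `subprocess` module and by common shells, and a segfault by itself does not prove exploitation.

```python
import signal
import subprocess
import sys
from typing import Sequence


def describe_exit(returncode: int) -> str:
    """Map a child process return code to a coarse label for alerting."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as -signal_number.
        sig = -returncode
        if sig == signal.SIGSEGV:
            return "segfault"
        return f"killed by signal {sig}"
    if returncode - 128 == signal.SIGSEGV:
        # Shells conventionally report a signalled child as 128 + signal.
        return "segfault"
    return f"exited with status {returncode}"


def run_and_classify(argv: Sequence[str]) -> str:
    """Run a command to completion and classify how it ended."""
    completed = subprocess.run(list(argv))
    return describe_exit(completed.returncode)
```

A wrapper such as `run_and_classify([sys.executable, "train.py"])` (the script name is a placeholder) could feed the label into whatever alerting the cluster already uses. The silent-corruption path produces no abnormal exit, so this only covers the denial-of-service half of the scenario.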

Weaknesses (CWE)

CWE-476: NULL Pointer Dereference

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:H/A:H
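To show how the vector above yields the 7.1 base score, the following sketch applies the CVSS v3.1 base-score equations. The metric weights and the rounding rule are reproduced from the FIRST CVSS v3.1 specification as I understand it; the function names are illustrative, and only base metrics are handled (no temporal or environmental metrics).

```python
import math

# Base metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/AV:.../A:...' vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]

    # Impact sub-score base: combined loss of confidentiality, integrity, availability.
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))
```

For this CVE, the impact term is 6.42 × 0.8064 ≈ 5.18 and the exploitability term is ≈ 1.83, giving ≈ 7.01, which rounds up to 7.1. The low exploitability figure reflects the Local attack vector and Low privileges required.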

Timeline

Published
August 12, 2021
Last Modified
November 21, 2024
First Seen
August 12, 2021

Related Vulnerabilities