CVE-2021-37665: TensorFlow MKL: null-ptr/heap-OOB in requantization ops

HIGH
Published August 12, 2021
CISO Take

TensorFlow's MKL backend does not fully validate tensor dimensions in its per-channel requantization kernels, so a low-privilege local attacker can trigger a heap out-of-bounds access or a null pointer dereference, with potential for code execution. If you run quantized inference or training workloads, upgrade promptly to TensorFlow 2.6.0 or to one of the patched maintenance releases (2.5.1, 2.4.3, or 2.3.4). Exposure is highest on shared ML training clusters and Intel-backed inference services that process externally sourced models.

What is the risk?

CVSS 7.8 (High), with a local attack vector and low attack complexity. The local vector reduces internet-exposure risk, but shared ML infrastructure, multi-tenant training platforms, and CI/CD pipelines that automatically execute externally sourced models still face meaningful exposure. The C:H/I:H/A:H impact ratings indicate potential for full compromise of the affected process once the flaw is triggered, and no user interaction is required. The CVE is not listed in CISA KEV and was patched in 2021, so the remaining risk is concentrated in unpatched legacy environments.

What systems are affected?

Package      Ecosystem   Vulnerable Range   Patched
TensorFlow   pip         (not listed)       2.6.0, 2.5.1, 2.4.3, 2.3.4

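Do you use TensorFlow? You may be affected if you run a release earlier than 2.6.0 that is not one of the patched maintenance releases (2.5.1, 2.4.3, 2.3.4), particularly on MKL-enabled builds.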

How severe is it?

CVSS 3.1
7.8 / 10
EPSS
0.2%
chance of exploitation in 30 days
Higher than 8% of all CVEs
Exploitation Status
No known exploitation
Sophistication
Moderate

What is the attack surface?

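Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): High
Integrity (I): High
Availability (A): High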

What should I do?

6 steps
  1. Upgrade to TensorFlow 2.6.0, which contains the full fix, or to one of the patched maintenance releases that carry the cherry-picked fix: 2.5.1, 2.4.3, or 2.3.4.

  2. If immediate patching is blocked, keep the MKL requantization kernels away from untrusted inputs, for example by serving those workloads from a TensorFlow build that does not include the MKL backend.

  3. Enforce strict access controls on ML training infrastructure—limit who can submit jobs or supply model artifacts.

  4. In containerized deployments, apply Pod Security admission controls (or pod security policies on older Kubernetes releases) and seccomp profiles to limit the blast radius of a local compromise.

  5. Monitor inference services for unexpected crashes or anomalous process behavior as an exploitation signal.

  6. Audit your ML dependency inventory for TensorFlow versions earlier than 2.6.0 that are not one of the patched maintenance releases (2.5.1, 2.4.3, 2.3.4).
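The audit in step 6 can be sketched as a small version check. This is a minimal illustration, not an official tool: the helper names `is_patched` and `audit` and the inventory format (a mapping from service name to TensorFlow version string) are assumptions made for this example. The patched versions come from the advisory text; pre-releases and unparseable strings are conservatively treated as unpatched.

```python
from __future__ import annotations

import re

# Patched releases named in the advisory: 2.6.0 plus the backports
# 2.5.1, 2.4.3, and 2.3.4. Older series received no fix.
MIN_PATCH_BY_SERIES = {(2, 3): 4, (2, 4): 3, (2, 5): 1}
FIRST_FULLY_FIXED = (2, 6, 0)


def is_patched(version: str) -> bool:
    """Return True only if `version` is a final release that carries the fix."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        # Pre-releases, dev builds, and unparseable strings are treated as
        # unpatched so that they get a manual review.
        return False
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor, patch) >= FIRST_FULLY_FIXED:
        return True
    min_patch = MIN_PATCH_BY_SERIES.get((major, minor))
    return min_patch is not None and patch >= min_patch


def audit(inventory: dict[str, str]) -> list[str]:
    """Return the names of inventory entries that need attention, sorted."""
    return sorted(name for name, ver in inventory.items() if not is_patched(ver))


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    print(audit({"trainer": "2.4.2", "serving": "2.6.0", "batch": "2.5.1"}))
```

Treating anything that is not a plain `X.Y.Z` release as unpatched is a deliberate choice here: a false alarm costs a manual check, while a missed vulnerable build does not surface at all.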

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, Robustness and Cybersecurity
ISO 42001
A.10.2 - AI System Security Testing
NIST AI RMF
MANAGE 2.2 - Mechanisms to Sustain AI RMF Trustworthiness
OWASP LLM Top 10
LLM03:2025 - Supply Chain

Frequently Asked Questions

What is CVE-2021-37665?

CVE-2021-37665 is an input validation flaw in TensorFlow's MKL backend. The per-channel requantization kernels do not fully validate tensor dimensions, so a low-privilege local attacker can trigger a heap out-of-bounds access or a null pointer dereference, with potential for code execution. The fix is included in TensorFlow 2.6.0 and in the patched maintenance releases 2.5.1, 2.4.3, and 2.3.4. Exposure is highest on shared ML training clusters and Intel-backed inference services that process externally sourced models.

Is CVE-2021-37665 actively exploited?

No confirmed active exploitation of CVE-2021-37665 has been reported, but organizations should still patch proactively.

How to fix CVE-2021-37665?

1. Upgrade to TensorFlow 2.6.0, which contains the full fix, or to one of the patched maintenance releases that carry the cherry-picked fix: 2.5.1, 2.4.3, or 2.3.4.
2. If immediate patching is blocked, keep the MKL requantization kernels away from untrusted inputs, for example by serving those workloads from a TensorFlow build that does not include the MKL backend.
3. Enforce strict access controls on ML training infrastructure: limit who can submit jobs or supply model artifacts.
4. In containerized deployments, apply Pod Security admission controls (or pod security policies on older Kubernetes releases) and seccomp profiles to limit the blast radius of a local compromise.
5. Monitor inference services for unexpected crashes or anomalous process behavior as an exploitation signal.
6. Audit your ML dependency inventory for TensorFlow versions earlier than 2.6.0 that are not one of the patched maintenance releases.

What systems are affected by CVE-2021-37665?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, inference optimization, model deployment.

What is the CVSS score for CVE-2021-37665?

CVE-2021-37665 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.18%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, inference optimization, model deployment

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0011.000 Unsafe AI Artifacts
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.10.2
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM03:2025

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. In affected versions, due to incomplete validation in the MKL implementation of requantization, an attacker can trigger undefined behavior by binding a reference to a null pointer, or can access data outside the bounds of heap-allocated arrays. The [implementation](https://github.com/tensorflow/tensorflow/blob/460e000de3a83278fb00b61a16d161b1964f15f4/tensorflow/core/kernels/mkl/mkl_requantization_range_per_channel_op.cc) does not validate the dimensions of the `input` tensor. A similar issue occurs in `MklRequantizePerChannelOp`. The [implementation](https://github.com/tensorflow/tensorflow/blob/460e000de3a83278fb00b61a16d161b1964f15f4/tensorflow/core/kernels/mkl/mkl_requantize_per_channel_op.cc) does not perform full validation for all the input arguments. We have patched the issue in GitHub commit 9e62869465573cb2d9b5053f1fa02a81fce21d69 and in GitHub commit 203214568f5bc237603dbab6e1fd389f1572f5c9. The fix will be included in TensorFlow 2.6.0. We will also cherry-pick these commits onto TensorFlow 2.5.1, TensorFlow 2.4.3, and TensorFlow 2.3.4, as these are also affected and still in the supported range.
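To make the class of missing check concrete, here is a minimal Python sketch of shape validation that runs before any per-channel indexing. It is not the code from the fix commits, which are in C++ and may check different conditions. The function name `validate_per_channel_inputs`, the 4-D channels-last layout, and the requirement that `input_min` and `input_max` hold one value per channel are all assumptions made for illustration.

```python
from typing import Sequence


def validate_per_channel_inputs(
    input_shape: Sequence[int],
    input_min_shape: Sequence[int],
    input_max_shape: Sequence[int],
) -> int:
    """Check shapes before any per-channel indexing; return the channel count.

    Assumptions (not taken from the advisory): the input is 4-D with channels
    in the last dimension, and the min/max tensors are 1-D with one entry per
    channel.
    """
    if len(input_shape) != 4:
        raise ValueError(f"input must be 4-D, got rank {len(input_shape)}")
    if any(dim <= 0 for dim in input_shape):
        # An empty tensor has no backing data to index into.
        raise ValueError(f"input has an empty dimension: {tuple(input_shape)}")
    depth = input_shape[-1]
    for name, shape in (("input_min", input_min_shape),
                        ("input_max", input_max_shape)):
        if len(shape) != 1 or shape[0] != depth:
            raise ValueError(
                f"{name} must have shape ({depth},), got {tuple(shape)}"
            )
    return depth
```

Rejecting malformed shapes up front means later loops over `depth` channels can never read past the end of the min/max arrays, which is the out-of-bounds pattern the advisory describes.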

Exploitation Scenario

An attacker with low-privilege access to a shared ML training cluster (for example, through compromised developer credentials or as a malicious co-tenant) crafts a TensorFlow SavedModel containing quantized operations with deliberately malformed tensor dimensions. When the model is loaded and executed on an MKL-optimized build, which is common on Intel Xeon infrastructure, the missing dimension validation triggers a heap out-of-bounds access or a null pointer dereference. On a multi-tenant ML platform, this could escalate to cross-tenant data corruption or code execution under the service account. In a DevOps pipeline that automatically validates externally sourced models, the exploit would run with the pipeline service account's privileges, which could enable lateral movement into broader infrastructure.
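One way to reduce exposure in such a pipeline is to screen a model's operation names before executing it. The sketch below only shows the screening step and is hedged accordingly: the two op names are an assumption about which graph ops are served by the MKL kernels named in the advisory, and how the op names are extracted from a SavedModel is deliberately left out. `find_flagged_ops` is a hypothetical helper, not a TensorFlow API.

```python
from typing import Iterable, List

# Assumption: these are the graph op names backed by the two MKL kernels
# discussed in the advisory. Verify them against your TensorFlow version.
FLAGGED_OPS = frozenset({
    "RequantizationRangePerChannel",
    "RequantizePerChannel",
})


def find_flagged_ops(op_names: Iterable[str]) -> List[str]:
    """Return the flagged op names present in a model, sorted and de-duplicated."""
    return sorted(set(op_names) & FLAGGED_OPS)


if __name__ == "__main__":
    # Hypothetical op list extracted from an externally sourced model.
    ops = ["Placeholder", "Conv2D", "RequantizePerChannel", "Identity"]
    flagged = find_flagged_ops(ops)
    if flagged:
        print("Hold for review on an unpatched runtime:", ", ".join(flagged))
```

A match is not proof of malice, since legitimate quantized models can use these ops. It is only a signal to route the model to a patched runtime or to manual review.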

Weaknesses (CWE)

CWE-20 — Improper Input Validation: The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.

  • [Architecture and Design] Consider using language-theoretic security (LangSec) techniques that characterize inputs using a formal language and build "recognizers" for that language. This effectively requires parsing to be a distinct layer that effectively enforces a boundary between raw input and internal data representations, instead of allowing parser code to be scattered throughout the program, where it could be subject to errors or inconsistencies that create weaknesses. [REF-1109] [REF-1110] [REF-1111]
  • [Architecture and Design] Use an input validation framework such as Struts or the OWASP ESAPI Validation API. Note that using a framework does not automatically address all input validation problems; be mindful of weaknesses that could arise from misusing the framework itself (CWE-1173).

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
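The 7.8 base score can be reproduced from this vector. The sketch below follows the base-score equations and metric weights published in the CVSS v3.1 specification. It covers base metrics only and does no validation beyond the version prefix, and the function names are chosen for this example.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is Changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def parse_vector(vector: str) -> dict:
    """Split 'CVSS:3.1/AV:L/...' into a {metric: value} mapping."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unsupported vector prefix: {prefix!r}")
    return dict(part.split(":", 1) for part in parts)


def base_score(vector: str) -> float:
    m = parse_vector(vector)
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    # Impact sub-score (ISS) combines the three impact metrics.
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    exploitability = (
        8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
        * pr * WEIGHTS["UI"][m["UI"]]
    )
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 7.8
```

For this vector the impact sub-score is about 5.87 and the exploitability sub-score about 1.83. They sum to roughly 7.71, which rounds up to 7.8.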

Timeline

Published
August 12, 2021
Last Modified
November 21, 2024
First Seen
August 12, 2021

Related Vulnerabilities