CVE-2021-29545: TensorFlow: heap OOB write in sparse tensor DoS
MEDIUM | PoC AVAILABLE
A local attacker with minimal privileges can crash TensorFlow processes by submitting malformed sparse tensors, triggering an out-of-bounds heap write via the sparse-to-CSR matrix conversion kernel. The local-only attack vector limits broad exposure, but multi-tenant ML platforms and shared data science environments (Jupyter hubs, model serving endpoints accepting sparse input) carry real denial-of-service risk. Patch to TensorFlow 2.5.0 or apply the cherry-picked fixes for 2.1.x–2.4.x immediately.
What is the risk?
Medium risk in isolation (CVSS 5.5, AV:L/PR:L), but elevated in shared or multi-tenant ML infrastructure. Exploitability within local scope is trivial—crafting a malformed sparse tensor index requires no deep ML knowledge. Impact is confined to availability; no confidentiality or integrity risk. Organizations running TensorFlow in Jupyter environments, managed notebook services, or model serving APIs that accept sparse matrix inputs should treat this as higher operational priority than the base score suggests.
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| TensorFlow | pip | < 2.1.4, 2.2.0–2.2.2, 2.3.0–2.3.2, 2.4.0–2.4.1 | 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0 |
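A quick way to triage installed versions against the patched releases named in the advisory is a small version check. This is a minimal sketch, not an official tool: the function name `is_patched` is hypothetical, and it assumes that branches older than 2.1 never received a backport and that pre-release suffixes can be ignored.

```python
import re

# First fixed patch level per maintenance branch, as listed in the advisory.
FIXED_PATCH_BY_BRANCH = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}
# Every release from this minor version onward ships with the fix.
FIRST_FIXED_MINOR = (2, 5)


def is_patched(version: str) -> bool:
    """Return True if `version` includes the fix, per the advisory's release list.

    Pre-release suffixes (for example "rc" builds) are ignored by this sketch
    and should be checked by hand.
    """
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) >= FIRST_FIXED_MINOR:
        return True
    fixed_patch = FIXED_PATCH_BY_BRANCH.get((major, minor))
    # Assumption: branches without an entry above were never patched.
    return fixed_patch is not None and patch >= fixed_patch
```

In an environment where TensorFlow is installed, passing `tf.__version__` to `is_patched` gives a first-pass answer; a `False` result means the environment should be upgraded.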
What should I do?
1) Upgrade to TensorFlow 2.5.0, or to a patched maintenance release (2.1.4, 2.2.3, 2.3.3, or 2.4.2) that carries the cherry-picked fix, commit 1e922ccdf6bf46a3a52641f99fd47d54c1decd13.
2) As a workaround, validate sparse tensor indices server-side before passing them to CSR conversion: reject any input where max(indices[:,0]) >= expected_num_rows.
3) Run TensorFlow serving workers in isolated containers with restart policies to limit the duration of any DoS.
4) Alert on unexpected TensorFlow process exits in serving infrastructure as a detection signal.
5) Audit production code for uses of tf.raw_ops.SparseToCsrSparseMatrix and related sparse conversion APIs that are exposed to external input.
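The index-validation workaround can be sketched with NumPy alone, before any data reaches TensorFlow. This is an illustrative pre-check, not the upstream fix: `validate_sparse_indices` is a hypothetical helper, and it goes slightly beyond the rule above by rejecting negative indices and checking every dimension against `dense_shape`, not only the row column.

```python
import numpy as np


def validate_sparse_indices(indices, dense_shape):
    """Raise ValueError unless every sparse index fits inside dense_shape.

    indices:     array-like of shape [nnz, rank]
    dense_shape: array-like of shape [rank]
    """
    indices = np.asarray(indices, dtype=np.int64)
    dense_shape = np.asarray(dense_shape, dtype=np.int64)

    if indices.ndim != 2 or dense_shape.ndim != 1:
        raise ValueError("indices must be 2-D and dense_shape must be 1-D")
    if indices.shape[1] != dense_shape.shape[0]:
        raise ValueError("indices rank does not match dense_shape rank")
    if indices.shape[0] == 0:
        return  # no entries, nothing can be out of range

    if (indices < 0).any():
        raise ValueError("sparse indices must be non-negative")
    # Broadcasts dense_shape across rows; column 0 is the row index
    # that the advisory identifies as the problem for rank-2 inputs.
    if (indices >= dense_shape).any():
        raise ValueError("sparse index out of range for dense_shape")
```

Calling this at the request boundary, for example in the handler that deserializes client input, keeps malformed tensors from reaching the conversion kernel on unpatched builds.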
Frequently Asked Questions
What is CVE-2021-29545?
A local attacker with minimal privileges can crash TensorFlow processes by submitting malformed sparse tensors, triggering an out-of-bounds heap write via the sparse-to-CSR matrix conversion kernel. The local-only attack vector limits broad exposure, but multi-tenant ML platforms and shared data science environments (Jupyter hubs, model serving endpoints accepting sparse input) carry real denial-of-service risk. Patch to TensorFlow 2.5.0 or apply the cherry-picked fixes for 2.1.x–2.4.x immediately.
Is CVE-2021-29545 actively exploited?
Proof-of-concept exploit code is publicly available for CVE-2021-29545, increasing the risk of exploitation.
How to fix CVE-2021-29545?
Upgrade to TensorFlow 2.5.0, or to a patched maintenance release (2.1.4, 2.2.3, 2.3.3, or 2.4.2) that includes commit 1e922ccdf6bf46a3a52641f99fd47d54c1decd13. If you cannot upgrade right away, validate sparse tensor indices server-side before CSR conversion (reject any input where max(indices[:,0]) >= expected_num_rows), run serving workers in isolated containers with restart policies, alert on unexpected TensorFlow process exits, and audit production code for uses of tf.raw_ops.SparseToCsrSparseMatrix and related sparse conversion APIs exposed to external input.
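For the audit of production code, a plain text scan is often enough to find candidate call sites. The sketch below uses only the Python standard library; `find_op_references` is a hypothetical helper, and it only matches the op name given in the advisory, so any wrapper functions used in a given codebase would need to be added to `op_names` by hand.

```python
import re
from pathlib import Path


def find_op_references(root, op_names=("SparseToCsrSparseMatrix",)):
    """Return (path, line_number, line) tuples for lines mentioning an op name.

    This is a textual match, not a call-graph analysis: it can report
    comments and strings, and it cannot see dynamically built op names.
    """
    pattern = re.compile("|".join(re.escape(name) for name in op_names))
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file, skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Each hit still needs a manual look to decide whether the indices reaching that call can be influenced by external input.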
What systems are affected by CVE-2021-29545?
This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, shared ML notebooks.
What is the CVSS score for CVE-2021-29545?
CVE-2021-29545 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.19%.
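The 5.5 base score can be reproduced from the vector listed on this page, CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H, using the base-score equations from the CVSS v3.1 specification. The sketch below only covers Scope: Unchanged vectors, which is all this CVE needs; the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """CVSS v3.1 Roundup: smallest number with one decimal that is >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(vector):
    """Compute the base score for a CVSS v3.1 vector with S:U."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    c, i, a = (WEIGHTS["CIA"][metrics[k]] for k in ("C", "I", "A"))
    impact = 6.42 * (1 - (1 - c) * (1 - i) * (1 - a))
    exploitability = 8.22 * (
        WEIGHTS["AV"][metrics["AV"]]
        * WEIGHTS["AC"][metrics["AC"]]
        * WEIGHTS["PR"][metrics["PR"]]
        * WEIGHTS["UI"][metrics["UI"]]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H")
print(score)  # 5.5
```

Here the impact term is 6.42 × 0.56 ≈ 3.60 and the exploitability term is about 1.83, so the sum of roughly 5.43 rounds up to 5.5. Only availability contributes to impact, which matches the denial-of-service nature of the bug.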
What is the AI security impact?
Affected AI Architectures
MITRE ATLAS Techniques
- AML.T0010.001 AI Software
- AML.T0029 Denial of AI Service
- AML.T0049 Exploit Public-Facing Application
What are the technical details?
Original Advisory
TensorFlow is an end-to-end open source platform for machine learning. An attacker can trigger a denial of service via a `CHECK`-fail in converting sparse tensors to CSR Sparse matrices. This is because the [implementation](https://github.com/tensorflow/tensorflow/blob/800346f2c03a27e182dd4fba48295f65e7790739/tensorflow/core/kernels/sparse/kernels.cc#L66) does a double redirection to access an element of an array allocated on the heap. If the value at `indices(i, 0)` is such that `indices(i, 0) + 1` is outside the bounds of `csr_row_ptr`, this results in writing outside of bounds of heap allocated data. The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
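The double indirection described in the advisory can be modeled in a few lines of plain Python. This is a simplified illustration for rank-2 input, not the TensorFlow kernel or its patch: it assumes the conventional CSR layout in which `csr_row_ptr` has `num_rows + 1` entries, and the function name is hypothetical. The bounds check marks the point where unchecked C++ code would write past the end of the heap buffer.

```python
def build_csr_row_ptr(indices, num_rows):
    """Build a CSR row-pointer array from (row, col) index pairs.

    Assumption: row_ptr has num_rows + 1 slots, as in standard CSR.
    """
    row_ptr = [0] * (num_rows + 1)
    for row, _col in indices:
        # The row value read from `indices` is used as a position in row_ptr.
        # Without this check, row + 1 can land outside the allocated array.
        if not 0 <= row < num_rows:
            raise IndexError(
                f"row index {row} is outside [0, {num_rows}); "
                "an unchecked write here would be out of bounds"
            )
        row_ptr[row + 1] += 1
    # Prefix sum turns per-row counts into row start offsets.
    for r in range(num_rows):
        row_ptr[r + 1] += row_ptr[r]
    return row_ptr
```

For example, `build_csr_row_ptr([(0, 1), (0, 2), (2, 0)], 3)` yields `[0, 2, 2, 3]`, while an entry such as `(3, 0)` in a three-row matrix is rejected. Python raises an error on its own for most out-of-range list writes, but C++ does not, which is why the missing check in the kernel becomes a heap write.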
Exploitation Scenario
An adversary with access to a shared ML platform—such as a tenant on a multi-user Jupyter hub or a client of a model serving endpoint that accepts sparse feature inputs—constructs a sparse tensor where indices(i, 0) + 1 exceeds the allocated bounds of the csr_row_ptr array. Submitting this tensor to any operation invoking the SparseToCsrSparseMatrix kernel triggers the out-of-bounds heap write, causing an immediate CHECK-fail and process crash. In a model serving context (e.g., TensorFlow Serving behind an API), the attacker can repeatedly submit these payloads to keep the inference worker down, achieving sustained denial of service against the ML endpoint with no authentication bypass required beyond API access.
Weaknesses (CWE)
CWE-131 — Incorrect Calculation of Buffer Size: The product does not correctly calculate the size to be used when allocating a buffer, which could lead to a buffer overflow.
- [Implementation] When allocating a buffer for the purpose of transforming, converting, or encoding an input, allocate enough memory to handle the largest possible encoding. For example, in a routine that converts "&" characters to "&amp;" for HTML entity encoding, the output buffer needs to be at least 5 times as large as the input buffer.
- [Implementation] Understand the programming language's underlying representation and how it interacts with numeric calculation (CWE-681). Pay close attention to byte size discrepancies, precision, signed/unsigned distinctions, truncation, conversion and casting between types, "not-a-number" calculations, and how the language handles numbers that are too large or too small for its underlying representation. [REF-7] Also be careful to account for 32-bit, 64-bit, and other potential differences that may affect the numeric representation.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
- github.com/tensorflow/tensorflow/commit/1e922ccdf6bf46a3a52641f99fd47d54c1decd13 Patch 3rd Party
- github.com/tensorflow/tensorflow/security/advisories/GHSA-hmg3-c7xj-6qwm Exploit Patch 3rd Party
Related Vulnerabilities
- CVE-2020-15196 (9.9) TensorFlow: heap OOB read in sparse/ragged count ops. Same package: tensorflow
- CVE-2020-15205 (9.8) TensorFlow: heap overflow in StringNGrams, ASLR bypass. Same package: tensorflow
- CVE-2020-15208 (9.8) TFLite: OOB read/write via tensor dimension mismatch. Same package: tensorflow
- CVE-2019-16778 (9.8) TensorFlow: heap overflow in UnsortedSegmentSum op. Same package: tensorflow
- CVE-2022-23587 (9.8) TensorFlow: integer overflow in Grappler enables RCE. Same package: tensorflow