CVE-2021-29575: TensorFlow: stack overflow DoS in ReverseSequence op

MEDIUM PoC AVAILABLE
Published May 14, 2021
CISO Take

Patch TensorFlow to 2.5.0 (or backport versions 2.4.2/2.3.3/2.2.3/2.1.4) immediately if running sequence-based models. Risk is elevated in shared ML platforms — multi-tenant Jupyter environments or shared GPU clusters — where any user can trigger a TF runtime crash. Not actively exploited, but trivially reproducible with a single negative integer argument.

What is the risk?

Medium overall, but context-dependent. The local attack vector limits exposure for dedicated, isolated inference servers. Risk escalates significantly in shared ML environments (data science platforms, Jupyter hubs, Kubeflow pipelines) where untrusted or semi-trusted users execute TF operations. CVSS 5.5 is appropriate for isolated deployments; organizations running multi-tenant AI infrastructure should treat this closer to high due to blast radius on co-located workloads.

What systems are affected?

Package      Ecosystem   Vulnerable Range                            Patched
TensorFlow   pip         releases before the fix on each branch      2.5.0 (backports: 2.4.2, 2.3.3, 2.2.3, 2.1.4)

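Do you use TensorFlow? You are affected if you run a release earlier than 2.5.0 that does not include the backported fix (2.4.2, 2.3.3, 2.2.3, or 2.1.4).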
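As a rough illustration, the check for an affected release can be sketched in a few lines of Python. The helper names `parse_version` and `is_vulnerable` are hypothetical, the patched versions are the ones listed in the advisory, and treating every branch older than 2.1 as vulnerable is an assumption. Pre-release suffixes such as `rc0` are ignored, so release candidates of a fixed version are reported as patched, which may not hold.

```python
import re

# Patched patch-level per branch, taken from the advisory text.
PATCHED = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}
FIRST_FIXED_BRANCH = (2, 5)  # 2.5.0 and later include the fix


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '2.4.1' or '2.4.1rc0'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_vulnerable(version: str) -> bool:
    """Return True if the version predates the fix on its branch."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= FIRST_FIXED_BRANCH:
        return False
    fixed_patch = PATCHED.get((major, minor))
    if fixed_patch is None:
        # Assumption: branches older than 2.1 never received a backport.
        return True
    return patch < fixed_patch
```

In an environment where TensorFlow is installed, the string to check would typically come from `tf.__version__`.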

How severe is it?

CVSS 3.1
5.5 / 10
EPSS
0.2%
chance of exploitation in 30 days
Higher than 10% of all CVEs
Exploitation Status
Exploit Available
Exploitation: MEDIUM
Sophistication
Trivial
Exploitation Confidence
medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

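Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High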

What should I do?

5 steps
  1. Patch: Upgrade TensorFlow to 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4 per your branch.

  2. Workaround (if patching is delayed): Add explicit input validation — assert seq_dim >= 0 and batch_dim >= 0 and both within tensor rank bounds before calling ReverseSequence.

  3. Detection: Monitor for abnormal TF process crashes or CHECK-failure stack traces in inference/training logs.

  4. Access control: In shared environments, restrict direct access to tf.raw_ops namespace for untrusted users.

  5. Dependency scanning: Add CVE-2021-29575 to the detection rules of your SCA tooling so that unpatched TF versions in container images are flagged. Do not place it on an allowlist, which would suppress the finding.
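The input validation from step 2 could look roughly like the sketch below. The function name `validate_reverse_sequence_dims` is hypothetical, and the caller is assumed to pass the rank of the input tensor (for example `len(tensor.shape)`). The check that the two dimensions differ is an extra precaution, not something the advisory text asks for.

```python
def validate_reverse_sequence_dims(rank: int, seq_dim: int, batch_dim: int = 0) -> None:
    """Raise ValueError when seq_dim or batch_dim is not a valid axis index.

    Intended to run before tf.raw_ops.ReverseSequence is called on an
    unpatched TensorFlow build.
    """
    for name, value in (("seq_dim", seq_dim), ("batch_dim", batch_dim)):
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"{name} must be an integer, got {value!r}")
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
        if value >= rank:
            raise ValueError(f"{name}={value} is out of bounds for rank {rank}")
    if seq_dim == batch_dim:
        # Extra precaution (assumption): the two axes should be distinct.
        raise ValueError(f"seq_dim and batch_dim must differ, both are {seq_dim}")
```

The sketch raises exceptions instead of using `assert`, because assertions are skipped when Python runs with the `-O` flag.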

How is it classified?

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.9.3 - AI system security and resilience
NIST AI RMF
MANAGE 2.2 - AI risk treatment and response mechanisms
OWASP LLM Top 10
LLM04 - Model Denial of Service

Frequently Asked Questions

What is CVE-2021-29575?

CVE-2021-29575 is a denial-of-service vulnerability in TensorFlow's `tf.raw_ops.ReverseSequence` operation. The kernel does not validate the `seq_dim` and `batch_dim` arguments, so a negative or otherwise invalid value can cause a stack overflow or a `CHECK` failure that crashes the TF runtime process. The fix is included in TensorFlow 2.5.0 and backported to 2.4.2, 2.3.3, 2.2.3, and 2.1.4.

Is CVE-2021-29575 actively exploited?

No active exploitation is recorded for CVE-2021-29575. However, proof-of-concept exploit code is publicly available and the crash is trivial to reproduce, which increases the risk of exploitation.

How to fix CVE-2021-29575?

  1. Patch: Upgrade TensorFlow to 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4 per your branch.
  2. Workaround (if patching is delayed): Add explicit input validation: assert seq_dim >= 0 and batch_dim >= 0 and both within tensor rank bounds before calling ReverseSequence.
  3. Detection: Monitor for abnormal TF process crashes or CHECK-failure stack traces in inference/training logs.
  4. Access control: In shared environments, restrict direct access to tf.raw_ops namespace for untrusted users.
  5. Dependency scanning: Add CVE-2021-29575 to the detection rules of your SCA tooling so that unpatched TF versions in container images are flagged. Do not place it on an allowlist, which would suppress the finding.

What systems are affected by CVE-2021-29575?

This vulnerability affects the following AI/ML architecture patterns: training pipelines, model serving, multi-tenant ML platforms.

What is the CVSS score for CVE-2021-29575?

CVE-2021-29575 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.20%.

What is the AI security impact?

Affected AI Architectures

training pipelines, model serving, multi-tenant ML platforms

MITRE ATLAS Techniques

AML.T0010.001 AI Software
AML.T0029 Denial of AI Service
AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15
ISO 42001: A.9.3
NIST AI RMF: MANAGE 2.2
OWASP LLM Top 10: LLM04

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. The implementation of `tf.raw_ops.ReverseSequence` allows for stack overflow and/or `CHECK`-fail based denial of service. The implementation (https://github.com/tensorflow/tensorflow/blob/5b3b071975e01f0d250c928b2a8f901cd53b90a7/tensorflow/core/kernels/reverse_sequence_op.cc#L114-L118) fails to validate that `seq_dim` and `batch_dim` arguments are valid. Negative values for `seq_dim` can result in stack overflow or `CHECK`-failure, depending on the version of Eigen code used to implement the operation. Similar behavior can be exhibited by invalid values of `batch_dim`. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.

Exploitation Scenario

An adversary with local access to a shared ML platform (for example, a data scientist on a multi-tenant Jupyter environment) executes a single notebook cell calling tf.raw_ops.ReverseSequence with seq_dim=-1 on an arbitrary tensor. Depending on the Eigen version in use, this triggers a stack overflow or a CHECK failure, and either outcome crashes the TF runtime process. In a Kubernetes-based ML serving environment, the crash causes pod restarts and temporary disruption of the inference service for all users sharing the node. A malicious insider could use this to disrupt other teams' training runs or to mask other malicious activity during the outage window.
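A crash of this kind would usually leave a trace in process or container logs. The sketch below scans log lines for a few crash signatures. The patterns and the sample lines are assumptions chosen for illustration, not output captured from an affected system, and they should be adjusted to what your runtime actually emits.

```python
import re

# Assumed crash signatures; tune these to your own log format.
CRASH_PATTERNS = [
    re.compile(r"Check failed:"),
    re.compile(r"Segmentation fault|SIGSEGV", re.IGNORECASE),
    re.compile(r"stack overflow", re.IGNORECASE),
]


def find_crash_lines(log_lines):
    """Return (line_number, text) pairs for lines matching a crash signature."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if any(pattern.search(line) for pattern in CRASH_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical log excerpt, made up for this example.
SAMPLE_LOG = [
    "INFO serving request id=17",
    "F tensorflow/core/kernels/example.cc:42] Check failed: dim >= 0 (-1 vs. 0)",
    "INFO worker restarted",
    "Fatal Python error: Segmentation fault",
]

for number, text in find_crash_lines(SAMPLE_LOG):
    print(f"line {number}: {text}")
```

Matches are only a signal that a TF process died abnormally. Correlating them with the notebook or job that ran at the time still requires the platform's own audit data.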

Weaknesses (CWE)

CWE-787 — Out-of-bounds Write: The product writes data past the end, or before the beginning, of the intended buffer.

  • [Requirements] Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, many languages that perform their own memory management, such as Java and Perl, are not subject to buffer overflows. Other languages, such as Ada and C#, typically provide overflow protection, but the protection can be disabled by the programmer. Be wary that a language's interface to native code may still be subject to overflows, even if the language itself is theoretically safe.
  • [Architecture and Design] Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. Examples include the Safe C String Library (SafeStr) by Messier and Viega [REF-57], and the Strsafe.h library from Microsoft [REF-56]. These libraries provide safer versions of overflow-prone string-handling functions.

Source: MITRE CWE corpus.

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
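The 5.5 base score can be reproduced from this vector with the base-score equations of the CVSS v3.1 specification. The sketch below is a minimal calculator for base metrics only. The function names are hypothetical, and temporal and environmental metrics are not handled.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/...' vector string."""
    # Skip the "CVSS:3.1" prefix and map each metric name to its value.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]

    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```

For this vector the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is about 1.83. Their sum of roughly 5.43 rounds up to 5.5.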

Timeline

Published
May 14, 2021
Last Modified
November 21, 2024
First Seen
May 14, 2021

Related Vulnerabilities