CVE-2021-29542: TensorFlow: StringNGrams heap overflow crashes ML process

MEDIUM PoC AVAILABLE
Published May 14, 2021
CISO Take

A heap buffer overflow in TensorFlow's StringNGrams op can crash any process that executes a crafted text-preprocessing graph. CVSS scores the impact as availability only (no confidentiality or integrity loss). Patch to TF 2.5.0, 2.4.2, 2.3.3, 2.2.3, or 2.1.4. Risk is elevated in shared Jupyter or multi-tenant training environments where users can submit untrusted graphs. This is not an urgent-alert priority unless your organization runs unpatched TF on exposed NLP inference endpoints.

Risk Assessment

Medium operational risk. The CVSS vector's AV:L limits exploitation to actors who can run code or submit graphs locally; the flaw is not directly exploitable over the network. However, on shared ML platforms (JupyterHub, multi-tenant SageMaker, Kubeflow), 'local' effectively means any authenticated user. The CWE-787 (Out-of-Bounds Write) classification leaves open the theoretical possibility of impact beyond denial of service, even though CVSS scores only A:H and the NVD description itself names an out-of-bounds read of `data[-1]`. A public PoC exists, but there is no evidence of in-the-wild exploitation and no CISA KEV listing, which keeps this in the patch-and-monitor category.

Affected Systems

Package: tensorflow
Ecosystem: pip
Vulnerable range: < 2.1.4, 2.2.0 to 2.2.2, 2.3.0 to 2.3.2, 2.4.0 to 2.4.1
Patched: 2.1.4, 2.2.3, 2.3.3, 2.4.2, and 2.5.0 or later

If any of your environments runs a tensorflow version in the vulnerable range, it is affected.
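To check whether a given install is fixed, the version comparison can be sketched in Python. The fixed releases come from the advisory (2.1.4, 2.2.3, 2.3.3, 2.4.2, and 2.5.0); everything else is an assumption of this minimal sketch: it only inspects the `tensorflow` distribution name, accepts plain `X.Y.Z` strings, and reports release candidates or local builds as unknown. `is_patched` and `check_installed` are hypothetical helper names, not part of any tool.

```python
import re
from importlib import metadata
from typing import Optional

# Lowest fixed patch release on each supported 2.x line, per the advisory.
FIXED_ON_LINE = {(2, 1): 4, (2, 2): 3, (2, 3): 3, (2, 4): 2}


def is_patched(version: str) -> Optional[bool]:
    """True if fixed, False if vulnerable, None if not a plain X.Y.Z string."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        return None  # pre-release or local build: check it by hand
    major, minor, patch = (int(g) for g in m.groups())
    if (major, minor) >= (2, 5):
        return True  # 2.5.0 and later include the fix
    fixed_patch = FIXED_ON_LINE.get((major, minor))
    if fixed_patch is None:
        return False  # 2.0.x and 1.x: no fixed release on that line
    return patch >= fixed_patch


def check_installed(dist: str = "tensorflow") -> str:
    """Report the verdict for the distribution installed in this environment."""
    try:
        version = metadata.version(dist)
    except metadata.PackageNotFoundError:
        return f"{dist}: not installed"
    verdict = {True: "patched", False: "VULNERABLE", None: "unknown, check manually"}
    return f"{dist} {version}: {verdict[is_patched(version)]}"


if __name__ == "__main__":
    print(check_installed())
```

Running the script inside each training or serving image gives a per-image verdict; it complements rather than replaces a manual `pip show tensorflow`.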

Severity & Risk

CVSS 3.1: 5.5 / 10 (MEDIUM)
EPSS: 0.0% (rounded) probability of exploitation in the next 30 days, higher than 1% of all CVEs
Exploitation status: Exploit available
Exploitation: MEDIUM
Sophistication: Moderate
Exploitation confidence: Medium (public PoC indexed in trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

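Attack Vector (AV): Local
Attack Complexity (AC): Low
Privileges Required (PR): Low
User Interaction (UI): None
Scope (S): Unchanged
Confidentiality (C): None
Integrity (I): None
Availability (A): High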

Recommended Action

5 steps
  1. PATCH

    Upgrade to TF 2.5.0, or to 2.4.2, 2.3.3, 2.2.3, or 2.1.4, which carry the cherry-picked fix.

  2. WORKAROUND

    If patching is not immediately feasible, validate StringNGrams inputs server-side and reject requests that would produce ngrams made up entirely of padding (the case where the kernel's `num_tokens` evaluates to 0), for example by enforcing a minimum token count per row before invoking the op.

  3. ISOLATE

    Run TF inference workers as unprivileged processes in containers so that a crash takes down only a single replica.

  4. DETECT

    Monitor for unexpected process crashes or OOM kills in ML serving pods. Heap corruption often manifests as SIGABRT or SIGSEGV, so alert on abnormal crash rates in TF Serving deployments.

  5. AUDIT

    Inventory all TF versions across training and serving infrastructure using `pip show tensorflow` or the equivalent in container images.
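The WORKAROUND step can be sketched as a pre-flight check that runs before a request reaches `tf.raw_ops.StringNGrams`. This is a hedged sketch, not TensorFlow code. It models each ngram as a width-`w` window over a row padded with `p` pad elements on each side, assumes `pad_width=-1` means `w - 1` padding, and rejects any request where some window would hold only padding. The function names and the `num_values` parameter (the length of the `data` tensor) are hypothetical. The padding test is closed-form rather than a loop over windows, because `pad_width` is caller-controlled and a huge value would otherwise make the validator itself slow.

```python
from typing import List, Sequence


def pad_for(pad_width: int, ngram_width: int) -> int:
    # Assumption: -1 requests ngram_width - 1 pad elements. No clamping is
    # applied here, which keeps the check conservative.
    return ngram_width - 1 if pad_width < 0 else pad_width


def has_padding_only_ngram(row_len: int, ngram_width: int, pad_width: int,
                           preserve_short_sequences: bool) -> bool:
    p = pad_for(pad_width, ngram_width)
    padded_len = row_len + 2 * p
    if padded_len < ngram_width:
        # No full window fits; assume one short ngram is emitted only when
        # preserve_short_sequences is set, and it is all padding if row is empty.
        return preserve_short_sequences and row_len == 0 and p > 0
    # Over [p pads, row_len tokens, p pads], some width-w window has no tokens
    # exactly when the row is empty or a single pad block can hold a window.
    return row_len == 0 or p >= ngram_width


def validate_request(data_splits: Sequence[int], num_values: int,
                     ngram_widths: Sequence[int], pad_width: int,
                     preserve_short_sequences: bool) -> List[str]:
    """Return a list of problems; an empty list means the request looks safe."""
    problems: List[str] = []
    if pad_width < -1:
        problems.append("pad_width must be -1 or non-negative")
    if not ngram_widths or any(w < 1 for w in ngram_widths):
        problems.append("ngram_widths must be non-empty and positive")
    if (not data_splits or data_splits[0] != 0
            or data_splits[-1] != num_values
            or any(b < a for a, b in zip(data_splits, data_splits[1:]))):
        problems.append("data_splits must run from 0 to len(data), non-decreasing")
    if problems:
        return problems
    for row, (start, end) in enumerate(zip(data_splits, data_splits[1:])):
        for w in ngram_widths:
            if has_padding_only_ngram(end - start, w, pad_width,
                                      preserve_short_sequences):
                problems.append(f"row {row}: width {w} yields a padding-only ngram")
    return problems


if __name__ == "__main__":
    # Second row is empty, so default padding would yield an all-padding ngram.
    print(validate_request([0, 2, 2], 2, [2], -1, False))
    # ['row 1: width 2 yields a padding-only ngram']
```

A caller would return an error for any non-empty result instead of invoking the op. Because the check ignores any clamping TensorFlow may apply to `pad_width`, it can reject some requests the kernel would have handled safely; that trade-off is deliberate for a temporary workaround.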

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.8.3 - AI system security
NIST AI RMF
MANAGE 2.2 - Mechanisms are in place to sustain AI risk management
OWASP LLM Top 10
LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

Is CVE-2021-29542 actively exploited?

There is no evidence of active exploitation, and CVE-2021-29542 is not listed in CISA KEV. However, proof-of-concept exploit code is publicly available, which increases the risk of exploitation.

What systems are affected by CVE-2021-29542?

This vulnerability affects the following AI/ML architecture patterns: NLP training pipelines, text preprocessing pipelines, model serving (TF Serving with text models), shared ML platforms / multi-tenant notebooks.

What is the CVSS score for CVE-2021-29542?

CVE-2021-29542 has a CVSS v3.1 base score of 5.5 (MEDIUM). The EPSS exploitation probability is 0.01%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. An attacker can cause a heap buffer overflow by passing crafted inputs to `tf.raw_ops.StringNGrams`. This is because the implementation (https://github.com/tensorflow/tensorflow/blob/1cdd4da14282210cc759e468d9781741ac7d01bf/tensorflow/core/kernels/string_ngrams_op.cc#L171-L185) fails to consider corner cases where input would be split in such a way that the generated tokens should only contain padding elements. If input is such that `num_tokens` is 0, then, for `data_start_index=0` (when left padding is present), the marked line would result in reading `data[-1]`. The fix will be included in TensorFlow 2.5.0. We will also cherrypick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range.
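The corner case in the description can be reproduced arithmetically. The Python below is a simplified model, not the C++ kernel. It assumes that for row length `L`, pad width `p`, ngram width `w`, and ngram index `i`, the left padding is `max(0, p - i)`, the right padding is `max(0, i + w - p - L)`, `num_tokens = w - left - right`, and the last real token is read from `data[data_start_index + num_tokens - 1]`. `ngram_slices` and `out_of_bounds` are hypothetical names. Under these assumptions, an empty row with left and right padding yields `num_tokens == 0` with `data_start_index == 0`, so the read index is -1, the `data[-1]` access the description names.

```python
from typing import List, NamedTuple


class NgramSlice(NamedTuple):
    left_padding: int
    right_padding: int
    num_tokens: int
    last_token_index: int  # index the unpatched logic would read from the row


def ngram_slices(row_len: int, ngram_width: int, pad: int) -> List[NgramSlice]:
    """Model which data index each ngram in one row would read last."""
    num_ngrams = max(0, row_len + 2 * pad - ngram_width + 1)
    slices = []
    for i in range(num_ngrams):
        left = max(0, pad - i)
        right = max(0, i + ngram_width - pad - row_len)
        num_tokens = ngram_width - (left + right)
        # With left padding present, the window's first real token is data[0].
        data_start_index = 0 if left > 0 else i - pad
        slices.append(NgramSlice(left, right, num_tokens,
                                 data_start_index + num_tokens - 1))
    return slices


def out_of_bounds(row_len: int, ngram_width: int, pad: int) -> List[int]:
    """Indices outside [0, row_len) that the modeled logic would read."""
    return [s.last_token_index
            for s in ngram_slices(row_len, ngram_width, pad)
            if not 0 <= s.last_token_index < row_len]


if __name__ == "__main__":
    print(out_of_bounds(0, 2, 1))  # [-1]
```

The model reports the index rather than indexing a Python list because a negative index would silently wrap in Python; in the kernel the same arithmetic reads one element before the start of the row's data.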

Exploitation Scenario

An adversary with access to a shared ML training environment (e.g., a multi-tenant JupyterHub or a Kubeflow pipeline that accepts user-submitted TF graphs) crafts a TensorFlow computation graph that calls `tf.raw_ops.StringNGrams` with inputs engineered to produce `num_tokens` = 0 when left padding is applied. When the graph executes, the kernel reads `data[-1]`, an out-of-bounds memory access that triggers the heap buffer overflow and crashes the Python or TF Serving process. In a shared environment, this disrupts other tenants' training jobs. On a production inference endpoint, repeated crafted requests cause repeated crashes, degrading availability or producing a sustained DoS against the NLP inference tier.
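One way to keep such a crash from taking down a shared worker, in line with the ISOLATE and DETECT steps, is to evaluate untrusted graphs in a child process and classify how it exits. This is a minimal POSIX-only sketch, assuming the work can be launched as a separate command; `run_isolated`, `CRASH_SIGNALS`, and the result labels are hypothetical. It relies on `subprocess.run` reporting a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys
from typing import Sequence

# Signals that typically indicate memory corruption in native code.
CRASH_SIGNALS = {signal.SIGSEGV, signal.SIGABRT, signal.SIGBUS}


def run_isolated(cmd: Sequence[str], timeout: float = 60.0) -> str:
    """Run cmd in a child process so a native crash cannot kill the caller."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout"
    rc = proc.returncode
    if rc == 0:
        return "ok"
    if rc < 0:  # terminated by signal -rc
        try:
            sig = signal.Signals(-rc)
        except ValueError:
            return f"killed:{-rc}"  # signal number without an enum name
        return f"crash:{sig.name}" if sig in CRASH_SIGNALS else f"killed:{sig.name}"
    return f"error:{rc}"


if __name__ == "__main__":
    # A child that aborts is reported as a crash; the parent keeps running.
    print(run_isolated([sys.executable, "-c", "import os; os.abort()"]))
```

A supervisor could count `crash:SIGSEGV` and `crash:SIGABRT` results per tenant or replica and alert when the rate rises, which is the crash-rate signal the DETECT step describes.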

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
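The 5.5 base score follows from this vector through the CVSS v3.1 base equations. The sketch below encodes the specification's metric weights and its Roundup function for base metrics only; `base_score` and `roundup` are illustrative names, and the sketch is not a replacement for the official FIRST calculator. For this vector, the impact sub-score is 6.42 × 0.56 ≈ 3.595 and exploitability is 8.22 × 0.55 × 0.77 × 0.62 × 0.85 ≈ 1.835, so their sum of about 5.43 rounds up to 5.5.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on Scope (Unchanged / Changed).
PR_WEIGHTS = {"U": {"N": 0.85, "L": 0.62, "H": 0.27},
              "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def roundup(value: float) -> float:
    # CVSS v3.1 Roundup: smallest one-decimal number >= value, computed on
    # integers so floating-point noise does not bump a score up by 0.1.
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    scope = parts["S"]
    iss = 1 - ((1 - WEIGHTS["CIA"][parts["C"]])
               * (1 - WEIGHTS["CIA"][parts["I"]])
               * (1 - WEIGHTS["CIA"][parts["A"]]))
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (8.22 * WEIGHTS["AV"][parts["AV"]]
                      * WEIGHTS["AC"][parts["AC"]]
                      * PR_WEIGHTS[scope][parts["PR"]]
                      * WEIGHTS["UI"][parts["UI"]])
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


print(base_score("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # 5.5
```

Changing AV:L to AV:N in the same vector is a quick way to see how much of the MEDIUM rating comes from the local-access requirement alone.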

Timeline

Published: May 14, 2021
Last Modified: November 21, 2024
First Seen: May 14, 2021

Related Vulnerabilities