CVE-2021-29591: TFLite: crafted model causes infinite loop / stack overflow

HIGH PoC AVAILABLE
Published May 14, 2021
CISO Take

Any pipeline that loads untrusted TFLite models is exposed to denial of service (an infinite loop) or a crash from stack exhaustion via a maliciously crafted .tflite file with self-referencing While subgraphs. Patch to TF 2.5.0 / 2.4.2 / 2.3.3 / 2.2.3 / 2.1.4 immediately. Until patched, gate model ingestion with a validator that rejects cyclic subgraph references before evaluation.

Risk Assessment

CVSS 7.8 (High) with a local attack vector and low complexity: exploitation is straightforward once the attacker can get a malicious model loaded. The main exposure is model-serving infrastructure, MLOps pipelines that pull models from registries, and mobile/edge devices running TFLite inference. The CVE is not in CISA KEV and no in-the-wild exploitation is confirmed, which keeps operational urgency moderate, although a public PoC is indexed. The C:H/I:H/A:H impact metrics mean that if the stack overflow proves exploitable beyond a pure DoS, memory corruption or privilege escalation on the inference host is theoretically in scope.

Affected Systems

Package: tensorflow
Ecosystem: pip
Vulnerable range: before 2.1.4; 2.2.0 to 2.2.2; 2.3.0 to 2.3.2; 2.4.0 to 2.4.1
Patched: 2.5.0, 2.4.2, 2.3.3, 2.2.3, 2.1.4
Package profile: 195.0K · OpenSSF 7.2 · 3.7K dependents · 4% patched · ~1372d to patch
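A quick way to triage a host is to compare its installed TensorFlow version against the fixed releases named in this advisory (2.5.0, 2.4.2, 2.3.3, 2.2.3, 2.1.4). The sketch below assumes that every release older than the fix on its own branch is affected and that branches older than 2.1 received no backport. It also ignores pre-release suffixes such as `rc0`, so treat its answer as a first pass rather than a substitute for the official advisory. The function and constant names are illustrative.

```python
from importlib import metadata
from typing import Tuple

Version = Tuple[int, int, int]

# Fixed releases named in the advisory, keyed by (major, minor) branch.
BACKPORTS = {(2, 1): (2, 1, 4), (2, 2): (2, 2, 3), (2, 3): (2, 3, 3), (2, 4): (2, 4, 2)}
FIRST_FIXED_MAINLINE: Version = (2, 5, 0)


def parse_version(text: str) -> Version:
    """Parse 'X.Y.Z' into ints, keeping only the leading digits of each part ('0rc1' -> 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def is_vulnerable(version: str) -> bool:
    """Heuristic check for CVE-2021-29591 based on the release branch."""
    v = parse_version(version)
    if v >= FIRST_FIXED_MAINLINE:
        return False
    fixed = BACKPORTS.get((v[0], v[1]))
    if fixed is not None:
        return v < fixed
    # Assumption: older branches (2.0.x, 1.x) got no backport and stay affected.
    return True


if __name__ == "__main__":
    try:
        installed = metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        print("tensorflow is not installed")
    else:
        print(installed, "vulnerable" if is_vulnerable(installed) else "patched")
```

Tuple comparison does the range logic, so `2.15.0` correctly compares as newer than `2.5.0`.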

Do you use tensorflow? If you run an unpatched version and load TFLite models from sources you do not fully control, you're affected.

Severity & Risk

CVSS 3.1: 7.8 / 10
EPSS: 0.02% chance of exploitation in 30 days (higher than 6% of all CVEs)
Exploitation status: Exploit available
Exploitation: MEDIUM
Sophistication: Trivial
Exploitation confidence: medium
Public PoC indexed (trickest/cve)
Composite signal derived from CISA KEV, CISA SSVC, EPSS, trickest/cve, and Nuclei templates.

Attack Surface

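AV (Attack Vector): Local
AC (Attack Complexity): Low
PR (Privileges Required): Low
UI (User Interaction): None
S (Scope): Unchanged
C (Confidentiality): High
I (Integrity): High
A (Availability): High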

Recommended Action

  1. Upgrade TensorFlow to 2.5.0, or to the patched release for your branch (2.4.2 / 2.3.3 / 2.2.3 / 2.1.4) if you must stay on an older line.

  2. Implement pre-evaluation graph validation: scan TFLite flatbuffer subgraph references for cycles before calling Interpreter::Invoke().

  3. Run TFLite inference in a sandboxed process with resource limits (ulimit stack size, process timeout) to contain blast radius.

  4. Enforce model provenance: only load models from signed, trusted sources; reject unsigned or externally sourced .tflite files.

  5. Detection: monitor inference worker processes for runaway CPU or stack exhaustion signals (SIGSEGV/SIGABRT from stack overflow) as anomaly indicators.
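Step 2 can be prototyped as a cycle check over a subgraph reference graph. This sketch assumes the references have already been extracted from the .tflite flatbuffer, for example from the `cond_subgraph_index`/`body_subgraph_index` fields of each While op's options and the `then_subgraph_index`/`else_subgraph_index` fields of each If op; that extraction step and the dict layout used here are hypothetical. The depth-first search is iterative, so the validator cannot hit the same recursion problem it is meant to catch.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical input layout: subgraph index -> subgraph indices referenced by its
# control-flow ops (e.g. a While op's cond_subgraph_index and body_subgraph_index).
SubgraphRefs = Dict[int, List[int]]

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def find_subgraph_cycle(refs: SubgraphRefs) -> Optional[List[int]]:
    """Return one cycle of subgraph indices (first == last), or None if acyclic.

    Raises ValueError for a reference to a subgraph that does not exist.
    Iterative DFS, so deep but valid graphs cannot overflow the validator's stack.
    """
    color = {node: WHITE for node in refs}
    for root in refs:
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        path: List[Tuple[int, Iterator[int]]] = [(root, iter(refs[root]))]
        while path:
            node, children = path[-1]
            child = next(children, None)
            if child is None:  # every reference of `node` has been explored
                color[node] = BLACK
                path.pop()
                continue
            if child not in refs:
                raise ValueError(f"subgraph {node} references missing subgraph {child}")
            if color[child] == GRAY:  # back edge: child is still on the current path
                nodes = [n for n, _ in path]
                return nodes[nodes.index(child):] + [child]
            if color[child] == WHITE:
                color[child] = GRAY
                path.append((child, iter(refs[child])))
    return None
```

For a crafted model whose only While op sits in subgraph 0 and points back at it, `find_subgraph_cycle({0: [0, 0]})` returns `[0, 0]`; any non-`None` result (or a `ValueError`) should stop the model before it reaches the interpreter.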

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Art. 15 - Accuracy, robustness and cybersecurity; Art. 9 - Risk management system
ISO 42001
8.4 - AI System Risk Management; A.6.1.4 - AI system testing; A.9.3 - AI risk treatment
NIST AI RMF
MANAGE 2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems; MAP 5.1 - Likelihood and impact of vulnerabilities and potential impacts
OWASP LLM Top 10
LLM04:2023 - Model Denial of Service; LLM03:2025 - Supply Chain

Frequently Asked Questions

What is CVE-2021-29591?

CVE-2021-29591 is a TensorFlow Lite vulnerability: TFLite did not check that model graphs are free of loops, so a crafted .tflite file (for example, one whose While op uses the same subgraph as both its body and loop subgraph) causes an infinite loop or a stack overflow during evaluation. It is fixed in TF 2.5.0 / 2.4.2 / 2.3.3 / 2.2.3 / 2.1.4.

Is CVE-2021-29591 actively exploited?

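CVE-2021-29591 is not listed in CISA KEV, and no in-the-wild exploitation is confirmed. However, proof-of-concept exploit code is publicly available (indexed by trickest/cve), which lowers the bar for exploitation.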

How to fix CVE-2021-29591?

Upgrade to TensorFlow 2.5.0 or the patched release for your branch (2.4.2 / 2.3.3 / 2.2.3 / 2.1.4). As defense in depth, reject models with cyclic subgraph references before evaluation, run inference in a sandboxed process with stack-size and timeout limits, load only signed models from trusted sources, and monitor inference workers for runaway CPU or stack-overflow crashes (SIGSEGV/SIGABRT).

What systems are affected by CVE-2021-29591?

This vulnerability affects the following AI/ML architecture patterns: model serving, training pipelines, edge/mobile inference.

What is the CVSS score for CVE-2021-29591?

CVE-2021-29591 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.02%.

Technical Details

NVD Description

TensorFlow is an end-to-end open source platform for machine learning. TFLite graphs must not have loops between nodes. However, this condition was not checked, and an attacker could craft models that would result in an infinite loop during evaluation. In certain cases, the infinite loop would be replaced by a stack overflow due to too many recursive calls. For example, the `While` implementation (https://github.com/tensorflow/tensorflow/blob/106d8f4fb89335a2c52d7c895b7a7485465ca8d9/tensorflow/lite/kernels/while.cc) could be tricked into a scenario where both the body and the loop subgraphs are the same. Evaluating one of the subgraphs means calling the `Eval` function for the other, and this quickly exhausts all stack space. The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range. Please consult our security guide (https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md) for more information regarding the security model and how to contact us with issues and questions.
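The recursion described in the NVD text can be reproduced in miniature. The toy evaluator below is not TFLite's kernel code: it assumes a model is a dict of subgraphs, each a list of ops, and that a `WHILE` op simply evaluates its condition and body subgraphs once. With a crafted While whose condition and body both point back at the subgraph containing it, the naive evaluator recurses until Python raises `RecursionError`, the interpreted analogue of the native stack overflow. The guarded variant tracks which subgraphs are currently being evaluated and rejects re-entry, which is one way to enforce the "no loops between nodes" rule.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy model, NOT TFLite's data structures: each subgraph is a list of ops and a
# ("WHILE", cond_index, body_index) op evaluates two other subgraphs.
Op = Tuple[str, int, int]
Model = Dict[int, List[Op]]


def naive_eval(model: Model, index: int) -> None:
    """Evaluate a subgraph with no re-entry check (the vulnerable pattern)."""
    for kind, cond_index, body_index in model[index]:
        if kind == "WHILE":
            # Toy semantics: run the condition and the body once each.
            naive_eval(model, cond_index)
            naive_eval(model, body_index)


def guarded_eval(model: Model, index: int, active: Optional[Set[int]] = None) -> None:
    """Same evaluator, but refuse to re-enter a subgraph already being evaluated."""
    if active is None:
        active = set()
    if index in active:
        raise ValueError(f"subgraph {index} re-entered: model graph is cyclic")
    active.add(index)
    try:
        for kind, cond_index, body_index in model[index]:
            if kind == "WHILE":
                guarded_eval(model, cond_index, active)
                guarded_eval(model, body_index, active)
    finally:
        active.discard(index)


# Crafted model: subgraph 0 holds a While whose cond and body both point at subgraph 0.
crafted: Model = {0: [("WHILE", 0, 0)]}
# Benign model: the While in subgraph 0 uses separate, op-free subgraphs 1 and 2.
benign: Model = {0: [("WHILE", 1, 2)], 1: [], 2: []}
```

In native code there is no `RecursionError` to catch: the process simply runs out of stack and is killed, which is why the check has to happen before evaluation.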

Exploitation Scenario

An adversary crafts a .tflite model where the While op's body_subgraph_index and cond_subgraph_index both point to the same subgraph, triggering mutual recursion in the TFLite evaluator. The attacker uploads this model to a shared model registry (e.g., an internal MLflow or Hugging Face private hub). A CI/CD pipeline pulls the model for acceptance testing, loads it via the TFLite Interpreter, and the evaluation thread either spins indefinitely (DoS) or exhausts the call stack and crashes the inference worker — taking down the serving tier or blocking the deployment pipeline. In a mobile context, an attacker distributing a malicious app update embedding the crafted .tflite could crash the on-device inference runtime.
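For the CI acceptance-test case, steps 3 and 5 of the recommended actions can be combined: load the candidate model in a child process with capped stack size, CPU time, and core-dump size, enforce a wall-clock timeout, and classify how the child ended. This is a POSIX-only sketch using Python's standard `subprocess` and `resource` modules; the limit values are examples, and the command that actually loads the model (for instance a hypothetical `load_model.py`) is left to you.

```python
import resource
import signal
import subprocess
import sys
from typing import List


def _cap(limit: int, value: int) -> None:
    """Set soft and hard `limit` to `value`, never above the current hard limit."""
    _, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit, (value, value))


def run_sandboxed(cmd: List[str], timeout_s: float = 30.0,
                  stack_bytes: int = 8 * 1024 * 1024, cpu_s: int = 60) -> str:
    """Run `cmd` under stack/CPU/core-dump limits plus a wall-clock timeout.

    Returns "ok", "exit:<code>", "timeout", or "signal:<NAME>" such as "signal:SIGSEGV".
    POSIX only: `resource` and `preexec_fn` are unavailable on Windows.
    """
    def limit_child() -> None:  # runs in the child between fork and exec
        _cap(resource.RLIMIT_STACK, stack_bytes)
        _cap(resource.RLIMIT_CPU, cpu_s)
        _cap(resource.RLIMIT_CORE, 0)  # no core dumps from crashing workers

    try:
        proc = subprocess.run(cmd, preexec_fn=limit_child, timeout=timeout_s,
                              stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:  # run() has already killed the child
        return "timeout"
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:  # negative return code: terminated by that signal
        return "signal:" + signal.Signals(-proc.returncode).name
    return f"exit:{proc.returncode}"


if __name__ == "__main__":
    # Hypothetical usage: replace the inline program with your model-loading script.
    print(run_sandboxed([sys.executable, "-c", "print('model loaded')"]))
```

A `timeout` result corresponds to the infinite-loop case and `signal:SIGSEGV` to the stack-overflow case; either should quarantine the model and raise an alert rather than trigger a retry. `preexec_fn` is not safe in multithreaded parents, so a dedicated launcher process is the safer place for this wrapper.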

CVSS Vector

CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
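The 7.8 base score follows directly from this vector. The sketch below implements the CVSS v3.1 base-score equations and the Roundup function from the FIRST specification; it handles base metrics only (no temporal or environmental scoring), and the function names are illustrative.

```python
import math

# CVSS v3.1 base-metric weights from the FIRST specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Score a 'CVSS:3.1/AV:../AC:../...' vector string (base metrics only)."""
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    changed = parts["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[parts["PR"]]
    exploitability = 8.22 * AV[parts["AV"]] * AC[parts["AC"]] * pr * UI[parts["UI"]]
    iss = 1 - (1 - CIA[parts["C"]]) * (1 - CIA[parts["I"]]) * (1 - CIA[parts["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))
```

For this vector, exploitability is 8.22 × 0.55 × 0.77 × 0.62 × 0.85 ≈ 1.83 and impact is 6.42 × (1 − 0.44³) ≈ 5.87, so the sum of about 7.71 rounds up to 7.8.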

Timeline

Published: May 14, 2021
Last Modified: November 21, 2024
First Seen: May 14, 2021

Related Vulnerabilities