CVE-2021-29591: TFLite: crafted model causes infinite loop / stack overflow

HIGH PoC AVAILABLE

Published May 14, 2021

CISO Take

Any pipeline that loads untrusted TFLite models is exposed to denial of service (an infinite loop) or a crash from stack exhaustion via a maliciously crafted .tflite file with self-referencing While subgraphs. Patch to TF 2.5.0 / 2.4.2 / 2.3.3 / 2.2.3 / 2.1.4 immediately. Until patched, gate model ingestion with a validator that detects cyclic subgraph references before evaluation.

What is the risk?

CVSS 7.8 (High) with a local attack vector and low complexity: exploitation is straightforward once the attacker can get a malicious model loaded. The main exposure is model-serving infrastructure, MLOps pipelines that pull models from registries, and mobile/edge devices running TFLite inference. The CVE is not in CISA KEV; a public PoC is indexed, but the EPSS score is low (0.3%), which keeps operational urgency moderate. The C:H/I:H/A:H CVSS subscores indicate that if the stack overflow is exploitable beyond pure DoS, privilege escalation or memory corruption on the inference host is theoretically in scope.

What systems are affected?

Package	Ecosystem	Vulnerable Range	Patched
TensorFlow	pip	—	2.5.0 (backports: 2.4.2, 2.3.3, 2.2.3, 2.1.4)

Do you use TensorFlow? You're affected.

How severe is it?

CVSS 3.1

7.8 / 10

EPSS

0.3%

chance of exploitation in 30 days

Higher than 17% of all CVEs

Source: EPSS v3 — FIRST.org

Exploitation Status

Exploit Available

Exploitation: MEDIUM

Sophistication

Trivial

Exploitation Confidence

medium

○ Public PoC indexed (trickest/cve)

Composite signal derived from CISA KEV, VulnCheck KEV, CISA SSVC, EPSS, Metasploit, Exploit-DB, trickest/cve, Nuclei templates, and inthewild.io exploitation reports.

What is the attack surface?

AV Local

AC Low

PR Low

UI None

S Unchanged

C High

I High

A High

What should I do?

5 steps

1. Upgrade TensorFlow to 2.5.0, or to the patched backport release for your branch: 2.4.2, 2.3.3, 2.2.3, or 2.1.4.
2. Implement pre-evaluation graph validation: scan TFLite flatbuffer subgraph references for cycles before calling Interpreter::Invoke().
3. Run TFLite inference in a sandboxed process with resource limits (ulimit stack size, process timeout) to contain the blast radius.
4. Enforce model provenance: only load models from signed, trusted sources; reject unsigned or externally sourced .tflite files.
5. Detection: monitor inference worker processes for runaway CPU usage or stack exhaustion signals (SIGSEGV/SIGABRT from stack overflow) as anomaly indicators.
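
The pre-evaluation graph validation step above can be sketched as a cycle check over subgraph references. This is a minimal sketch under stated assumptions, not TensorFlow's actual fix: it assumes the While ops have already been extracted from the flatbuffer into hypothetical (owner, cond, body) subgraph-index triples (the extraction is not shown), and the names find_subgraph_cycle and validate_while_ops are illustrative. Because the advisory names the case where the two While subgraphs are the same, the sketch flags that case as well.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical input shape: (owning subgraph, cond subgraph, body subgraph).
WhileOp = Tuple[int, int, int]


def find_subgraph_cycle(refs: Dict[int, List[int]]) -> List[int]:
    """Return one cycle of subgraph indices, or [] if references are acyclic."""
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state: Dict[int, int] = {}
    for start in refs:
        if state.get(start, UNSEEN) != UNSEEN:
            continue
        # Iterative DFS, so the validator cannot itself be driven into
        # deep recursion by a hostile model.
        state[start] = ACTIVE
        path = [start]
        stack = [iter(refs.get(start, []))]
        while stack:
            for child in stack[-1]:
                child_state = state.get(child, UNSEEN)
                if child_state == ACTIVE:
                    # Back edge to a subgraph still being visited: a cycle.
                    return path[path.index(child):] + [child]
                if child_state == UNSEEN:
                    state[child] = ACTIVE
                    path.append(child)
                    stack.append(iter(refs.get(child, [])))
                    break
            else:
                # All references of the current subgraph were explored.
                state[path.pop()] = DONE
                stack.pop()
    return []


def validate_while_ops(ops: Iterable[WhileOp]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    problems: List[str] = []
    refs: Dict[int, List[int]] = {}
    for owner, cond, body in ops:
        if cond == body:
            problems.append(
                f"subgraph {owner}: While cond and body are both subgraph {cond}"
            )
        refs.setdefault(owner, []).extend([cond, body])
    cycle = find_subgraph_cycle(refs)
    if cycle:
        problems.append(
            "cyclic subgraph references: " + " -> ".join(map(str, cycle))
        )
    return problems
```

A gate in front of the interpreter would reject any model for which validate_while_ops returns a non-empty list. Other control-flow ops that reference subgraphs would need the same treatment; they are outside this sketch.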

How is it classified?

DoS, Supply Chain, Framework, Inference
AML.T0010.001 - AI Software
AML.T0011.000 - Unsafe AI Artifacts
AML.T0029 - Denial of AI Service
AML.T0049 - Exploit Public-Facing Application

Which compliance frameworks are affected?

This CVE is relevant to:

EU AI Act

Article 15 - Accuracy, robustness and cybersecurity
Article 9 - Risk management system

ISO 42001

8.4 - AI System Risk Management
A.6.1.4 - AI system testing
A.9.3 - AI risk treatment

NIST AI RMF

MANAGE 2.2 - Mechanisms are in place and applied to sustain the value of deployed AI systems
MAP 5.1 - Likelihood and impact of vulnerabilities and potential impacts

OWASP LLM Top 10

LLM04 - Model Denial of Service
LLM05 - Supply Chain Vulnerabilities

Frequently Asked Questions

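What is CVE-2021-29591?

CVE-2021-29591 is a TensorFlow Lite vulnerability: the runtime did not check that a model's graph is free of loops, so a crafted model can cause an infinite loop during evaluation or, through recursive Eval calls between While subgraphs, a stack overflow.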

Is CVE-2021-29591 actively exploited?

CVE-2021-29591 is not listed in CISA KEV. Proof-of-concept exploit code is publicly available, which increases the risk of exploitation.

How to fix CVE-2021-29591?

1. Upgrade TensorFlow to 2.5.0, or to the patched backport release for your branch: 2.4.2, 2.3.3, 2.2.3, or 2.1.4.
2. Implement pre-evaluation graph validation: scan TFLite flatbuffer subgraph references for cycles before calling Interpreter::Invoke().
3. Run TFLite inference in a sandboxed process with resource limits (ulimit stack size, process timeout) to contain the blast radius.
4. Enforce model provenance: only load models from signed, trusted sources; reject unsigned or externally sourced .tflite files.
5. Detection: monitor inference worker processes for runaway CPU usage or stack exhaustion signals (SIGSEGV/SIGABRT from stack overflow) as anomaly indicators.
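
Steps 3 and 5 (containment and crash detection) can be combined in a small supervisor. The following is a rough sketch for POSIX hosts only, using Python's standard resource, signal, and subprocess modules. The 8 MiB stack cap, the 30-second timeout, and the name run_contained are assumptions chosen for illustration, and the worker command is whatever launches your own inference process.

```python
import resource
import signal
import subprocess
from typing import List

STACK_LIMIT_BYTES = 8 * 1024 * 1024  # assumed cap; tune per workload


def _cap_stack() -> None:
    """Runs in the child before exec: lower the soft stack limit."""
    _, hard = resource.getrlimit(resource.RLIMIT_STACK)
    soft = STACK_LIMIT_BYTES
    if hard != resource.RLIM_INFINITY:
        soft = min(soft, hard)  # the soft limit may not exceed the hard limit
    resource.setrlimit(resource.RLIMIT_STACK, (soft, hard))


def run_contained(cmd: List[str], timeout_s: float = 30.0) -> str:
    """Run an inference worker and classify how it ended.

    Returns "ok", "timeout", "killed:<SIGNAL>", or "exit:<code>".
    """
    try:
        proc = subprocess.run(
            cmd,
            preexec_fn=_cap_stack,
            timeout=timeout_s,  # wall-clock bound against infinite loops
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode < 0:
        # A negative return code means the child died from a signal,
        # e.g. SIGSEGV after exhausting its stack.
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = str(-proc.returncode)
        return f"killed:{name}"
    return "ok" if proc.returncode == 0 else f"exit:{proc.returncode}"
```

A "timeout" or "killed:SIGSEGV" result for a newly ingested model is the anomaly signal described in step 5 and is a reasonable trigger for quarantining that model. Resource limits alone are not a full sandbox; namespace or container isolation would sit on top of this.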

What systems are affected by CVE-2021-29591?

This vulnerability affects the following AI/ML architecture patterns: model serving, training pipelines, edge/mobile inference.

What is the CVSS score for CVE-2021-29591?

CVE-2021-29591 has a CVSS v3.1 base score of 7.8 (HIGH). The EPSS exploitation probability is 0.26%.

What is the AI security impact?

Affected AI Architectures

model serving, training pipelines, edge/mobile inference

MITRE ATLAS Techniques

AML.T0010.001 AI Software

AML.T0011.000 Unsafe AI Artifacts

AML.T0029 Denial of AI Service

AML.T0049 Exploit Public-Facing Application

Compliance Controls Affected

EU AI Act: Article 15, Article 9

ISO 42001: 8.4, A.6.1.4, A.9.3

NIST AI RMF: MANAGE 2.2, MAP 5.1

OWASP LLM Top 10: LLM04, LLM05

What are the technical details?

Original Advisory

TensorFlow is an end-to-end open source platform for machine learning. TFLite graphs must not have loops between nodes. However, this condition was not checked, and an attacker could craft models that would result in an infinite loop during evaluation. In certain cases, the infinite loop would be replaced by a stack overflow due to too many recursive calls. For example, the `While` implementation (https://github.com/tensorflow/tensorflow/blob/106d8f4fb89335a2c52d7c895b7a7485465ca8d9/tensorflow/lite/kernels/while.cc) could be tricked into a scenario where both the body and the loop subgraphs are the same. Evaluating one of the subgraphs means calling the `Eval` function for the other, and this quickly exhausts all stack space. The fix will be included in TensorFlow 2.5.0. We will also cherry-pick this commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow 2.1.4, as these are also affected and still in supported range. Please consult our security guide (https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md) for more information regarding the security model and how to contact us with issues and questions.

Exploitation Scenario

An adversary crafts a .tflite model where the While op's body_subgraph_index and cond_subgraph_index both point to the same subgraph, triggering mutual recursion in the TFLite evaluator. The attacker uploads this model to a shared model registry (e.g., an internal MLflow or Hugging Face private hub). A CI/CD pipeline pulls the model for acceptance testing, loads it via the TFLite Interpreter, and the evaluation thread either spins indefinitely (DoS) or exhausts the call stack and crashes the inference worker — taking down the serving tier or blocking the deployment pipeline. In a mobile context, an attacker distributing a malicious app update embedding the crafted .tflite could crash the on-device inference runtime.
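
The registry path in this scenario is where a provenance check helps. As a simplified stand-in for real signature verification, the sketch below pins models by SHA-256 digest: a CI job or loader would refuse any .tflite bytes whose digest is not in an allowlist maintained out of band. The function name is_trusted_model and the allowlist format are assumptions for illustration.

```python
import hashlib
import hmac
from typing import AbstractSet


def is_trusted_model(model_bytes: bytes, allowed_digests: AbstractSet[str]) -> bool:
    """Return True only if the model's SHA-256 hex digest is allowlisted."""
    digest = hashlib.sha256(model_bytes).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return any(hmac.compare_digest(digest, known) for known in allowed_digests)
```

Digest pinning only proves the file is one that was previously approved; it says nothing about whether the approved file is itself safe, so it complements graph validation rather than replacing it.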

Weaknesses (CWE)

CWE-674 Uncontrolled Recursion (Primary)
CWE-835 Loop with Unreachable Exit Condition ('Infinite Loop') (Primary)

CWE-674 — Uncontrolled Recursion: The product does not properly control the amount of recursion that takes place, consuming excessive resources, such as allocated memory or the program stack.

[Implementation] Ensure that an end condition will be reached under all logic conditions. The end condition may include checking against the depth of recursion and exiting with an error if the recursion goes too deep. The complexity of the end condition contributes to the effectiveness of this action.
[Implementation] Increase the stack size.
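
The first mitigation (an end condition based on recursion depth) can be illustrated with a toy evaluator. This is not TFLite code: the `calls` mapping from a subgraph index to the subgraphs it invokes, the `evaluate` function, and the depth limit of 64 are all hypothetical, chosen only to show the guard pattern.

```python
from typing import Dict, List


class RecursionLimitExceeded(RuntimeError):
    """Raised when nested subgraph evaluation goes deeper than allowed."""


def evaluate(subgraph: int, calls: Dict[int, List[int]],
             depth: int = 0, max_depth: int = 64) -> int:
    """Toy evaluator: returns how many subgraph evaluations were performed."""
    if depth > max_depth:
        # Fail with a clean error instead of exhausting the call stack.
        raise RecursionLimitExceeded(
            f"subgraph {subgraph} reached nesting depth {depth}"
        )
    count = 1
    for callee in calls.get(subgraph, []):
        count += evaluate(callee, calls, depth + 1, max_depth)
    return count
```

With an acyclic mapping the function returns normally; with two subgraphs that invoke each other it raises RecursionLimitExceeded once the limit is passed, which turns a crash into a recoverable error. Increasing the stack size, the second mitigation, only postpones the failure for a truly cyclic graph.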

Source: MITRE CWE corpus.