AI Component

Inference

Inference servers are among the most actively exploited components of the AI stack because they sit between the model and the public internet and they control the GPUs. The bugs are mostly familiar web-application classes, made worse by the cost of the compute behind them: missing authentication on /v1 endpoints, SSRF that escapes the sandbox and reaches the platform's control plane, unsafe deserialization on model-loading paths, and path traversal in artifact-management endpoints. vLLM, Triton, TGI, BentoML, Ray Serve, and Ollama have each shipped multiple high-severity CVEs since 2023; CVE-2024-11041 in vLLM is a notable example, in which unsafe pickle deserialization led to remote code execution. Multi-tenant deployments are particularly exposed because a single bug typically crosses tenant boundaries. Defenses: prompt patching, mandatory authentication, network segmentation between the inference tier and the control plane, and per-tenant resource quotas to bound abuse.
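Two of the bug classes named above, missing authentication and path traversal in artifact endpoints, can be illustrated with a minimal Python sketch. It is not taken from any of the servers listed: `ARTIFACT_ROOT`, `API_TOKEN`, `is_authorized`, and `resolve_artifact` are hypothetical names chosen for illustration, and a real deployment would load the token from a secret store and enforce these checks in its own request pipeline.

```python
import hmac
from pathlib import Path
from typing import Optional

# Hypothetical values for illustration only; not from any real inference server.
ARTIFACT_ROOT = Path("/srv/models")
API_TOKEN = "replace-me"


def is_authorized(auth_header: Optional[str], expected_token: str = API_TOKEN) -> bool:
    """Accept only an 'Authorization: Bearer <token>' value that matches."""
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    presented = auth_header[len("Bearer "):]
    # compare_digest runs in constant time, so it does not leak
    # how many leading characters of the token were correct.
    return hmac.compare_digest(presented.encode(), expected_token.encode())


def resolve_artifact(user_path: str, root: Path = ARTIFACT_ROOT) -> Path:
    """Resolve a client-supplied artifact path; reject anything outside root."""
    root = root.resolve()
    # resolve() collapses '..' segments and follows symlinks before the check.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise ValueError(f"path escapes artifact root: {user_path!r}")
    return candidate
```

Under these assumptions, a request for `../../etc/passwd` or an absolute path such as `/etc/passwd` raises `ValueError`, while `llama/weights.bin` resolves to a location under the artifact root. Resolving the path before comparing it is the important design choice: a check on the raw string would miss `..` segments and symlinks.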

Total CVEs: 577
Pages: 29
Page 1 of 29
Severity  CVE             CVSS
UNKNOWN   CVE-2026-25083  -
CRITICAL  CVE-2026-25960  9.8
CRITICAL  CVE-2026-30824  9.8
UNKNOWN   CVE-2018-7576   -
HIGH      CVE-2018-8825   8.8
UNKNOWN   CVE-2018-7577   -
UNKNOWN   CVE-2019-9635   -
CRITICAL  CVE-2019-16778  9.8
HIGH      CVE-2020-5215   7.5
MEDIUM    CVE-2018-21233  6.5
MEDIUM    CVE-2020-15190  5.3
MEDIUM    CVE-2020-15191  5.3
MEDIUM    CVE-2020-15192  4.3
HIGH      CVE-2020-15193  7.1
MEDIUM    CVE-2020-15194  5.3
HIGH      CVE-2020-15195  8.8
CRITICAL  CVE-2020-15196  9.9
MEDIUM    CVE-2020-15197  6.3
MEDIUM    CVE-2020-15198  5.4
MEDIUM    CVE-2020-15199  5.9