AI Component

Inference

Inference servers are the most actively exploited component of the AI stack: they sit between the model and the public internet, and they control the GPU. Most of their bugs are familiar web-application classes, with the impact magnified by the cost of compute: missing authentication on /v1 endpoints, SSRF that escapes the sandbox onto the platform's control plane, unsafe deserialization on model-loading paths, and path traversal in artifact-management endpoints. vLLM, Triton, TGI, BentoML, Ray Serve, and Ollama have each shipped multiple high-severity CVEs since 2023. CVE-2024-11041 in vLLM is a representative example: its MessageQueue.dequeue() API passed data received over a socket directly to pickle.loads, allowing remote code execution. Multi-tenant deployments are particularly exposed because a single bug typically crosses tenant boundaries. Defenses: aggressive patching, mandatory authentication, network segmentation between inference and control plane, and per-tenant resource quotas to bound abuse.
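Path traversal in artifact-management endpoints usually comes down to joining a client-supplied name onto a storage root without checking where the result actually lands. Below is a minimal Python sketch of the usual containment check, assuming artifacts live under a single local root directory and Python 3.9 or later. `resolve_artifact_path` and `ArtifactPathError` are hypothetical names for illustration, not part of any inference server's API.

```python
from pathlib import Path


class ArtifactPathError(ValueError):
    """Raised when a requested artifact path escapes the artifact root."""


def resolve_artifact_path(root: str, requested: str) -> Path:
    """Map a client-supplied artifact name to a path strictly inside `root`.

    Hypothetical helper for illustration; not taken from vLLM, Triton,
    or any other server. Requires Python 3.9+ for Path.is_relative_to.
    """
    # Resolve the root once so symlinked roots (e.g. /tmp on macOS) compare correctly.
    root_path = Path(root).resolve()
    # Joining an absolute `requested` discards root_path entirely, and
    # resolve() collapses '..' components and follows symlinks, so the
    # containment check below sees the real on-disk target.
    candidate = (root_path / requested).resolve()
    # Reject anything outside the root, and the root directory itself.
    if candidate == root_path or not candidate.is_relative_to(root_path):
        raise ArtifactPathError(f"path escapes artifact root: {requested!r}")
    return candidate
```

Resolving before comparing is the important step: a plain string-prefix check on the joined path would accept `llama/../../etc/passwd`, and a prefix check against `/srv/models` would also accept a sibling such as `/srv/models-evil`.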

577 total CVEs across 29 pages (page 2 of 29)
Severity   CVE              CVSS
MEDIUM     CVE-2020-15200   5.9
MEDIUM     CVE-2020-15201   4.8
CRITICAL   CVE-2020-15202   9.0
HIGH       CVE-2020-15203   7.5
MEDIUM     CVE-2020-15204   5.3
CRITICAL   CVE-2020-15205   9.8
HIGH       CVE-2020-15206   7.5
CRITICAL   CVE-2020-15207   9.0
CRITICAL   CVE-2020-15208   9.8
MEDIUM     CVE-2020-15209   5.9
MEDIUM     CVE-2020-15210   6.5
MEDIUM     CVE-2020-15211   4.8
HIGH       CVE-2020-15212   8.6
MEDIUM     CVE-2020-15213   4.0
HIGH       CVE-2020-15214   8.1
HIGH       CVE-2020-15265   7.5
HIGH       CVE-2020-15266   7.5
MEDIUM     CVE-2020-26266   5.3
HIGH       CVE-2020-26267   7.8
LOW        CVE-2020-26270   3.3
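The Severity column is consistent with the CVSS v3.x qualitative rating scale (None 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0), and every row on this page agrees with it. The page does not state which CVSS version it uses, so the sketch below assumes the scores are v3.x base scores; `cvss_v3_severity` is a hypothetical helper name.

```python
def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating.

    Hypothetical helper; thresholds follow the CVSS v3.x rating scale,
    assuming scores are given to one decimal place as in the table.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "NONE"
    if score < 4.0:   # 0.1 to 3.9
        return "LOW"
    if score < 7.0:   # 4.0 to 6.9
        return "MEDIUM"
    if score < 9.0:   # 7.0 to 8.9
        return "HIGH"
    return "CRITICAL"  # 9.0 to 10.0
```

For example, `cvss_v3_severity(4.0)` returns `"MEDIUM"`, which is why CVE-2020-15213 is listed as MEDIUM rather than LOW.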
