AI Component

Inference

Inference servers are the most actively exploited component of the AI stack because they sit between the model and the public internet and they hold the GPU. The bugs are mostly familiar web-application classes, magnified by the cost of compute: missing authentication on /v1 endpoints, SSRF that escapes the sandbox onto the platform's control plane, unsafe deserialization on model-loading paths, and path traversal in artifact-management endpoints. vLLM, Triton, TGI, BentoML, Ray Serve, and Ollama have each shipped multiple high-severity CVEs since 2023; CVE-2024-11041 in vLLM, a remote code execution flaw caused by unsafe pickle deserialization, is a notable example. Multi-tenant deployments are particularly exposed because a single bug typically crosses tenant boundaries. Defenses: aggressive patching, mandatory authentication, network segmentation between inference and control plane, and per-tenant resource quotas to bound abuse.
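Two of the bug classes above, missing authentication and path traversal in artifact endpoints, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the names `ARTIFACT_ROOT`, `is_authorized`, and `resolve_artifact` are hypothetical and are not taken from any of the inference servers listed, and a real deployment would use its framework's own authentication middleware.

```python
import hmac
from pathlib import Path

# Hypothetical directory that an artifact-management endpoint serves from.
ARTIFACT_ROOT = Path("/srv/models")


def is_authorized(presented_token: str, expected_token: str) -> bool:
    """Compare a bearer token in constant time; empty tokens are rejected."""
    if not presented_token or not expected_token:
        return False
    return hmac.compare_digest(presented_token.encode(), expected_token.encode())


def resolve_artifact(requested: str, root: Path = ARTIFACT_ROOT) -> Path:
    """Resolve a client-supplied path and refuse anything outside root.

    resolve() collapses ".." segments and follows symlinks, so both
    "../../etc/passwd" and an absolute path such as "/etc/passwd" end up
    outside root and are rejected.
    """
    root = root.resolve()
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"path escapes artifact root: {requested!r}")
    return candidate
```

A handler would call `is_authorized` before doing any work, then pass the requested file name through `resolve_artifact` and open only the path it returns. Checking the resolved path rather than filtering the raw string for ".." is the design choice that matters, because string filters tend to miss encoded or symlinked escapes.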

Total CVEs: 577
Pages: 29
Page 3 of 29
Severity  CVE             CVSS
MEDIUM    CVE-2021-29554  5.5
HIGH      CVE-2021-29513  7.8
HIGH      CVE-2021-29514  7.8
HIGH      CVE-2021-29515  7.8
MEDIUM    CVE-2021-29517  5.5
HIGH      CVE-2021-29518  7.8
HIGH      CVE-2021-29525  7.8
MEDIUM    CVE-2021-29526  5.5
MEDIUM    CVE-2021-29527  5.5
MEDIUM    CVE-2021-29528  5.5
HIGH      CVE-2021-29529  7.8
HIGH      CVE-2021-29532  7.1
MEDIUM    CVE-2021-29533  5.5
MEDIUM    CVE-2021-29534  5.5
HIGH      CVE-2021-29535  7.8
HIGH      CVE-2021-29536  7.8
HIGH      CVE-2021-29537  7.8
MEDIUM    CVE-2021-29539  5.5
MEDIUM    CVE-2021-29542  5.5
MEDIUM    CVE-2021-29543  5.5
