AI Component

Inference

Inference servers are the most actively exploited component of the AI stack because they sit between the model and the public internet and they control the GPU. The bugs are mostly familiar web-application classes, made costlier by the price of compute: missing authentication on /v1 endpoints, SSRF that reaches from the sandbox into the platform's control plane, unsafe deserialization on model-loading paths, and path traversal in artifact-management endpoints. vLLM, Triton, TGI, BentoML, Ray Serve, and Ollama have each shipped multiple high-severity CVEs since 2023; CVE-2024-11041 in vLLM is a notable example, in which unsafe pickle deserialization of received messages leads to remote code execution. Multi-tenant deployments are particularly exposed because a single bug can cross tenant boundaries. Defenses: aggressive patching, mandatory authentication, network segmentation between the inference and control planes, and per-tenant resource quotas to bound abuse.
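
Two of the defenses listed above, mandatory authentication and per-tenant resource quotas, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the mechanism of any of the servers named above: the names `token_is_valid`, `TenantQuota`, and `authorize` are hypothetical, tenants and secrets are held in plain dictionaries, and the function returns HTTP-style status codes rather than plugging into a real framework.

```python
import hmac
from dataclasses import dataclass, field
from typing import Dict


def token_is_valid(presented: str, expected: str) -> bool:
    """Compare tokens in constant time so response timing does not leak the secret."""
    return hmac.compare_digest(presented.encode(), expected.encode())


@dataclass
class TenantQuota:
    """Token bucket (hypothetical helper): bursts up to `capacity` requests,
    refilled at `refill_per_sec` requests per second."""
    capacity: float
    refill_per_sec: float
    tokens: float = field(init=False)
    updated: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Start with a full bucket.
        self.tokens = self.capacity

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def authorize(
    tenant_id: str,
    presented_token: str,
    secrets_by_tenant: Dict[str, str],
    quotas: Dict[str, TenantQuota],
    now: float,
) -> int:
    """Return 200 if the request may proceed, 401 if unauthenticated, 429 if over quota."""
    expected = secrets_by_tenant.get(tenant_id)
    # Deny by default: unknown tenants and wrong tokens get the same answer.
    if expected is None or not token_is_valid(presented_token, expected):
        return 401
    quota = quotas.get(tenant_id)
    # A tenant with no configured quota is refused rather than left unbounded.
    if quota is None or not quota.allow(now):
        return 429
    return 200
```

In a real deployment `now` would come from a monotonic clock such as `time.monotonic()`, and the check would run before any request reaches the model. Failed authentication does not consume quota in this sketch, so an attacker without a valid token cannot exhaust a tenant's allowance.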

Total CVEs: 577
Pages: 29
Page 4 of 29 (current)
Severity CVE CVSS
MEDIUM CVE-2021-29544 5.5
MEDIUM CVE-2021-29545 5.5
HIGH CVE-2021-29546 7.8
MEDIUM CVE-2021-29547 5.5
MEDIUM CVE-2021-29548 5.5
MEDIUM CVE-2021-29549 5.5
MEDIUM CVE-2021-29550 5.5
MEDIUM CVE-2021-29551 5.5
MEDIUM CVE-2021-29552 5.5
HIGH CVE-2021-29553 7.1
MEDIUM CVE-2021-29555 5.5
MEDIUM CVE-2021-29556 5.5
MEDIUM CVE-2021-29557 5.5
HIGH CVE-2021-29558 7.8
HIGH CVE-2021-29560 7.1
MEDIUM CVE-2021-29561 5.5
MEDIUM CVE-2021-29563 5.5
MEDIUM CVE-2021-29565 5.5
MEDIUM CVE-2021-29567 5.5
HIGH CVE-2021-29568 7.8
