CVE-2025-6242: vLLM: SSRF in media loader exposes internal network
GHSA-3f6c-7fw2-ppm4 · HIGH · PoC available

Any vLLM deployment running multimodal workloads on versions >= 0.5.0 and < 0.11.0 is vulnerable to server-side request forgery via crafted media URLs: an authenticated user can pivot to internal APIs, cloud metadata endpoints, or other services co-located with the inference server. Patch to v0.11.0 immediately; if patching is not possible, enforce strict egress filtering at the network layer and disable multimodal endpoints. This is a particularly dangerous vector in cloud-hosted inference clusters, where IAM credential theft via the metadata service (169.254.169.254) is trivial once SSRF is achieved.
Risk Assessment
CVSS 7.1 (High) with high attack complexity and low privilege requirement. The high complexity rating reflects the need to understand internal network topology, but any authenticated API user in a multi-tenant or shared inference deployment can attempt exploitation. EPSS 0.00048 (about 0.05%) suggests no widespread active exploitation at time of publication. However, vLLM is one of the most widely deployed open-source LLM inference engines, and SSRF in inference infrastructure is disproportionately dangerous: these servers typically run with elevated IAM roles, have broad internal network reach, and are exposed to untrusted user input by design. Cloud-native deployments (EKS, GKE, AKS) face the highest risk because instance metadata APIs are reachable.
Affected Systems
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| vllm | pip | >= 0.5.0, < 0.11.0 | 0.11.0 |
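A quick way to check an installed version against the vulnerable range in the table above (a stdlib-only sketch; it assumes plain `MAJOR.MINOR.PATCH` version strings without pre-release suffixes):

```python
# Flag vLLM versions in the CVE-2025-6242 vulnerable range
# (>= 0.5.0, < 0.11.0), per the affected-systems table.

def parse(version: str) -> tuple:
    """Parse 'X.Y.Z' into a tuple of ints for correct numeric comparison."""
    return tuple(int(part) for part in version.split("."))

def is_vulnerable(version: str) -> bool:
    """True if this vLLM version needs the CVE-2025-6242 patch."""
    return parse("0.5.0") <= parse(version) < parse("0.11.0")
```

Tuple comparison avoids the classic string-comparison pitfall where `"0.9.0" > "0.10.0"`.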
Recommended Action
1. PATCH: Upgrade vLLM to >= 0.11.0 immediately; this is the only full remediation.
2. WORKAROUND (if patching is delayed): Disable multimodal API endpoints at the reverse proxy/gateway layer; reject requests containing media URL parameters.
3. NETWORK CONTROLS: Implement egress filtering on inference server instances: block outbound access to RFC1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and link-local (169.254.0.0/16) from vLLM processes. Use cloud security groups or network policies to enforce this.
4. CLOUD: Enable IMDSv2 on AWS (it requires session-oriented tokens that a simple SSRF cannot fetch); apply equivalent metadata hardening on GCP and Azure.
5. DETECTION: Alert on outbound HTTP requests from inference servers to metadata endpoints or internal subnets; log all media URL parameters in API requests for anomaly review.
6. AUDIT: Review the deployed vLLM version across all environments, including dev/staging, where cloud credentials may be present.
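The gateway-layer workaround (rejecting requests whose media URLs target internal address space) can be sketched as follows. This is illustrative gateway logic, not part of vLLM itself:

```python
# Defense-in-depth sketch: reject media URLs that resolve to private,
# link-local, loopback, or reserved addresses before the request ever
# reaches the inference server. Hypothetical helper, not vLLM code.
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_media_url(url: str) -> bool:
    """Return False for URLs targeting internal/metadata address space."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Resolve every address record; all of them must be publicly
        # routable, since attackers can hide targets behind DNS.
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if (addr.is_private or addr.is_loopback or
                addr.is_link_local or addr.is_reserved):
            return False  # covers RFC1918 and 169.254.0.0/16 (IMDS)
    return True
```

Note that validate-then-fetch is still vulnerable to DNS rebinding (the address can change between check and fetch), which is why network-layer egress filtering remains the stronger control.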
CISA SSVC Assessment
Source: CISA Vulnrichment (SSVC v2.0). Decision based on the CISA Coordinator decision tree.
Frequently Asked Questions
What is CVE-2025-6242?
CVE-2025-6242 is a server-side request forgery (SSRF) vulnerability in vLLM's multimodal media loader, affecting versions >= 0.5.0 and < 0.11.0. Crafted media URLs cause the server to fetch attacker-chosen targets, letting an authenticated user pivot to internal APIs, cloud metadata endpoints, or other services co-located with the inference server. It is fixed in v0.11.0.
Is CVE-2025-6242 actively exploited?
There was no evidence of widespread active exploitation at time of publication (EPSS 0.05%), but proof-of-concept exploit code is publicly available for CVE-2025-6242, which increases the risk of exploitation.
How to fix CVE-2025-6242?
1. PATCH: Upgrade vLLM to >= 0.11.0 immediately; this is the only full remediation.
2. WORKAROUND (if patching is delayed): Disable multimodal API endpoints at the reverse proxy/gateway layer; reject requests containing media URL parameters.
3. NETWORK CONTROLS: Implement egress filtering on inference server instances: block outbound access to RFC1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and link-local (169.254.0.0/16) from vLLM processes. Use cloud security groups or network policies to enforce this.
4. CLOUD: Enable IMDSv2 on AWS (it requires session-oriented tokens that a simple SSRF cannot fetch); apply equivalent metadata hardening on GCP and Azure.
5. DETECTION: Alert on outbound HTTP requests from inference servers to metadata endpoints or internal subnets; log all media URL parameters in API requests for anomaly review.
6. AUDIT: Review the deployed vLLM version across all environments, including dev/staging, where cloud credentials may be present.
What systems are affected by CVE-2025-6242?
vLLM versions >= 0.5.0 and < 0.11.0 are affected. In practice this covers the following AI/ML architecture patterns: LLM inference serving, multimodal AI pipelines, model serving, RAG pipelines with image or document processing, and agent frameworks using vLLM as an inference backend.
What is the CVSS score for CVE-2025-6242?
CVE-2025-6242 has a CVSS v3.1 base score of 7.1 (HIGH). The EPSS exploitation probability is 0.05%.
Technical Details
NVD Description
A Server-Side Request Forgery (SSRF) vulnerability exists in the MediaConnector class within the vLLM project's multimodal feature set. The load_from_url and load_from_url_async methods fetch and process media from user-provided URLs without adequate restrictions on the target hosts. This allows an attacker to coerce the vLLM server into making arbitrary requests to internal network resources.
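The flaw class can be illustrated with a simplified fetcher (hypothetical code, not vLLM's actual `MediaConnector` implementation): the unsafe version fetches whatever host the user-supplied URL names, while a safer variant restricts fetches to an explicit host allowlist.

```python
# Simplified illustration of the vulnerability class. The allowlist
# host below is an assumption for the example, not a real endpoint.
from urllib.parse import urlparse
from urllib.request import urlopen

def load_from_url_unsafe(url: str) -> bytes:
    # VULNERABLE: no restriction on the target host, so a user-supplied
    # URL like http://169.254.169.254/... is fetched with the server's
    # network position and credentials.
    with urlopen(url) as resp:
        return resp.read()

ALLOWED_MEDIA_HOSTS = {"cdn.example.com"}  # assumption: your trusted CDNs

def load_from_url_checked(url: str) -> bytes:
    # Safer: refuse any host outside an explicit allowlist.
    host = urlparse(url).hostname
    if host not in ALLOWED_MEDIA_HOSTS:
        raise ValueError(f"media host not allowed: {host!r}")
    with urlopen(url) as resp:
        return resp.read()
```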
Exploitation Scenario
An attacker with a valid (potentially free-tier or stolen) API key to a vLLM multimodal inference endpoint submits an image analysis request whose media URL points to http://169.254.169.254/latest/meta-data/iam/security-credentials/. The vLLM server fetches the URL server-side via MediaConnector.load_from_url(), retrieving the AWS IAM role credentials; the response content may be returned to the attacker directly or inferred from model behavior. With those credentials, the attacker escalates to S3 bucket access or other AWS services, or pivots further into the internal infrastructure. In a Kubernetes environment, the attacker could target http://kubernetes.default.svc/api/v1/secrets to harvest cluster secrets. The high attack-complexity rating reflects the need to craft the request correctly and know the internal topology, but both are well documented in public SSRF playbooks.
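The malicious request described above takes roughly the shape below, using vLLM's OpenAI-compatible chat completions format with an `image_url` content part. The payload is only built and printed here, not sent; the model name is a placeholder.

```python
# Shape of an SSRF probe against a multimodal vLLM endpoint: the
# image_url points at the cloud metadata service instead of real media.
# Shown for detection rule development and testing purposes.
import json

payload = {
    "model": "multimodal-model-placeholder",  # assumption: any served model
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {
                # SSRF target: AWS instance metadata (IAM credentials)
                "url": "http://169.254.169.254/latest/meta-data/"
                       "iam/security-credentials/",
            }},
        ],
    }],
}
print(json.dumps(payload, indent=2))
```

Detection rules from step 5 of the recommended actions can match on media URL parameters containing metadata or internal addresses exactly like this one.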
Weaknesses (CWE)
CWE-918: Server-Side Request Forgery (SSRF)
CVSS Vector
CVSS:3.1/AV:N/AC:H/PR:L/UI:N/S:U/C:H/I:L/A:H

References
- github.com/advisories/GHSA-3f6c-7fw2-ppm4
- github.com/vllm-project/vllm/commit/9d9a2b77f19f68262d5e469c4e82c0f6365ad72d
- github.com/vllm-project/vllm/security/advisories/GHSA-3f6c-7fw2-ppm4
- nvd.nist.gov/vuln/detail/CVE-2025-6242
- access.redhat.com/security/cve/CVE-2025-6242
- bugzilla.redhat.com/show_bug.cgi
- github.com/fkie-cad/nvd-json-data-feeds (exploit)
Related Vulnerabilities
- CVE-2024-9053 (9.8): vllm: RCE via unsafe pickle deserialization in RPC server (same package: vllm)
- CVE-2024-11041 (9.8): vllm: RCE via unsafe pickle deserialization in MessageQueue (same package: vllm)
- CVE-2025-47277 (9.8): vLLM: RCE via exposed TCPStore in distributed inference (same package: vllm)
- CVE-2026-25960 (9.8): vllm: SSRF allows internal network access (same package: vllm)
- CVE-2025-32444 (9.8): vLLM: RCE via pickle deserialization on ZeroMQ (same package: vllm)