(security) Fix SSRF in batch runner download_bytes_from_url #38482

Merged: DarkLight1337 merged 1 commit into vllm-project:main from jperezdealgaba:fix/batch-ssrf-download-url on Mar 30, 2026

@jperezdealgaba

Contributor

Fix an SSRF (Server-Side Request Forgery) vulnerability in the batch runner's download_bytes_from_url function (vllm/entrypoints/openai/run_batch.py).

The file_url field in batch transcription/translation requests (BatchTranscriptionRequest, BatchTranslationRequest) was passed directly to aiohttp.ClientSession().get() without any hostname or domain validation. This allowed anyone who could control batch input JSON to make the vLLM batch runner issue arbitrary HTTP/HTTPS requests from the server (e.g. targeting cloud metadata endpoints like 169.254.169.254, or internal HTTP APIs).

The online serving path (MediaConnector) already validates URLs against --allowed-media-domains, but download_bytes_from_url did not reuse that protection. This patch closes the gap by:

  1. Adding an allowed_media_domains parameter to download_bytes_from_url that validates the URL's hostname against the allowlist before making any HTTP request. Uses urllib3.util.parse_url (consistent with MediaConnector) and normalizes the URL to prevent parsing-discrepancy bypasses (e.g. backslash-@ attacks).
  2. Threading allowed_media_domains from the CLI args (--allowed-media-domains) through make_transcription_wrapper and build_endpoint_registry into download_bytes_from_url.
  3. Respecting VLLM_MEDIA_URL_ALLOW_REDIRECTS for HTTP redirect control (previously redirects were always followed).
  4. Updating docs/usage/security.md to document that the batch runner is also covered by --allowed-media-domains.

data: URLs remain exempt from domain restrictions (they don't make network requests). When no allowlist is configured, behavior is unchanged (backward compatible).
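
The validation described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function name `check_url_allowed` is hypothetical, and it uses the standard library's `urllib.parse.urlsplit` in place of `urllib3.util.parse_url` to stay dependency-free, so it does not reproduce the URL normalization step or the redirect handling. It treats both `None` and an empty list as "no allowlist configured", which is what the unit tests listed in this description expect.

```python
from typing import Optional, Sequence
from urllib.parse import urlsplit


def check_url_allowed(
    url: str, allowed_media_domains: Optional[Sequence[str]]
) -> None:
    """Raise ValueError if `url` is not permitted (hypothetical helper)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()

    # data: URLs carry their payload inline, so no network request is made
    # and the domain allowlist does not apply.
    if scheme == "data":
        return

    # Only plain HTTP(S) downloads are supported in this sketch.
    if scheme not in ("http", "https"):
        raise ValueError(f"Unsupported URL scheme: {scheme!r}")

    # No allowlist configured (None or empty): keep the permissive behavior.
    if not allowed_media_domains:
        return

    # Compare the parsed hostname, not a substring of the raw URL.
    hostname = (parts.hostname or "").lower()
    allowed = {domain.lower() for domain in allowed_media_domains}
    if hostname not in allowed:
        raise ValueError(f"Domain {hostname!r} is not in the allowlist")
```

A caller would run this check before opening any connection, for example `check_url_allowed(file_url, ["example.com"])`, and only then hand the URL to the HTTP client.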

Test Plan

python -m pytest tests/entrypoints/openai/test_run_batch.py -v -k "test_download_bytes" --timeout=30

9 unit tests added covering:

  • data: URLs bypass domain restrictions
  • Disallowed domains are rejected
  • Cloud metadata IP (169.254.169.254) is blocked
  • Private-range IPs (10.x, 192.168.x, 127.x) are blocked
  • Allowlisted domains are fetched successfully
  • No allowlist (None) permits all domains (backward compat)
  • Empty allowlist ([]) permits all domains
  • Unsupported URL schemes are rejected
  • Backslash-@ URL confusion cannot bypass the allowlist

Test Result

tests/entrypoints/openai/test_run_batch.py::test_download_bytes_data_url_bypasses_domain_check PASSED
tests/entrypoints/openai/test_run_batch.py::test_download_bytes_rejects_disallowed_domain PASSED
tests/entrypoints/openai/test_run_batch.py::test_download_bytes_rejects_cloud_metadata_ip PASSED
tests/entrypoints/openai/test_run_batch.py::test_download_bytes_rejects_internal_ip PASSED
tests/entrypoints/openai/test_run_batch.py::test_download_bytes_allows_permitted_domain PASSED
tests/entrypoints/openai/test_run_batch.py::test_download_bytes_no_allowlist_permits_any_domain PASSED
tests/entrypoints/openai/test_run_batch.py::test_download_bytes_empty_allowlist_permits_any_domain PASSED
tests/entrypoints/openai/test_run_batch.py::test_download_bytes_unsupported_scheme PASSED
tests/entrypoints/openai/test_run_batch.py::test_download_bytes_backslash_bypass PASSED
================= 9 passed, 11 deselected, 2 warnings in 1.05s =================

All pre-commit hooks pass (ruff check, ruff format, typos, markdownlint, mypy, SPDX headers, and all project-specific checks).

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify

mergify Bot commented Mar 29, 2026

Contributor

Documentation preview: https://vllm--38482.org.readthedocs.build/en/38482/

@mergify mergify Bot added the documentation and frontend labels Mar 29, 2026
@jperezdealgaba jperezdealgaba force-pushed the fix/batch-ssrf-download-url branch from 7878ce9 to d4694bf Compare March 29, 2026 16:25

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request extends SSRF protection to the batch runner by validating media URLs against an allowed domain list and normalizing them to prevent parsing-based bypasses. The changes include documentation, comprehensive security tests, and updates to the transcription/translation wrappers. Feedback points out that an empty allowlist currently permits all domains due to a truthiness check; it is recommended to explicitly check for None so that an empty list correctly denies all requests.
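
The distinction the review raises can be illustrated with a small sketch. Both function names are hypothetical and neither is taken from the patch; the example only contrasts a truthiness check with an explicit `None` check and does not indicate which behavior was merged.

```python
from typing import Optional, Sequence


def permits_with_truthiness(host: str, allowlist: Optional[Sequence[str]]) -> bool:
    # `not allowlist` is true for both None and [], so an empty
    # allowlist behaves like "no restriction".
    if not allowlist:
        return True
    return host in allowlist


def permits_with_none_check(host: str, allowlist: Optional[Sequence[str]]) -> bool:
    # Only None means "no restriction"; an empty list denies every host.
    if allowlist is None:
        return True
    return host in allowlist
```

With an empty list, `permits_with_truthiness("internal.example", [])` returns `True`, while `permits_with_none_check("internal.example", [])` returns `False`; the two agree for `None` and for non-empty lists.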

Comment thread vllm/entrypoints/openai/run_batch.py Outdated
The `file_url` field in batch transcription/translation requests was
passed directly to aiohttp without any hostname validation, allowing
SSRF attacks against internal services (e.g. cloud metadata endpoints).
Add domain validation to `download_bytes_from_url` using the existing
`--allowed-media-domains` allowlist, consistent with MediaConnector.
Normalize URLs through urllib3 to prevent parsing-discrepancy bypasses
and respect `VLLM_MEDIA_URL_ALLOW_REDIRECTS` for redirect control.
Signed-off-by: Juan Perez de Algaba <jperezdealgaba@redhat.com>

Signed-off-by: jperezde <jperezde@redhat.com>
@jperezdealgaba jperezdealgaba force-pushed the fix/batch-ssrf-download-url branch from d4694bf to aa3d773 Compare March 29, 2026 16:30
@DarkLight1337 DarkLight1337 added the ready label Mar 30, 2026
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 30, 2026 04:15
@DarkLight1337 DarkLight1337 merged commit 57861ae into vllm-project:main Mar 30, 2026
52 of 53 checks passed
neweyes pushed a commit to neweyes/vllm that referenced this pull request Mar 31, 2026
…ject#38482)

Signed-off-by: jperezde <jperezde@redhat.com>
Signed-off-by: neweyes <328719365@qq.com>
puririshi98 pushed a commit to puririshi98/vllm that referenced this pull request Apr 7, 2026
…ject#38482)

Signed-off-by: jperezde <jperezde@redhat.com>
Signed-off-by: Rishi Puri <riship@nvidia.com>
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
…ject#38482)

Signed-off-by: jperezde <jperezde@redhat.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

Labels

documentation, frontend, ready

2 participants