Unsafe URI and Path Handling in HTML Backend

High
dolfim-ibm published GHSA-q29v-xc37-wh5m Jun 2, 2026

Package

docling (pip)

Affected versions

< 2.94.0

Patched versions

2.94.0

Description

Impact

The HTML backend did not sufficiently validate URIs and paths during resource handling:

  • file:// URIs were accepted, enabling local file system access when enable_local_fetch=True
  • Path resolution allowed traversal outside the intended directories via ../ sequences and absolute paths
  • Internal network resources were not blocked when enable_remote_fetch=True
  • HTTP redirects were not validated, so a request could be redirected to an unintended scheme
  • Remote image downloads and data: URIs had no resource limits
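
To make the path traversal item concrete, the following minimal sketch shows how an unvalidated join escapes a base directory. It is not docling's code: the base directory /srv/docs and the helper naive_join are hypothetical, and PurePosixPath is used so the behavior is the same on any platform.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical base directory, chosen only for illustration.
BASE = PurePosixPath("/srv/docs")


def naive_join(base: PurePosixPath, ref: str) -> str:
    """Join a reference onto the base with no validation.

    normpath collapses '..' segments lexically, which is enough to show
    where the resulting path ends up.
    """
    return posixpath.normpath(str(base / ref))


print(naive_join(BASE, "img/logo.png"))      # stays under the base
print(naive_join(BASE, "../../etc/passwd"))  # climbs out via ../
print(naive_join(BASE, "/etc/passwd"))       # absolute path replaces the base
```

Both the ../ form and the absolute form resolve to /etc/passwd, which is why a join alone does not confine access to the base directory.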

Patches

Fixed in versions 2.91.0 (initial fixes) and 2.94.0 (additional improvements). The fixes implement:

  • Updated local path handling: absolute paths are always blocked; relative paths require enable_local_fetch=True (default: False) and must stay within the configured base_path, which protects against path traversal
  • The file:// scheme is stripped and the remainder is treated as a local path (see above)
  • IP address validation to prevent SSRF
  • HTTP redirect validation, plus connection and read timeouts
  • Size limits for both remote images (with streaming download) and base64-decoded data: URIs
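
Three of the checks listed above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the patched docling implementation: the function names, the MAX_BYTES value, and the choice of ip.is_global as the SSRF test are all hypothetical. Redirect validation and streaming downloads are left out because they depend on the HTTP client in use.

```python
import base64
import ipaddress
from pathlib import Path

MAX_BYTES = 1_000_000  # hypothetical size limit, not docling's actual value


def resolve_contained(base_path: Path, ref: str) -> Path:
    """Resolve ref under base_path, rejecting absolute paths and escapes."""
    if ref.startswith("file://"):
        ref = ref[len("file://"):]  # strip the scheme, treat the rest as a path
    if Path(ref).is_absolute():
        raise ValueError("absolute paths are blocked")
    base = base_path.resolve()
    candidate = (base / ref).resolve()
    if not candidate.is_relative_to(base):  # requires Python 3.9+
        raise ValueError("path escapes base_path")
    return candidate


def is_public_address(ip_text: str) -> bool:
    """Return True only for globally routable IP literals.

    A real check would also resolve the hostname and test every address
    returned; this sketch handles IP literals only.
    """
    return ipaddress.ip_address(ip_text).is_global


def decode_data_uri(uri: str, max_bytes: int = MAX_BYTES) -> bytes:
    """Decode a base64 data: URI, enforcing a size limit."""
    header, sep, payload = uri.partition(",")
    if not header.startswith("data:") or not sep:
        raise ValueError("not a data: URI")
    if not header.endswith(";base64"):
        raise ValueError("this sketch handles base64 payloads only")
    # Cheap upper bound before decoding: 4 characters yield at most 3 bytes.
    if len(payload) * 3 // 4 > max_bytes + 2:
        raise ValueError("payload too large")
    data = base64.b64decode(payload, validate=True)
    if len(data) > max_bytes:
        raise ValueError("payload too large")
    return data
```

For example, resolve_contained(Path("docs"), "../secret") raises ValueError, is_public_address("169.254.169.254") returns False, and decode_data_uri("data:text/plain;base64,aGVsbG8=") returns b"hello".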

Workarounds

Keep both enable_local_fetch=False and enable_remote_fetch=False (defaults) when processing untrusted HTML documents.

Severity

High

CVSS v3 base metrics

Attack vector
Network
Attack complexity
Low
Privileges required
None
User interaction
Required
Scope
Unchanged
Confidentiality
High
Integrity
None
Availability
Low

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:N/A:L

CVE ID

CVE-2026-47214

Weaknesses

External Control of File Name or Path

The product allows user input to control or influence paths or file names that are used in filesystem operations.

Uncontrolled Resource Consumption

The product does not properly control the allocation and maintenance of a limited resource.
