Docling: Unsafe Archive Extraction and XML Parsing in METS-GBS Backend

Moderate severity • GitHub Reviewed • Published Jun 2, 2026 in docling-project/docling • Updated Jun 3, 2026

Package

docling (pip)

Affected versions

>= 2.45.0, < 2.91.0

Patched versions

2.91.0

Description

Impact

The METS-GBS backend's XML parsing and the input document format detection lacked security controls, enabling:

  • XML External Entity (XXE) attacks to read local files or cause denial of service
  • Decompression bombs (zip bombs) to exhaust memory and disk space
  • Unbounded archive extraction consuming system resources

An attacker could craft a malicious METS-GBS archive that, when processed, discloses sensitive local files, exhausts system resources, or crashes the application.

Patches

Fixed in version 2.91.0. The fix implements:

  • Secure XML parsing with resolve_entities=False, load_dtd=False, and no_network=True
  • Configurable limits: 300 MB total extraction size, 10 MB per file, and a maximum of 1000 members
  • Cumulative size tracking across all extractions
  • Early termination when limits are exceeded
  • Secure format detection of METS-GBS tar archives in the _detect_mets_gbs() method, which enforces a maximum file size (10 MB per file) and a maximum member count (1000 members), and uses exception handling to fail gracefully when a limit is exceeded
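
The parser options named in the first bullet match keyword arguments of lxml's etree.XMLParser. The following is a minimal sketch of a hardened parser under that assumption. That the backend uses lxml is inferred from the option names rather than stated in this advisory, and the helper names make_hardened_parser and parse_untrusted_xml are hypothetical, not taken from the docling code.

```python
from lxml import etree


def make_hardened_parser():
    """Build an lxml parser with the three options listed in the advisory."""
    return etree.XMLParser(
        resolve_entities=False,  # leave entity references unexpanded
        load_dtd=False,          # do not load an external DTD
        no_network=True,         # never fetch resources over the network
    )


def parse_untrusted_xml(data: bytes):
    """Parse XML bytes from an untrusted source and return the root element."""
    return etree.fromstring(data, parser=make_hardened_parser())


if __name__ == "__main__":
    # Classic XXE payload: the entity points at a local file.
    payload = (
        b'<?xml version="1.0"?>'
        b'<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>'
        b"<r>&xxe;</r>"
    )
    root = parse_untrusted_xml(payload)
    # The reference is kept as an unexpanded entity node, not file contents.
    print(etree.tostring(root))
```

Building a fresh parser per call keeps the sketch simple. A long-running service might prefer to create one parser per thread and reuse it.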

Workarounds

Avoid processing METS-GBS archives from untrusted sources. If such archives must be processed, pre-validate them in an isolated environment with resource limits.
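
One way to pre-validate an archive is to scan its tar headers before extracting anything and to stop as soon as a limit is exceeded. The sketch below uses only the Python standard library and reuses the limits quoted under Patches as defaults. All names (check_tar_limits, ArchiveLimitError, build_demo_tar) are hypothetical, 1 MB is assumed to mean 1024 * 1024 bytes, and the code illustrates the general technique rather than the checks docling itself performs.

```python
import io
import tarfile

# Defaults mirror the limits quoted under Patches; all names are hypothetical.
MAX_MEMBERS = 1000
MAX_FILE_BYTES = 10 * 1024 * 1024    # assumes 1 MB = 1024 * 1024 bytes
MAX_TOTAL_BYTES = 300 * 1024 * 1024


class ArchiveLimitError(ValueError):
    """Raised as soon as an archive exceeds one of the limits."""


def check_tar_limits(fileobj, max_members=MAX_MEMBERS,
                     max_file_bytes=MAX_FILE_BYTES,
                     max_total_bytes=MAX_TOTAL_BYTES):
    """Scan tar headers without extracting; return (member_count, total_bytes)."""
    count = 0
    total = 0
    # "r|*" reads a forward-only stream and auto-detects compression.
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        for member in archive:
            count += 1
            if count > max_members:
                raise ArchiveLimitError(f"more than {max_members} members")
            if member.size > max_file_bytes:
                raise ArchiveLimitError(
                    f"{member.name!r} is {member.size} bytes, "
                    f"limit is {max_file_bytes}"
                )
            total += member.size  # cumulative size across all members
            if total > max_total_bytes:
                raise ArchiveLimitError(
                    f"total size exceeds {max_total_bytes} bytes"
                )
    return count, total


def build_demo_tar(files):
    """Create a small gzip-compressed tar in memory for demonstration."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = build_demo_tar({"a.xml": b"<a/>", "b.txt": b"hello"})
    print(check_tar_limits(demo))
```

The scan writes nothing to disk, but a compressed archive is still decompressed as a stream while its headers are read. CPU time is therefore not bounded by this check alone, which is why the isolated environment with resource limits remains advisable.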

@dolfim-ibm published to docling-project/docling Jun 2, 2026
Published to the GitHub Advisory Database Jun 3, 2026
Reviewed Jun 3, 2026
Last updated Jun 3, 2026

Severity

Moderate

CVSS v3 base metrics

Attack vector: Local
Attack complexity: Low
Privileges required: None
User interaction: Required
Scope: Unchanged
Confidentiality: None
Integrity: None
Availability: High

CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H

EPSS score

The Exploit Prediction Scoring System (EPSS) score estimates the probability of this vulnerability being exploited within the next 30 days. Data provided by FIRST. This vulnerability is in the 3rd percentile.

Weaknesses

Improper Handling of Highly Compressed Data (Data Amplification)

The product does not handle or incorrectly handles a compressed input with a very high compression ratio that produces a large output.

Improper Restriction of XML External Entity Reference

The product processes an XML document that can contain XML entities with URIs that resolve to documents outside of the intended sphere of control, causing the product to embed incorrect documents into its output.

Improper Restriction of Recursive Entity References in DTDs ('XML Entity Expansion')

The product uses XML documents and allows their structure to be defined with a Document Type Definition (DTD), but it does not properly control the number of recursive definitions of entities.

CVE ID

CVE-2026-44018

GHSA ID

GHSA-r3xg-rg9j-67fv
