
Support decimal word parsing separately from integer ToNumber #1739

@clairernovotny


Problem

ToNumber / TryToNumber currently parse localized number words into long, so decimal word phrases cannot be represented by the existing API.

Examples that should eventually be supported by a decimal-returning API:

  • English: "one point two" -> 1.2
  • Hindi: "एक दशमलव दो" -> 1.2
  • Urdu: "ایک اعشاریہ دو" -> 1.2

Current behavior

The words-to-number stack is integer-shaped:

  • WordsToNumberExtension.ToNumber(...) returns long
  • TryToNumber(...) writes out long
  • IWordsToNumberConverter.Convert(...) returns long
  • shared parsers reduce token phrases into integer sums/scales

Until a decimal-returning API exists, ToNumber should therefore reject decimal phrases outright instead of attempting to parse them.

PR #1738 removed Hindi/Urdu decimal marker words from ignoredTokens so unsupported decimal phrases fail instead of being silently converted to the wrong integer.

Proposed direction

Add explicit decimal word parsing as a separate feature/API, for example:

  • ToDecimalNumber(...) / TryToDecimalNumber(..., out decimal value)
  • or a parallel IWordsToDecimalNumberConverter contract
  • locale data for decimal marker tokens, distinct from filler/conjunction ignoredTokens
  • tests for digit-by-digit fractional parsing, leading/trailing fractional zeroes, negatives, malformed phrases, and locale-specific decimal marker words

This should not change the return type or semantics of existing ToNumber, because that would be a breaking API change.
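
As a rough illustration of the proposed direction, here is a minimal sketch of what a separate decimal entry point might look like. Everything in it is hypothetical: the class name DecimalWordsSketch, the English-only vocabulary, the hard-coded "point" and "minus" markers, and the single-digit integer part are assumptions made for the example, not existing library code. A real implementation would presumably read the decimal marker from locale data and delegate the integer part to the existing long-returning parser.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch only: none of these names exist in the library today.
public static class DecimalWordsSketch
{
    // Illustrative English-only vocabulary; real locale data would supply this.
    static readonly Dictionary<string, int> Digits = new(StringComparer.OrdinalIgnoreCase)
    {
        ["zero"] = 0, ["one"] = 1, ["two"] = 2, ["three"] = 3, ["four"] = 4,
        ["five"] = 5, ["six"] = 6, ["seven"] = 7, ["eight"] = 8, ["nine"] = 9
    };

    // Assumed markers, kept separate from any filler/conjunction ignore list.
    const string DecimalMarker = "point";
    const string NegativeMarker = "minus";

    public static bool TryToDecimalNumber(string words, out decimal value)
    {
        value = 0m;
        if (string.IsNullOrWhiteSpace(words))
            return false;

        var tokens = words.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        var index = 0;

        var negative = false;
        if (tokens[index].Equals(NegativeMarker, StringComparison.OrdinalIgnoreCase))
        {
            negative = true;
            index++;
        }

        // Integer part: a single digit word in this sketch. A real implementation
        // would presumably hand this span to the existing integer parser.
        if (index >= tokens.Length || !Digits.TryGetValue(tokens[index], out var whole))
            return false;
        index++;

        decimal result = whole;
        if (index < tokens.Length)
        {
            // Require the decimal marker, followed by at least one digit word.
            if (!tokens[index].Equals(DecimalMarker, StringComparison.OrdinalIgnoreCase)
                || index == tokens.Length - 1)
                return false;
            index++;

            // Fractional part is read digit by digit: tenths, hundredths, ...
            var scale = 0.1m;
            for (; index < tokens.Length; index++)
            {
                if (!Digits.TryGetValue(tokens[index], out var digit))
                    return false;
                result += digit * scale;
                scale /= 10m;
            }
        }

        value = negative ? -result : result;
        return true;
    }
}
```

Under these assumptions, "one point two" yields 1.2, "one point zero five" yields 1.05, and malformed input such as "one point" or "one two" returns false instead of falling back to an integer. Because the method is separate and decimal-typed, the existing ToNumber signature stays untouched.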
