
yaml: split module, raise conformance and performance #27021

Open
davlgd wants to merge 10 commits into vlang:master from davlgd:davlgd-yaml-optim

@davlgd (Contributor) commented Apr 28, 2026

This PR restructures vlib/yaml, raises conformance, and lands five measurable performance wins. No public API change.

Motivation

The current module passes 53/274 tests of the public yaml-test-suite. I wanted to raise conformance, to better align with YAML libraries in other languages, and to look for ways to improve performance. With this PR it passes 131/274 tests.

vlib/yaml/yaml.v had grown to ~1.9 KLOC in a single file mixing the public surface, parser internals, flow grammar, tree traversal and emitters, so I've split it.

What changed

Module layout

vlib/yaml/yaml.v (1953 lines) is split by concern into 5 files (no API change):

| File | Lines | Responsibility |
|---|---|---|
| yaml.v | 401 | Public types (Doc, Any, Null) and entry points (parse_*, decode/encode, Doc/Any methods) |
| parser.v | 1042 | Block-style Parser, scalars/quoted/comments, anchors/aliases, block-scalar chomping |
| flow.v | 180 | FlowParser for […] / {…} |
| path.v | 139 | Tree traversal (value_), dotted-key parsing, JSON↔YAML bridge |
| emit.v | 202 | YAML & JSON serializers |

Conformance

  • Tabs in indentation now report the offending line number.
  • parse_quoted_string rejects unknown \x escapes instead of silently dropping the backslash.
  • peek_next_indent and skip_ignorable consult the same directives_done flag, so %-prefixed lines stop being treated as directives once the body starts.
  • collect_flow_continuation errors out on unterminated flow collections instead of returning silently truncated input.
  • value_opt now distinguishes "absent key" from "key whose value is the explicit null literal".
  • Decorator parsing (&anchor, *alias, !tag) extracted into a typed Decorators struct to avoid positional-binding bugs.
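The value_opt change separates two cases that a plain lookup conflates. Below is a minimal sketch of how an option return type can model that distinction. It is hypothetical: Any and Null here are cut-down stand-ins for the module's public types, and lookup and describe are illustrative helpers, not the actual value_opt signature. A missing key yields none, while a key holding the null literal yields Any(Null{}).

```v
// Hypothetical, simplified stand-ins for yaml.Null and yaml.Any.
struct Null {}

type Any = Null | int | string

// lookup returns none when the key is absent, and the stored value
// (possibly Null) when the key is present.
fn lookup(m map[string]Any, key string) ?Any {
	v := m[key] or { return none }
	return v
}

// describe shows how a caller can tell the two cases apart.
fn describe(m map[string]Any, key string) string {
	v := lookup(m, key) or { return 'absent' }
	return match v {
		Null { 'explicit null' }
		else { 'has a value' }
	}
}

mut doc := map[string]Any{}
doc['name'] = Any('app')
doc['port'] = Any(Null{}) // e.g. `port: null` in the source YAML
println(describe(doc, 'port')) // explicit null
println(describe(doc, 'timeout')) // absent
```

Returning an option keeps "no such key" out of the value domain, so Null can stay an ordinary variant of Any.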

My yaml-test-suite runner passes 131/131 on the targeted subset. It is intentionally not part of this PR because it requires a checkout of https://github.com/yaml/yaml-test-suite, which contains many files. Instead, I've added tests for the most important newly supported cases.

Deliberately not covered (signaled by tests and comments): multi-document streams, explicit ? complex keys, custom tags, merge keys (<<:), full chomp/indent indicators (|2-), and a full anchors/aliases implementation.

Performance (representative ~1 KB doc, -prod -gc boehm)

| | parse_text | to_json | to_yaml |
|---|---|---|---|
| Baseline (master) | 16.2 MB/s | 130 MB/s | 11.9 MB/s |
| After this PR | 22 MB/s | ~350 MB/s | ~30 MB/s |
| Gain | +36 % | +170 % | +150 % |

Five separate commits, each independently measured:

  1. to_yaml: stop calling json.encode per key — write directly into the active strings.Builder (+123 % to_yaml alone).
  2. write_json_escaped_string: bulk-write safe runs via write_string instead of per-byte match (+170 % to_json on this fixture).
  3. parse_scalar: skip to_lower() allocation behind a length-bounded ASCII case-insensitive compare; replace if contains_u8('_') { replace('_','') } with a single-pass strip_underscores (+30 % parse_text).
  4. parse_quoted_string: return the body slice as-is when single-quoted strings have no '' and double-quoted strings have no \ — avoids the byte-by-byte rebuild in the common case.
  5. JSON-superset fast path: parse JSON-shaped input directly into yaml.Any via parse_flow_value, skipping the json2.Any → from_json2 rebuild (+44 % on JSON-shaped input).
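Optimizations 2 and 3 are byte-level loops that avoid per-character allocation and per-byte writes. Below is a minimal sketch of those techniques, not the PR's code: strip_underscores is the name used above, but its body here is illustrative, and eq_ascii_ci and escape_json_string are hypothetical names. The escaper only special-cases ", \, \n and \t and emits every other control byte as \u00XX, which is valid JSON but may differ from the module's exact output.

```v
import strings

// strip_underscores removes every `_` in a single pass ('1_000' -> '1000').
// Input without underscores is returned unchanged, without allocating.
fn strip_underscores(s string) string {
	if !s.contains('_') {
		return s
	}
	mut sb := strings.new_builder(s.len)
	for c in s {
		if c != `_` {
			sb.write_u8(c)
		}
	}
	return sb.str()
}

// eq_ascii_ci reports whether s equals lit, ignoring ASCII case, without
// allocating a lowered copy. lit must already be lowercase ('true', 'null').
fn eq_ascii_ci(s string, lit string) bool {
	if s.len != lit.len {
		return false // cheap length check rejects most scalars immediately
	}
	for i in 0 .. s.len {
		mut c := s[i]
		if c >= `A` && c <= `Z` {
			c += 32 // fold ASCII uppercase to lowercase
		}
		if c != lit[i] {
			return false
		}
	}
	return true
}

// escape_json_string writes s as a quoted JSON string into sb. Runs of bytes
// that need no escaping are copied with a single write_string call each.
fn escape_json_string(mut sb strings.Builder, s string) {
	sb.write_string('"')
	mut start := 0 // beginning of the current run of safe bytes
	for i in 0 .. s.len {
		c := s[i]
		if c != `"` && c != `\\` && c >= 0x20 {
			continue // safe byte: keep extending the run
		}
		if start < i {
			sb.write_string(s[start..i]) // flush the safe run in bulk
		}
		if c == `"` {
			sb.write_string('\\"')
		} else if c == `\\` {
			sb.write_string('\\\\')
		} else if c == `\n` {
			sb.write_string('\\n')
		} else if c == `\t` {
			sb.write_string('\\t')
		} else {
			// any other control byte uses the \u00XX form
			hex := '0123456789abcdef'
			sb.write_string('\\u00')
			sb.write_u8(hex[int(c >> 4)])
			sb.write_u8(hex[int(c & 0x0f)])
		}
		start = i + 1
	}
	if start < s.len {
		sb.write_string(s[start..])
	}
	sb.write_string('"')
}

println(strip_underscores('1_000_000')) // 1000000
println(eq_ascii_ci('NULL', 'null')) // true
mut out := strings.new_builder(32)
escape_json_string(mut out, 'a\tb')
println(out.str())
```

For typical config strings, which rarely contain characters that need escaping, the escaper makes one write_string call per run instead of one write per byte.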

Further optimizations (a lazy builder in gather_plain_continuation, a 256-byte ASCII classification table for comment scanning) were prototyped but abandoned after A/B benchmarks showed regressions, so they are not part of this PR.

Tests

  • yaml_edge_cases_test.v (27 fns, 329 lines): parser edge cases with typed assertions
  • yaml_conformance_test.v (19 cases, 145 lines): YAML-1.2 patterns from the spec
  • yaml_json_roundtrip_test.v (30 cases + 500-iter idempotency, 82 lines): guards the JSON-superset fast path and the json2.Any rebuild path that previously crashed under -prod -gc boehm
  • test_helpers.v: shared private json_logically_eq helper. It is a plain .v file rather than a _test.v file because V compiles each _test.v file as its own binary, so a helper defined in one test file would not be visible to the others.

All four test files are green in both debug and -prod -gc boehm builds.


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 604cd64fb9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread on vlib/yaml/yaml.v:

    if trimmed == '' {
        return Doc{
    -       root: Any(map[string]Any{})
    +       root: null

P2: Restore empty-document compatibility for typed decode

Returning root: null for empty/whitespace input changes yaml.decode[T] behavior for non-optional struct targets: Doc.decode now feeds "null" into json.decode, which errors, whereas the previous {} root let empty config files decode to default-initialized structs. This is a user-visible regression for callers that treat an empty YAML file as “no fields set”; please preserve the old decode-compatible shape or special-case Null before json.decode.


@davlgd (Contributor, Author) replied:


I've fixed this and added a non-regression test.
