yaml: split module, raise conformance and performance by davlgd · Pull Request #27021 · vlang/v

davlgd · 2026-04-28T23:15:18Z

This PR restructures vlib/yaml, raises conformance, and lands five measurable performance wins. No public API change.

Motivation

The current module passes 53/274 tests from the public yaml-test-suite. I wanted to raise conformance to better align with YAML libraries in other languages, and to look for ways to improve performance. With this PR it passes 131/274 tests.

vlib/yaml/yaml.v had grown to ~1.9 KLOC in a single file mixing the public surface, parser internals, flow grammar, tree traversal and emitters, so I split it.

What changed

Module layout

vlib/yaml/yaml.v (1953 lines) is split by concern into 5 files (no API change):

| File | Lines | Responsibility |
|---|---:|---|
| `yaml.v` | 401 | Public types (`Doc`, `Any`, `Null`) and entry points (`parse_*`, `decode`/`encode`, `Doc`/`Any` methods) |
| `parser.v` | 1042 | Block-style `Parser`, scalars/quoted/comments, anchors/aliases, block-scalar chomping |
| `flow.v` | 180 | `FlowParser` for `[…]` / `{…}` |
| `path.v` | 139 | Tree traversal (`value_`), dotted-key parsing, JSON↔YAML bridge |
| `emit.v` | 202 | YAML & JSON serializers |

Conformance

- Errors for tab characters in indentation now report the offending line number.
- `parse_quoted_string` rejects unknown escape sequences (a backslash followed by an unrecognized character) instead of silently dropping the backslash.
- `peek_next_indent` and `skip_ignorable` consult the same `directives_done` flag, so `%`-prefixed lines are no longer treated as directives once the document body has started.
- `collect_flow_continuation` errors out on unterminated flow collections instead of silently returning truncated input.
- `value_opt` now distinguishes an absent key from a key whose value is an explicit null.
- Decorator parsing (`&anchor`, `*alias`, `!tag`) is extracted into a typed `Decorators` struct to avoid positional-binding bugs.
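A minimal V sketch of two of the behaviours above, under stated assumptions: `Null`, `Value`, `lookup`, `describe` and `unescape_double_quoted` are hypothetical names, not the PR's code, and the unescaper handles only a small subset of YAML 1.2 escapes. `lookup` returns an option so that a missing key (`none`) cannot be confused with a key whose value is null (`Null{}`), and the unescaper returns an error on an unrecognized escape instead of dropping the backslash. It also includes the "return the slice as-is when there is no backslash" fast path.

```v
import strings

// Simplified stand-ins for yaml.Null / yaml.Any (assumption: the real Any
// has more variants and different names).
struct Null {}

type Value = Null | []Value | bool | f64 | map[string]Value | string

// Hypothetical lookup: `none` means the key is absent, while `key: null`
// yields a present Value holding Null{}.
fn lookup(m map[string]Value, key string) ?Value {
	v := m[key] or { return none } // the `or` block runs only for a missing key
	return v
}

// Collapses the three cases into labels so callers can tell them apart.
fn describe(m map[string]Value, key string) string {
	v := lookup(m, key) or { return 'absent' }
	return if v is Null { 'null' } else { 'present' }
}

// Hypothetical unescaper for the body of a double-quoted scalar. It handles
// only a small subset of YAML 1.2 escapes (no \x, \u, \U, \0, line folding)
// and returns an error for anything else instead of dropping the backslash.
fn unescape_double_quoted(body string) !string {
	// Fast path: no backslash means nothing to rebuild, so return the slice as-is.
	if !body.contains_u8(`\\`) {
		return body
	}
	mut sb := strings.new_builder(body.len)
	mut i := 0
	for i < body.len {
		c := body[i]
		if c != `\\` {
			sb.write_u8(c)
			i++
			continue
		}
		if i + 1 >= body.len {
			return error('dangling backslash at end of quoted string')
		}
		esc := body[i + 1]
		match esc {
			`n` { sb.write_u8(`\n`) }
			`t` { sb.write_u8(`\t`) }
			`"` { sb.write_u8(`"`) }
			`\\` { sb.write_u8(`\\`) }
			`/` { sb.write_u8(`/`) }
			else { return error('unknown escape \\${esc.ascii_str()} at byte ${i}') }
		}
		i += 2
	}
	return sb.str()
}
```

For example, with `doc := {'a': Value(Null{})}`, `describe(doc, 'a')` gives `'null'` while `describe(doc, 'missing')` gives `'absent'`; `unescape_double_quoted('bad\\q')` fails rather than returning `badq`.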

My yaml-test-suite runner passes 131/131 tests on the targeted subset. It is intentionally not included in this PR because it requires a checkout of https://github.com/yaml/yaml-test-suite, which contains many files. Instead, I added tests for the most important newly supported cases.

Not covered (deliberately; marked by tests and comments): multi-document streams, explicit `?` complex keys, custom tags, merge keys (`<<:`), full chomping/indentation indicators (`|2-`), and a full anchors/aliases implementation.

Performance (representative ~1 KB doc, `-prod -gc boehm`)

| | `parse_text` | `to_json` | `to_yaml` |
|---|---:|---:|---:|
| Baseline (master) | 16.2 MB/s | 130 MB/s | 11.9 MB/s |
| After this PR | 22 MB/s | ~350 MB/s | ~30 MB/s |
| Gain | +36 % | +170 % | +150 % |

Five separate commits, each independently measured:

- `to_yaml`: stop calling `json.encode` per key and write directly into the active `strings.Builder` (+123 % on `to_yaml` alone).
- `write_json_escaped_string`: bulk-write safe runs via `write_string` instead of a per-byte `match` (+170 % on `to_json` for this fixture).
- `parse_scalar`: skip the `to_lower()` allocation by using a length-bounded, ASCII case-insensitive compare; replace `if contains_u8('_') { replace('_','') }` with a single-pass `strip_underscores` (+30 % on `parse_text`).
- `parse_quoted_string`: return the body slice as-is when a single-quoted string contains no `''` and a double-quoted string contains no `\`, avoiding the byte-by-byte rebuild in the common case.
- JSON-superset fast path: parse JSON-shaped input directly into `yaml.Any` via `parse_flow_value`, skipping the `json2.Any` → `from_json2` rebuild (+44 % on JSON-shaped input).
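The bulk-write escaping and allocation-free keyword comparison described above can be sketched in V as follows. `write_json_escaped`, `hex2` and `eq_ascii_ci` are hypothetical names, not the PR's `write_json_escaped_string` or `parse_scalar` internals. The escaper only special-cases `"`, `\`, `\n`, `\r` and `\t` and falls back to `\u00XX` for other control bytes; bytes at or above 0x80 pass through unchanged, which is valid JSON for UTF-8 input.

```v
import strings

const hex_digits = '0123456789abcdef'

// Formats a byte as two lowercase hex digits, e.g. 0x01 -> '01'.
fn hex2(c u8) string {
	return hex_digits[int(c >> 4)].ascii_str() + hex_digits[int(c & 0x0f)].ascii_str()
}

// Hypothetical escaper illustrating "bulk-write safe runs": bytes that need no
// escaping are not written one at a time; each maximal safe run is copied with
// a single write_string call when an unsafe byte (or the end) is reached.
fn write_json_escaped(mut sb strings.Builder, s string) {
	sb.write_u8(`"`)
	mut start := 0 // first byte of the pending safe run
	for i in 0 .. s.len {
		c := s[i]
		if c != `"` && c != `\\` && c >= 0x20 {
			continue // safe byte: just extend the run
		}
		if i > start {
			sb.write_string(s[start..i]) // flush the safe run in one call
		}
		match c {
			`"` { sb.write_string('\\"') }
			`\\` { sb.write_string('\\\\') }
			`\n` { sb.write_string('\\n') }
			`\r` { sb.write_string('\\r') }
			`\t` { sb.write_string('\\t') }
			else { sb.write_string('\\u00' + hex2(c)) }
		}
		start = i + 1
	}
	if start < s.len {
		sb.write_string(s[start..]) // trailing safe run
	}
	sb.write_u8(`"`)
}

// Allocation-free, length-bounded ASCII case-insensitive comparison.
// `lower` must already be lowercase ASCII, e.g. 'true' or 'null'.
fn eq_ascii_ci(s string, lower string) bool {
	// Cheap length check first: most scalars are rejected without a byte loop.
	if s.len != lower.len {
		return false
	}
	for i in 0 .. s.len {
		mut c := s[i]
		if c >= `A` && c <= `Z` {
			c += 32 // fold ASCII upper case to lower case
		}
		if c != lower[i] {
			return false
		}
	}
	return true
}
```

Both helpers follow the same idea: do the cheap check first and only touch the builder (or compare bytes) when needed, so the common case avoids per-byte writes and never builds a lowered copy of the scalar.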

Further optimizations (a lazy builder in `gather_plain_continuation`, a 256-byte ASCII classification table for comment scanning) were prototyped and abandoned after A/B benchmarks showed regressions; they are not part of this PR.

Tests

- `yaml_edge_cases_test.v` (27 fns, 329 lines): parser edge cases with typed assertions
- `yaml_conformance_test.v` (19 cases, 145 lines): YAML 1.2 patterns from the spec
- `yaml_json_roundtrip_test.v` (30 cases + 500-iteration idempotency check, 82 lines): guards the JSON-superset fast path and the `json2.Any` rebuild path that previously crashed under `-prod -gc boehm`
- `test_helpers.v`: shared `json_logically_eq` helper (private; deliberately not a `_test.v` file so the helper lives in one place, since V compiles each `_test.v` file as its own binary)

All four test files pass in both debug and `-prod -gc boehm` builds.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 604cd64fb9


chatgpt-codex-connector · 2026-04-28T23:22:02Z

```diff
 	if trimmed == '' {
 		return Doc{
-			root: Any(map[string]Any{})
+			root: null
```


Restore empty-document compatibility for typed decode

Returning root: null for empty/whitespace input changes yaml.decode[T] behavior for non-optional struct targets: Doc.decode now feeds "null" into json.decode, which errors, whereas the previous {} root let empty config files decode to default-initialized structs. This is a user-visible regression for callers that treat an empty YAML file as “no fields set”; please preserve the old decode-compatible shape or special-case Null before json.decode.


davlgd (Apr 29, 2026): I've fixed this and added a non-regression test.

davlgd added 9 commits April 29, 2026 00:51

- yaml: add conformance, edge-case and JSON round-trip tests (5b86c9e)
- yaml: refactor parser internals and strengthen conformance (eb81c1b)
- yaml: split yaml.v into per-concern files (587911d)
- yaml: avoid json.encode round-trip when emitting YAML strings (e8e6060)
- yaml: bulk-write safe runs in JSON string escaping (dfd3f3f)
- yaml: skip to_lower allocation and inline underscore strip in parse_scalar (2007688)
- yaml: avoid string copy in quoted-string fast paths (397935a)
- yaml: parse JSON-superset input directly into yaml.Any (2dc3bb3)
- yaml: clean up vet notices, stale doc references, missing error prefix (604cd64)

chatgpt-codex-connector Bot reviewed Apr 28, 2026


- yaml: keep decode[T] of empty document compatible with master (9657166)

davlgd wants to merge 10 commits into vlang:master from davlgd:davlgd-yaml-optim
