yaml: split module, raise conformance and performance#27021
yaml: split module, raise conformance and performance#27021davlgd wants to merge 10 commits intovlang:masterfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 604cd64fb9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if trimmed == '' { | ||
| return Doc{ | ||
| root: Any(map[string]Any{}) | ||
| root: null |
There was a problem hiding this comment.
Restore empty-document compatibility for typed decode
Returning root: null for empty/whitespace input changes yaml.decode[T] behavior for non-optional struct targets: Doc.decode now feeds "null" into json.decode, which errors, whereas the previous {} root let empty config files decode to default-initialized structs. This is a user-visible regression for callers that treat an empty YAML file as “no fields set”; please preserve the old decode-compatible shape or special-case Null before json.decode.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
I've fixed this and added a non-regression test
This PR restructures
vlib/yaml, raises conformance, and lands five measurable performance wins. No public API change.Motivation
Current module pass 53/274 tests against the public yaml-test-suite. I wanted to raise conformance to better align with yaml lib from other languages and look at ways to enhance performance. It now passes 131/274 tests.
vlib/yaml.vhad grown to ~1.9 KLOC in a single file mixing public surface, parser internals, flow grammar, tree traversal and emitters, so I've splitted it.What changed
Module layout
vlib/yaml/yaml.v(1953 lines) is split by concern into 5 files (no API change):yaml.vDoc,Any,Null) and entry points (parse_*,decode/encode,Doc/Anymethods)parser.vParser, scalars/quoted/comments, anchors/aliases, block-scalar chompingflow.vFlowParserfor[…]/{…}path.vvalue_), dotted-key parsing, JSON↔YAML bridgeemit.vConformance
parse_quoted_stringrejects unknown\xescapes instead of silently dropping the backslash.peek_next_indentandskip_ignorableconsult the samedirectives_doneflag, so%-prefixed lines stop being treated as directives once the body starts.collect_flow_continuationerrors out on unterminated flow collections instead of returning silently truncated input.value_optnow distinguishes "absent key" from "key whose value is the explicitnullliteral".&anchor,*alias,!tag) extracted into a typedDecoratorsstruct to avoid positional-binding bugs.My
yaml-test-suiterunner test pass 131/131 on the targeted subset. It's intentionally not added to this PR — it requires a checkout of https://github.com/yaml/yaml-test-suite with many files. I've just added important tests of new supported cases.What is not covered (deliberate, signaled by tests + comments): multi-document streams, explicit
?complex keys, custom tags, merge keys (<<:), full chomp/indent indicators (|2-), and a full anchors/aliases implementation.Performance (representative ~1 KB doc,
-prod -gc boehm)Five separate commits, each independently measured:
to_yaml: stop callingjson.encodeper key — write directly into the activestrings.Builder(+123 %to_yamlalone).write_json_escaped_string: bulk-write safe runs viawrite_stringinstead of per-bytematch(+170 %to_jsonon this fixture).parse_scalar: skipto_lower()allocation behind a length-bounded ASCII case-insensitive compare; replaceif contains_u8('_') { replace('_','') }with a single-passstrip_underscores(+30 %parse_text).parse_quoted_string: return the body slice as-is when single-quoted strings have no''and double-quoted strings have no\— avoids the byte-by-byte rebuild in the common case.yaml.Anyviaparse_flow_value, skipping thejson2.Any→from_json2rebuild (+44 % on JSON-shaped input).Further optimizations (lazy builder in
gather_plain_continuation, 256-byte ASCII classification table for comment scanning) were prototyped and abandoned after A/B regressed — kept out of the PR.Tests
yaml_edge_cases_test.v(27 fns, 329 lines): parser edge cases with typed assertionsyaml_conformance_test.v(19 cases, 145 lines): YAML-1.2 patterns from the specyaml_json_roundtrip_test.v(30 cases + 500-iter idempotency, 82 lines): guards the JSON-superset fast path and thejson2.Anyrebuild path that previously crashed under-prod -gc boehmtest_helpers.v: sharedjson_logically_eq(private, not_test.vso the helper can live in one place — V compiles each_test.vas its own binary)All four test files green in both debug and
-prod -gc boehm.