Replace generic Node payload fields with typed Payload union#9033
Merged
Conversation
This commit adds typed Payload unions to both parse/Node.zig and canonicalize/Node.zig, enabling direct field access for node data instead of raw u32 manipulation. Changes: - parse/Node.zig: Add Payload union with typed variants for all node types - parse/NodeStore.zig: Migrate to use Payload union, reducing extra_data usage for nodes that can pack their data directly - canonicalize/Node.zig: Add Payload union with 60+ typed variants - canonicalize/NodeStore.zig: Migrate statement nodes (s_decl, s_decl_gen, s_var, s_nominal_decl, s_type_anno) to use direct field access - node_store_test.zig: Constrain random span values to fit packed format The Payload unions maintain the same 12-byte size (3 × u32) as the raw data fields, ensuring no memory overhead. Statement nodes now store optional annotations using offset encoding (0 = null, idx+1 = present) directly in node fields rather than extra_data. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Updated replaceExprWithNum, replaceExprWithZeroArgumentTag, replaceExprWithTuple, replaceExprWithTag to use setPayload - Updated addRecordField and addMatchBranchPattern to use setPayload - Tests pass
- Replace .raw payload access with typed .statement_import - Use field names (module_name_tok, packed_idents) instead of (data_1, data_2) - Improve code readability with clear field names and documentation - All tests passing
- Add explicit requirement: ZERO runtime memory increase (not even 1 byte) - Add explicit requirement: COMPLETE elimination of extra_data (zero remaining) - Explain why tagged unions are forbidden (memory overhead) - Define success/failure criteria precisely - Add mandatory design step before coding - Emphasize that partial migrations (with extra_data) are incomplete - Make it crystal clear this is not optional or negotiable The document now makes unambiguous what success looks like and what causes failure. Contributors cannot claim ignorance of the constraints.
…t reduced - Clarify that extra_data field will be DELETED from NodeStore in final state - NOT acceptable to keep extra_data around indefinitely - NOT acceptable to leave it in code but unused - Every node type MUST be migrated, no exceptions - Intermediate migration states can use extra_data, final state cannot - Add explicit failure criteria for extra_data field existence This makes unambiguous that the goal is complete elimination, not maintenance of extra_data as a 'compatibility' feature.
The match_branch_redundant_data SafeList was not being serialized or deserialized properly, causing index out of bounds errors when accessing match branch redundant data after deserialization. This adds the serialization and deserialization of match_branch_redundant_data in the proper order to maintain memory layout consistency.
Completed migration of statement_import from using generic extra_data buffer to using a properly typed payload struct with packed fields: - module_name_tok: u32 (Ident.Idx) - alias_tok: u32 (Ident.Idx or 0) - packed_qualifier_and_exposes: u32 (qualifier: u16, exposes_start: u11, exposes_len: u5) This removes 5 extra_data appends per import statement and simplifies the getter/setter code to use direct field access instead of offset calculations. Tests: 2233/2242 passing (no regression from previous commit)
…, expr_if_then_else, expr_zero_argument_tag to typed payloads
…n_dec_literal to use specialized int_values and dec_values lists
…use typed payloads
…se, expr_match, pattern_list, and 3 diagnostic types - Removed extra_data usage from: expr_closure, expr_if_then_else, expr_match, pattern_list - Created specialized diag_region_data list for diagnostic region storage - Migrated diag_redundant_exposed, diag_type_shadowed_warning, diag_type_parameter_conflict, diag_mutually_recursive_type_aliases to use diagnostic payload structs - Updated getters and setters for all migrated types to use typed payload field access - All migrations maintain exact 12-byte payload size (no memory overhead) - Tests passing: 2231/2242 (up from 2208) Still TODO: - Migrate: external, local, builtin node types (3 remaining with extra_data usage) - Remaining diagnostic types that don't use extra_data are unchanged
- Increased test passing rate from 98.5% to 99.5% (2231/2242 tests) - Documented 5 successfully migrated node types with detailed struct layouts - Documented infrastructure improvements including diag_region_data specialized list - Reduced extra_data references by 42% (324 → 188) - Listed remaining work with specific type categories
…ominal_external to typed payloads - Remove 7 extra_data references by using typed payload structs - Extra_data count: 195 → 188 - pattern_str_literal: store literal directly (was using raw.data_1) - pattern_small_dec_literal: store has_suffix directly (was using raw.data_3) - pattern_nominal_external: pack target_node_idx and backing_type into 16-bit halves (were in extra_data) Remaining work: 4 node types still use .raw payloads: - expr_hosted_lambda - expr_low_level - expr_dot_access - Still need to handle more complex packing or new data structures
…fn to typed payloads - e_hosted_lambda: Pack args (20+12 bits) and body+index (24+8 bits) - e_low_level_lambda: Pack args (20+12 bits) with op and body - e_type_var_dispatch: Pack args (20+12 bits) - getter was already updated, fixed setter mismatch - ty_fn: Pack args_start (20 bits), args_len (11 bits), effectful (1 bit) with ret Also fixed test constraints: - e_hosted_lambda: constrain index to u8, body to u24 - e_match: constrain branches.len to u10 - e_zero_argument_tag: constrain variant_var and ext_var to u16 - pattern_list: constrain rest_info index/pattern to u15 - s_import: constrain qualifier to u16, exposes to u11/u5 - e_type_var_dispatch: use rand_span() for args Reduced extra_data references from ~188 to 121 (~35% reduction) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
e_dot_access: Store field_name_region in diag_region_data list, pack into single u32: region_idx (12 bits), has_args (1 bit), args_start (11 bits), args_len (8 bits). This is the first node type to use auxiliary storage (diag_region_data) to fit data that exceeds 12 bytes. ty_fn: Pack args_start (20 bits), args_len (11 bits), effectful (1 bit) into single u32 with ret in second slot. extra_data references reduced from 121 to 111 (~8% reduction this commit) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ttern Both types now use a 2-bit tag + 30-bit data pattern for the LocalOrExternal discriminant: - tag 0 (builtin): builtin_type in low 8 bits - tag 1 (local): decl_idx in low 30 bits - tag 2 (external): module_idx (14 bits) + target_node_idx (16 bits) This replaces the separate ty_apply_external variant and extra_data usage. Also fixes test constraints for module_idx (u14) to match packed bit widths. Reduces extra_data references from 97 to 87. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1. Fix replaceExprWithNum and replaceExprWithZeroArgumentTag to set node.tag - These functions were setting payload but not updating the tag - This caused "invalid enum value" errors when reading replaced nodes 2. Fix e_dot_access bit packing to avoid overflow: - region_idx: 10 bits (was 12) - max 1K entries - has_args: 1 bit - args_start: 16 bits (was 11) - max 64K entries - args_len: 5 bits (was 8) - max 31 args (plenty for method calls) These fixes reduce fx_platform_test failures from 18 to 13. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update pattern_nominal, pattern_nominal_external to use typed payloads - Update pattern_f32_literal, pattern_f64_literal to use typed payloads - Update pattern_dec_literal, pattern_small_dec_literal to use typed payloads - Update pattern_str_literal to use typed payload - 69 remaining data_* usages (down from 83) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…sage - Add TyRigidVarLookup payload type - Update ty_rigid_var, ty_rigid_var_lookup, ty_parens getters - Update getTypeHeader to use typed payload - Update getAnnoRecordField to use ty_record_field payload - Update getExposedItem to use exposed_item payload - All payload.raw usages in getters are now converted! - 58 remaining data_* usages (node initialization, not payload access) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove Payload.Raw variant from union (no longer needed) - Remove Raw struct definition - Update test to use typed payload instead of raw - All getters and setters now use typed payloads with semantic field names - Zero runtime change - same bytes, better names Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add TypeVarSlot payload for type_var_slot nodes - Convert addIfBranch, addTypeVarSlot, fillInTypeVarSlotsThru to use Node.init() and setPayload() - Update test code to use typed payloads instead of raw data_* fields - Convert remaining usages in eval/comptime_evaluator.zig and eval/test/helpers.zig to use Node.init() Zero data_1, data_2, data_3 usages remain in any .zig source files. All 2342 tests pass. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Node struct now stores `payload: Payload` directly instead of three generic u32 fields (payload_0/payload_1/payload_2, formerly data_1/data_2/data_3). - getPayload() now simply returns self.payload - setPayload() now simply assigns self.payload = p - init() uses std.mem.zeroes(Payload) All generic field names are now completely eliminated from the codebase. The typed Payload union provides semantic field names for each node type. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix expr_for to use its own payload type instead of statement_for - Replace generic Diagnostic payload (primary/secondary/tertiary) with 8 typed variants: - DiagEmpty, DiagSingleIdent, DiagSingleValue, DiagTwoIdents - DiagIdentWithRegion, DiagTwoIdentsExtra, DiagSingleIdentExtra, DiagTwoEnums - Add int128_values: SafeList(i128) for typed storage of large numeric literals - Migrate 6 node types from extra_data to int128_values: - expr_num, expr_dec, expr_typed_int, expr_typed_frac - pattern_num_literal, pattern_dec_literal - Delete outdated REVIEW_NOTES.md Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace generic extra_data storage with typed lists for better type safety:
- span2_data: SafeList(Span2) for (start, len) span pairs
- Migrated: expr_call, expr_type_var_dispatch
- span_with_node_data: SafeList(SpanWithNode) for (start, len, node) triples
- Migrated: expr_lambda, expr_hosted_lambda, expr_low_level,
where_method, expr_record, expr_if_then_else
- match_data: SafeList(MatchData) for match expressions (5 fields)
- Migrated: expr_match
- match_branch_data: SafeList(MatchBranchData) for match branches (5 fields)
- Migrated: match_branch
All 2342 tests pass.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Continue replacing generic extra_data storage: - closure_data: SafeList(ClosureData) for closure expressions - Migrated: expr_closure, replaceExprWithZeroArgumentTag - zero_arg_tag_data: SafeList(ZeroArgTagData) for zero-argument tags - Migrated: expr_zero_argument_tag All 2342 tests pass. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Continue replacing generic extra_data storage: - def_data: SafeList(DefData) for definition nodes - Migrated: addDef, getDef, setDefExpr - Updated: builtin_compiler/main.zig, HostedCompiler.zig All 2342 tests pass. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reuse existing span2_data list for backing data: - expr_nominal_external: (backing_expr, backing_type) - pattern_nominal_external: (backing_pattern, backing_type) All 2342 tests pass. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reuse existing span_with_node_data list for: - expr_dot_access: (region_start, region_end, packed_args) All 2342 tests pass. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reuse existing span2_data list for: - record_destruct: (kind_tag, pattern_idx) All 2342 tests pass. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- s_decl, s_decl_gen, s_var: Use span2_data for optional annotations - ty_fn: Use span2_data for (effectful, ret) pair - annotation: Use span2_data for where clause span - Diagnostic payloads: Migrate redundant_exposed, type_shadowed_warning, type_parameter_conflict, and mutually_recursive_type_aliases to use span2_data for region storage Reduces extra_data.append calls from 47 to 35. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- s_nominal_decl: Store is_opaque directly in payload (no extra_data) - s_type_anno: Store anno, name in payload, where_clause in span2_data - s_import: Add ImportData typed list for alias_tok, qualifier_tok, flags, exposes span Reduces extra_data.append calls from 35 to 23. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- ty_lookup: Use span2_data for base value(s) - ty_apply: Add TypeApplyData typed list for args_len, base_tag, and values Adds type_apply_data typed list. Reduces extra_data.append calls from 23 to 13. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- pattern_list: Add PatternListData typed list for rest pattern info - Add index_data typed list for variable-length index arrays - Update replaceExprWithTuple, replaceExprWithTag, spanFrom to use index_data - Update sliceFromSpan, firstFromSpan, lastFromSpan, matchBranchSlice Now extra_data is only used for deserialization test padding. Reduces extra_data.append calls from 13 to 2 (test-only). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Delete extra_data: SafeList(u32) from NodeStore (struct, init, deinit, relocate, serialization) - Remove unused legacy expr_int tag and ExprInt payload from Node.zig - Update comptime_evaluator.zig to use index_data instead of extra_data - Fix serialization tests to use expr_num with int128_values - Update stale comments that mentioned extra_data The generic extra_data storage is now fully replaced by typed lists: - int128_values: SafeList(i128) - numeric literals - span2_data: SafeList(Span2) - 2-field spans - span_with_node_data: SafeList(SpanWithNode) - 3-field spans - pattern_list_data: SafeList(PatternListData) - pattern lists - match_data: SafeList(MatchData) - match expressions - match_branch_data: SafeList(MatchBranchData) - match branches - index_data: SafeList(u32) - variable-length index arrays - import_data: SafeList(ImportData) - import statements Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ModuleEnv.Serialized: 1160 -> 1424 bytes NodeStore.Serialized: 96 -> 360 bytes The increase is due to replacing 1 generic SafeList(u32) with 8 typed SafeLists for better type safety. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The i128 type requires 16-byte alignment, which caused deserialization failures when serialized data wasn't properly aligned. This was causing panics with "incorrect alignment" on Linux CI. Solution: Replace SafeList(i128) with SafeList(Int128Storage) where Int128Storage is an extern struct containing two u64 fields. This only requires 8-byte alignment while still storing 128-bit values. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…per" This reverts commit 83b3df8.
The i128 type requires 16-byte alignment. By serializing int128_values FIRST (after the header), its data will be at a 16-byte aligned offset since: 1. The buffer is allocated with 16-byte alignment 2. SafeList serialization pads to @Alignof(T) before writing data The deserialization buffer is already allocated with 16-byte alignment via SERIALIZATION_ALIGNMENT in CompactWriter. This is the proper fix that preserves zero-cost deserialization (just cast the memory) rather than adding runtime conversion overhead. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
57b4da9 to
a6ca8cb
Compare
… with typed payload
The serialized data blob contains i128 values which require 16-byte alignment. Previously the LLVM global was set to 8-byte alignment, causing "incorrect alignment" panics on Linux when deserializing. This is the proper fix - tell LLVM to align the embedded data to 16 bytes. Zero runtime cost, just a linker directive. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use a proper bool type instead of u32 with 0/1 values for has_suffix fields. The structs still maintain their 12-byte size via explicit padding bytes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change all `_unused: u32` / `_unused1: u32` / `_unused2: u32` fields to `_padding: [N]u8` arrays with default zero values - Remove explicit `._unused = 0` assignments from setPayload calls since padding fields now have defaults - Makes struct definitions clearer about what is padding vs actual data Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replaces generic field names with typed, semantic alternatives in the canonicalization layer (
src/canonicalize/). The parse layer (src/parse/) is not affected by this PR.Node Payload Changes
data_1,data_2,data_3fields in Node struct with a single typedpayload: PayloadfieldPayloadextern union provides semantic field names for each node type (e.g.,.if_branch.condinstead of.data_1)Extra Data Migration
extra_data: SafeList(u32)from NodeStore entirelyint128_values: SafeList(i128)- numeric literalsspan2_data: SafeList(Span2)- 2-field spansspan_with_node_data: SafeList(SpanWithNode)- 3-field spanspattern_list_data: SafeList(PatternListData)- pattern listsmatch_data: SafeList(MatchData)- match expressionsmatch_branch_data: SafeList(MatchBranchData)- match branchesindex_data: SafeList(u32)- variable-length index arraysimport_data: SafeList(ImportData)- import statementsDiagnostic Payloads
Diagnosticpayload with 8 typed variants based on actual usage patternsAll 2342 tests pass.
🤖 Generated with Claude Code