Skip to content

Replace generic Node payload fields with typed Payload union#9033

Merged
rtfeldman merged 91 commits into
mainfrom
remove-extra-data
Jan 18, 2026
Merged

Replace generic Node payload fields with typed Payload union#9033
rtfeldman merged 91 commits into
mainfrom
remove-extra-data

Conversation

@rtfeldman

@rtfeldman rtfeldman commented Jan 15, 2026

Copy link
Copy Markdown
Contributor

Summary

Replaces generic field names with typed, semantic alternatives in the canonicalization layer (src/canonicalize/). The parse layer (src/parse/) is not affected by this PR.

Node Payload Changes

  • Replaces generic data_1, data_2, data_3 fields in Node struct with a single typed payload: Payload field
  • The Payload extern union provides semantic field names for each node type (e.g., .if_branch.cond instead of .data_1)
  • Node struct remains 16 bytes, Payload remains 12 bytes

Extra Data Migration

  • Removes the generic extra_data: SafeList(u32) from NodeStore entirely
  • Replaced with 8 typed SafeLists for better type safety:
    • int128_values: SafeList(i128) - numeric literals
    • span2_data: SafeList(Span2) - 2-field spans
    • span_with_node_data: SafeList(SpanWithNode) - 3-field spans
    • pattern_list_data: SafeList(PatternListData) - pattern lists
    • match_data: SafeList(MatchData) - match expressions
    • match_branch_data: SafeList(MatchBranchData) - match branches
    • index_data: SafeList(u32) - variable-length index arrays
    • import_data: SafeList(ImportData) - import statements

Diagnostic Payloads

  • Replaced generic Diagnostic payload with 8 typed variants based on actual usage patterns
  • Each diagnostic type now uses a semantically appropriate payload struct

All 2342 tests pass.

🤖 Generated with Claude Code

rtfeldman and others added 30 commits January 1, 2026 07:54
This commit adds typed Payload unions to both parse/Node.zig and
canonicalize/Node.zig, enabling direct field access for node data
instead of raw u32 manipulation.

Changes:
- parse/Node.zig: Add Payload union with typed variants for all node types
- parse/NodeStore.zig: Migrate to use Payload union, reducing extra_data
  usage for nodes that can pack their data directly
- canonicalize/Node.zig: Add Payload union with 60+ typed variants
- canonicalize/NodeStore.zig: Migrate statement nodes (s_decl, s_decl_gen,
  s_var, s_nominal_decl, s_type_anno) to use direct field access
- node_store_test.zig: Constrain random span values to fit packed format

The Payload unions maintain the same 12-byte size (3 × u32) as the raw
data fields, ensuring no memory overhead. Statement nodes now store
optional annotations using offset encoding (0 = null, idx+1 = present)
directly in node fields rather than extra_data.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Updated replaceExprWithNum, replaceExprWithZeroArgumentTag, replaceExprWithTuple, replaceExprWithTag to use setPayload
- Updated addRecordField and addMatchBranchPattern to use setPayload
- Tests pass
- Replace .raw payload access with typed .statement_import
- Use field names (module_name_tok, packed_idents) instead of (data_1, data_2)
- Improve code readability with clear field names and documentation
- All tests passing
- Add explicit requirement: ZERO runtime memory increase (not even 1 byte)
- Add explicit requirement: COMPLETE elimination of extra_data (zero remaining)
- Explain why tagged unions are forbidden (memory overhead)
- Define success/failure criteria precisely
- Add mandatory design step before coding
- Emphasize that partial migrations (with extra_data) are incomplete
- Make it crystal clear this is not optional or negotiable

The document now makes unambiguous what success looks like and what
causes failure. Contributors cannot claim ignorance of the constraints.
…t reduced

- Clarify that extra_data field will be DELETED from NodeStore in final state
- NOT acceptable to keep extra_data around indefinitely
- NOT acceptable to leave it in code but unused
- Every node type MUST be migrated, no exceptions
- Intermediate migration states can use extra_data, final state cannot
- Add explicit failure criteria for extra_data field existence

This makes unambiguous that the goal is complete elimination,
not maintenance of extra_data as a 'compatibility' feature.
The match_branch_redundant_data SafeList was not being serialized or
deserialized properly, causing index out of bounds errors when accessing
match branch redundant data after deserialization.

This adds the serialization and deserialization of match_branch_redundant_data
in the proper order to maintain memory layout consistency.
Completed migration of statement_import from using generic extra_data buffer
to using a properly typed payload struct with packed fields:

- module_name_tok: u32 (Ident.Idx)
- alias_tok: u32 (Ident.Idx or 0)
- packed_qualifier_and_exposes: u32 (qualifier: u16, exposes_start: u11, exposes_len: u5)

This removes 5 extra_data appends per import statement and simplifies
the getter/setter code to use direct field access instead of offset calculations.

Tests: 2233/2242 passing (no regression from previous commit)
…, expr_if_then_else, expr_zero_argument_tag to typed payloads
…n_dec_literal to use specialized int_values and dec_values lists
…se, expr_match, pattern_list, and 3 diagnostic types

- Removed extra_data usage from: expr_closure, expr_if_then_else, expr_match, pattern_list
- Created specialized diag_region_data list for diagnostic region storage
- Migrated diag_redundant_exposed, diag_type_shadowed_warning, diag_type_parameter_conflict, diag_mutually_recursive_type_aliases to use diagnostic payload structs
- Updated getters and setters for all migrated types to use typed payload field access
- All migrations maintain exact 12-byte payload size (no memory overhead)
- Tests passing: 2231/2242 (up from 2208)

Still TODO:
- Migrate: external, local, builtin node types (3 remaining with extra_data usage)
- Remaining diagnostic types that don't use extra_data are unchanged
- Increased test passing rate from 98.5% to 99.5% (2231/2242 tests)
- Documented 5 successfully migrated node types with detailed struct layouts
- Documented infrastructure improvements including diag_region_data specialized list
- Reduced extra_data references by 42% (324 → 188)
- Listed remaining work with specific type categories
…ominal_external to typed payloads

- Remove 7 extra_data references by using typed payload structs
- Extra_data count: 195 → 188
- pattern_str_literal: store literal directly (was using raw.data_1)
- pattern_small_dec_literal: store has_suffix directly (was using raw.data_3)
- pattern_nominal_external: pack target_node_idx and backing_type into 16-bit halves (were in extra_data)

Remaining work: 4 node types still use .raw payloads:
- expr_hosted_lambda
- expr_low_level
- expr_dot_access
- Still need to handle more complex packing or new data structures
…fn to typed payloads

- e_hosted_lambda: Pack args (20+12 bits) and body+index (24+8 bits)
- e_low_level_lambda: Pack args (20+12 bits) with op and body
- e_type_var_dispatch: Pack args (20+12 bits) - getter was already updated, fixed setter mismatch
- ty_fn: Pack args_start (20 bits), args_len (11 bits), effectful (1 bit) with ret

Also fixed test constraints:
- e_hosted_lambda: constrain index to u8, body to u24
- e_match: constrain branches.len to u10
- e_zero_argument_tag: constrain variant_var and ext_var to u16
- pattern_list: constrain rest_info index/pattern to u15
- s_import: constrain qualifier to u16, exposes to u11/u5
- e_type_var_dispatch: use rand_span() for args

Reduced extra_data references from ~188 to 121 (~35% reduction)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
e_dot_access: Store field_name_region in diag_region_data list, pack into
single u32: region_idx (12 bits), has_args (1 bit), args_start (11 bits),
args_len (8 bits). This is the first node type to use auxiliary storage
(diag_region_data) to fit data that exceeds 12 bytes.

ty_fn: Pack args_start (20 bits), args_len (11 bits), effectful (1 bit)
into single u32 with ret in second slot.

extra_data references reduced from 121 to 111 (~8% reduction this commit)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ttern

Both types now use a 2-bit tag + 30-bit data pattern for the LocalOrExternal
discriminant:
- tag 0 (builtin): builtin_type in low 8 bits
- tag 1 (local): decl_idx in low 30 bits
- tag 2 (external): module_idx (14 bits) + target_node_idx (16 bits)

This replaces the separate ty_apply_external variant and extra_data usage.

Also fixes test constraints for module_idx (u14) to match packed bit widths.

Reduces extra_data references from 97 to 87.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1. Fix replaceExprWithNum and replaceExprWithZeroArgumentTag to set node.tag
   - These functions were setting payload but not updating the tag
   - This caused "invalid enum value" errors when reading replaced nodes

2. Fix e_dot_access bit packing to avoid overflow:
   - region_idx: 10 bits (was 12) - max 1K entries
   - has_args: 1 bit
   - args_start: 16 bits (was 11) - max 64K entries
   - args_len: 5 bits (was 8) - max 31 args (plenty for method calls)

These fixes reduce fx_platform_test failures from 18 to 13.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
rtfeldman and others added 23 commits January 15, 2026 13:33
- Update pattern_nominal, pattern_nominal_external to use typed payloads
- Update pattern_f32_literal, pattern_f64_literal to use typed payloads
- Update pattern_dec_literal, pattern_small_dec_literal to use typed payloads
- Update pattern_str_literal to use typed payload
- 69 remaining data_* usages (down from 83)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…sage

- Add TyRigidVarLookup payload type
- Update ty_rigid_var, ty_rigid_var_lookup, ty_parens getters
- Update getTypeHeader to use typed payload
- Update getAnnoRecordField to use ty_record_field payload
- Update getExposedItem to use exposed_item payload
- All payload.raw usages in getters are now converted!
- 58 remaining data_* usages (node initialization, not payload access)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove Payload.Raw variant from union (no longer needed)
- Remove Raw struct definition
- Update test to use typed payload instead of raw
- All getters and setters now use typed payloads with semantic field names
- Zero runtime change - same bytes, better names

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add TypeVarSlot payload for type_var_slot nodes
- Convert addIfBranch, addTypeVarSlot, fillInTypeVarSlotsThru to use
  Node.init() and setPayload()
- Update test code to use typed payloads instead of raw data_* fields
- Convert remaining usages in eval/comptime_evaluator.zig and
  eval/test/helpers.zig to use Node.init()

Zero data_1, data_2, data_3 usages remain in any .zig source files.
All 2342 tests pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Node struct now stores `payload: Payload` directly instead of
three generic u32 fields (payload_0/payload_1/payload_2, formerly
data_1/data_2/data_3).

- getPayload() now simply returns self.payload
- setPayload() now simply assigns self.payload = p
- init() uses std.mem.zeroes(Payload)

All generic field names are now completely eliminated from the codebase.
The typed Payload union provides semantic field names for each node type.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix expr_for to use its own payload type instead of statement_for
- Replace generic Diagnostic payload (primary/secondary/tertiary) with 8 typed variants:
  - DiagEmpty, DiagSingleIdent, DiagSingleValue, DiagTwoIdents
  - DiagIdentWithRegion, DiagTwoIdentsExtra, DiagSingleIdentExtra, DiagTwoEnums
- Add int128_values: SafeList(i128) for typed storage of large numeric literals
- Migrate 6 node types from extra_data to int128_values:
  - expr_num, expr_dec, expr_typed_int, expr_typed_frac
  - pattern_num_literal, pattern_dec_literal
- Delete outdated REVIEW_NOTES.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace generic extra_data storage with typed lists for better type safety:

- span2_data: SafeList(Span2) for (start, len) span pairs
  - Migrated: expr_call, expr_type_var_dispatch

- span_with_node_data: SafeList(SpanWithNode) for (start, len, node) triples
  - Migrated: expr_lambda, expr_hosted_lambda, expr_low_level,
    where_method, expr_record, expr_if_then_else

- match_data: SafeList(MatchData) for match expressions (5 fields)
  - Migrated: expr_match

- match_branch_data: SafeList(MatchBranchData) for match branches (5 fields)
  - Migrated: match_branch

All 2342 tests pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Continue replacing generic extra_data storage:

- closure_data: SafeList(ClosureData) for closure expressions
  - Migrated: expr_closure, replaceExprWithZeroArgumentTag

- zero_arg_tag_data: SafeList(ZeroArgTagData) for zero-argument tags
  - Migrated: expr_zero_argument_tag

All 2342 tests pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Continue replacing generic extra_data storage:

- def_data: SafeList(DefData) for definition nodes
  - Migrated: addDef, getDef, setDefExpr
  - Updated: builtin_compiler/main.zig, HostedCompiler.zig

All 2342 tests pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reuse existing span2_data list for backing data:
- expr_nominal_external: (backing_expr, backing_type)
- pattern_nominal_external: (backing_pattern, backing_type)

All 2342 tests pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reuse existing span_with_node_data list for:
- expr_dot_access: (region_start, region_end, packed_args)

All 2342 tests pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reuse existing span2_data list for:
- record_destruct: (kind_tag, pattern_idx)

All 2342 tests pass.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- s_decl, s_decl_gen, s_var: Use span2_data for optional annotations
- ty_fn: Use span2_data for (effectful, ret) pair
- annotation: Use span2_data for where clause span
- Diagnostic payloads: Migrate redundant_exposed, type_shadowed_warning,
  type_parameter_conflict, and mutually_recursive_type_aliases to use
  span2_data for region storage

Reduces extra_data.append calls from 47 to 35.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- s_nominal_decl: Store is_opaque directly in payload (no extra_data)
- s_type_anno: Store anno, name in payload, where_clause in span2_data
- s_import: Add ImportData typed list for alias_tok, qualifier_tok, flags, exposes span

Reduces extra_data.append calls from 35 to 23.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- ty_lookup: Use span2_data for base value(s)
- ty_apply: Add TypeApplyData typed list for args_len, base_tag, and values

Adds type_apply_data typed list.
Reduces extra_data.append calls from 23 to 13.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- pattern_list: Add PatternListData typed list for rest pattern info
- Add index_data typed list for variable-length index arrays
- Update replaceExprWithTuple, replaceExprWithTag, spanFrom to use index_data
- Update sliceFromSpan, firstFromSpan, lastFromSpan, matchBranchSlice

Now extra_data is only used for deserialization test padding.
Reduces extra_data.append calls from 13 to 2 (test-only).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Delete extra_data: SafeList(u32) from NodeStore (struct, init, deinit,
  relocate, serialization)
- Remove unused legacy expr_int tag and ExprInt payload from Node.zig
- Update comptime_evaluator.zig to use index_data instead of extra_data
- Fix serialization tests to use expr_num with int128_values
- Update stale comments that mentioned extra_data

The generic extra_data storage is now fully replaced by typed lists:
- int128_values: SafeList(i128) - numeric literals
- span2_data: SafeList(Span2) - 2-field spans
- span_with_node_data: SafeList(SpanWithNode) - 3-field spans
- pattern_list_data: SafeList(PatternListData) - pattern lists
- match_data: SafeList(MatchData) - match expressions
- match_branch_data: SafeList(MatchBranchData) - match branches
- index_data: SafeList(u32) - variable-length index arrays
- import_data: SafeList(ImportData) - import statements

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ModuleEnv.Serialized: 1160 -> 1424 bytes
NodeStore.Serialized: 96 -> 360 bytes

The increase is due to replacing 1 generic SafeList(u32) with 8 typed
SafeLists for better type safety.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The i128 type requires 16-byte alignment, which caused deserialization
failures when serialized data wasn't properly aligned. This was causing
panics with "incorrect alignment" on Linux CI.

Solution: Replace SafeList(i128) with SafeList(Int128Storage) where
Int128Storage is an extern struct containing two u64 fields. This only
requires 8-byte alignment while still storing 128-bit values.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The i128 type requires 16-byte alignment. By serializing int128_values
FIRST (after the header), its data will be at a 16-byte aligned offset
since:
1. The buffer is allocated with 16-byte alignment
2. SafeList serialization pads to @Alignof(T) before writing data

The deserialization buffer is already allocated with 16-byte alignment
via SERIALIZATION_ALIGNMENT in CompactWriter.

This is the proper fix that preserves zero-cost deserialization (just
cast the memory) rather than adding runtime conversion overhead.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
rtfeldman and others added 4 commits January 17, 2026 12:59
The serialized data blob contains i128 values which require 16-byte
alignment. Previously the LLVM global was set to 8-byte alignment,
causing "incorrect alignment" panics on Linux when deserializing.

This is the proper fix - tell LLVM to align the embedded data to 16
bytes. Zero runtime cost, just a linker directive.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use a proper bool type instead of u32 with 0/1 values for has_suffix
fields. The structs still maintain their 12-byte size via explicit
padding bytes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change all `_unused: u32` / `_unused1: u32` / `_unused2: u32` fields to
  `_padding: [N]u8` arrays with default zero values
- Remove explicit `._unused = 0` assignments from setPayload calls since
  padding fields now have defaults
- Makes struct definitions clearer about what is padding vs actual data

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@rtfeldman rtfeldman merged commit 45aab43 into main Jan 18, 2026
37 checks passed
@rtfeldman rtfeldman deleted the remove-extra-data branch January 18, 2026 01:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant