fix(delta): translate column-mapping names on the pre-materialised-index path
`buildTaskListFromAddFiles` (TahoeBatchFileIndex / CdcAddFileIndex /
TahoeRemoveFileIndex / TahoeChangeFileIndex) was emitting AddFile
partition values keyed by the PHYSICAL column name and skipping
`column_mappings` entirely -- both of which the kernel path already
handles. As a result, CDC reads on tables using column mapping id or
name mode lost the real partition column values and returned null for
every non-partition data column, because the reader looked up logical
names that the parquet files do not contain.
Two companion changes:
1. `buildTaskListFromAddFiles` takes an optional `physicalToLogical`
partition-name map; the TahoeBatchFileIndex callsite derives it
from `relation.partitionSchema` field metadata
(`delta.columnMapping.physicalName`) before dispatch.
2. Where kernel didn't populate `column_mappings` (i.e. the
matchingFiles path), re-derive the logical->physical list from
`relation.dataSchema ++ relation.partitionSchema` field metadata
and inject it into the task list before the proto is finalised.
The native-side column-mapping rewrite in planner.rs already
consumes that list.
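For illustration only, the two translations described above can be sketched
in Rust roughly as follows. This is not the code from this commit: the
`Field` struct and both function names are hypothetical stand-ins, and the
only identifier taken from the change itself is the
`delta.columnMapping.physicalName` metadata key. The sketch assumes that a
field without that key has no column mapping, so its name is left alone.

```rust
use std::collections::HashMap;

/// Metadata key named in the commit message.
const PHYSICAL_NAME_KEY: &str = "delta.columnMapping.physicalName";

/// Hypothetical stand-in for a schema field: a logical name plus
/// string-valued field metadata.
pub struct Field {
    pub name: String,
    pub metadata: HashMap<String, String>,
}

/// Rekey partition values from physical to logical column names.
/// Keys with no entry in `physical_to_logical` are kept unchanged,
/// which covers tables that do not use column mapping.
pub fn to_logical_partition_values(
    partition_values: &HashMap<String, Option<String>>,
    physical_to_logical: &HashMap<String, String>,
) -> HashMap<String, Option<String>> {
    partition_values
        .iter()
        .map(|(physical, value)| {
            let logical = physical_to_logical
                .get(physical)
                .cloned()
                .unwrap_or_else(|| physical.clone());
            (logical, value.clone())
        })
        .collect()
}

/// Build the logical -> physical list from data fields followed by
/// partition fields, skipping fields that carry no physical name.
pub fn logical_to_physical(
    data_fields: &[Field],
    partition_fields: &[Field],
) -> Vec<(String, String)> {
    data_fields
        .iter()
        .chain(partition_fields.iter())
        .filter_map(|f| {
            f.metadata
                .get(PHYSICAL_NAME_KEY)
                .map(|physical| (f.name.clone(), physical.clone()))
        })
        .collect()
}
```

Falling back to the physical key when no mapping exists keeps the sketch a
no-op for tables without column mapping, which matches the "optional" nature
of the `physicalToLogical` argument described in item 1.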
Validated by the `decoded-objects` representative test
(DeltaCDCSQLIdColumnMappingSuite "batch write: append, dynamic
partition overwrite + CDF - column mapping id mode"), which now
returns the correct CDC records instead of (null, null) data columns.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>