import: honor Spark legacy Parquet datetime metadata #67908
D3Hunter wants to merge 17 commits into pingcap:master
Conversation
Skipping CI for Draft Pull Request.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews, or the checkboxes below for quick actions.
📝 Walkthrough
Adds Spark legacy datetime detection and rebasing to Parquet parsing/conversion, introduces a new rebase implementation, updates Parquet writer option handling and mydump build inputs, and adds unit and integration tests plus embedded Spark-legacy Parquet fixtures and minor Bazel test config tweaks.
Sequence Diagram
sequenceDiagram
participant Reader as Parquet Reader
participant Parser as Parser (parquet_parser.go)
participant Converter as Type Converter (parquet_type_converter.go)
participant Rebase as Spark Rebase (spark_rebase.go)
Reader->>Parser: Read footer & file metadata
Parser->>Parser: Extract org.apache.spark.* keys, version, timezone
Parser->>Parser: Build per-column sparkRebaseMicros lookup or none
Reader->>Converter: Provide raw column values + sparkRebaseMicros
Converter->>Converter: Decode raw value (Arrow constructors)
alt spark rebase lookup present
Converter->>Rebase: rebaseSparkJulianToGregorianMicros / rebaseJulianToGregorianDays
Rebase->>Rebase: Use version cutoff, timezone table, or hybrid conversion
Rebase-->>Converter: Return rebased micros/days or error
Converter->>Converter: Convert rebased value to Go time
else no rebasing
Converter->>Converter: Convert raw value to Go time
end
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
⚠️ Warning: the review ran into problems. Git failed to clone the repository.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Hi @D3Hunter. Thanks for your PR. PRs from untrusted users cannot be marked as trusted. I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@D3Hunter I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/lightning/mydump/parquet_writer.go (1)
166-207: ⚠️ Potential issue | 🟡 Minor — Validate options before opening the object writer.
The unsupported-option branch now returns after s.Create, leaving the object writer unclosed. Classify addOpts and build the schema/properties before creating the writer, or add cleanup for every early return.
Suggested direction:

```diff
 func WriteParquetFile(path, fileName string, pcolumns []ParquetColumn, rows int, addOpts ...any) error {
-	s, err := getStore(path)
-	if err != nil {
-		return err
-	}
-	writer, err := s.Create(context.Background(), fileName, nil)
-	if err != nil {
-		return err
-	}
-	wrapper := &writeWrapper{Writer: writer}
+	var extraProps []parquet.WriterProperty
+	writerOpts := make([]file.WriteOption, 0, len(addOpts)+1)
+	for _, opt := range addOpts {
+		switch v := opt.(type) {
+		case parquet.WriterProperty:
+			extraProps = append(extraProps, v)
+		case file.WriteOption:
+			writerOpts = append(writerOpts, v)
+		default:
+			return fmt.Errorf("unsupported parquet writer option type %T", opt)
+		}
+	}
 	fields := make([]schema.Node, len(pcolumns))
 	opts := make([]parquet.WriterProperty, 0, len(pcolumns)*2)
@@
 	}
 	node, _ := schema.NewGroupNode("schema", parquet.Repetitions.Required, fields, -1)
-	var writerOpts []file.WriteOption
-	for _, opt := range addOpts {
-		switch v := opt.(type) {
-		case parquet.WriterProperty:
-			opts = append(opts, v)
-		case file.WriteOption:
-			writerOpts = append(writerOpts, v)
-		default:
-			return fmt.Errorf("unsupported parquet writer option type %T", opt)
-		}
-	}
+	opts = append(opts, extraProps...)
 	props := parquet.NewWriterProperties(opts...)
 	writerOpts = append(writerOpts, file.WithWriterProps(props))
+
+	s, err := getStore(path)
+	if err != nil {
+		return err
+	}
+	writer, err := s.Create(context.Background(), fileName, nil)
+	if err != nil {
+		return err
+	}
+	wrapper := &writeWrapper{Writer: writer}
 	pw := file.NewParquetWriter(wrapper, node, writerOpts...)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/lightning/mydump/parquet_writer.go` around lines 166 - 207, The WriteParquetFile function currently opens the object writer via s.Create before validating addOpts, so any early return (e.g., unsupported option in the switch over addOpts) leaks the writer; move the addOpts classification and schema/property construction (the loop building fields and opts and the switch over addOpts) to occur before calling s.Create (getStore and s.Create should be invoked only after options are validated and fields/opts prepared), or if you prefer to keep s.Create where it is, ensure every early return closes writer (wrapper.Close/ writer.Close) and handles errors; update references in this function (WriteParquetFile, getStore, s.Create, addOpts, fields, opts, writer/wrapper) accordingly so no path returns with the object writer left open.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/lightning/mydump/parquet_type_converter.go`:
- Around line 405-410: The TIMESTAMP_MILLIS rebasing can overflow when computing
val*1000 before passing to rebaseSparkJulianToGregorianMicros; modify the block
that checks converted.sparkRebaseTimeZoneID to first validate that val is within
safe bounds (e.g. ensure val <= math.MaxInt64/1000 and val >= math.MinInt64/1000
or compare against the known millis cutoff) and return an error if it would
overflow, only then multiply by 1000 and call
rebaseSparkJulianToGregorianMicros(converted.sparkRebaseTimeZoneID, val*1000);
ensure the function handling (and error return) remains unchanged for safe
values.
---
Outside diff comments:
In `@pkg/lightning/mydump/parquet_writer.go`:
- Around line 166-207: The WriteParquetFile function currently opens the object
writer via s.Create before validating addOpts, so any early return (e.g.,
unsupported option in the switch over addOpts) leaks the writer; move the
addOpts classification and schema/property construction (the loop building
fields and opts and the switch over addOpts) to occur before calling s.Create
(getStore and s.Create should be invoked only after options are validated and
fields/opts prepared), or if you prefer to keep s.Create where it is, ensure
every early return closes writer (wrapper.Close/ writer.Close) and handles
errors; update references in this function (WriteParquetFile, getStore,
s.Create, addOpts, fields, opts, writer/wrapper) accordingly so no path returns
with the object writer left open.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 7748fdc9-e2bd-4b73-927e-aa174ae131ad
⛔ Files ignored due to path filters (2)
- tests/realtikvtest/importintotest/spark-legacy-date.gz.parquet is excluded by !**/*.parquet
- tests/realtikvtest/importintotest/spark-legacy-datetime.gz.parquet is excluded by !**/*.parquet
📒 Files selected for processing (10)
- br/pkg/metautil/BUILD.bazel
- pkg/importsdk/BUILD.bazel
- pkg/lightning/mydump/BUILD.bazel
- pkg/lightning/mydump/parquet_parser.go
- pkg/lightning/mydump/parquet_parser_test.go
- pkg/lightning/mydump/parquet_type_converter.go
- pkg/lightning/mydump/parquet_writer.go
- pkg/lightning/mydump/spark_rebase_micros_generated.go
- tests/realtikvtest/importintotest/BUILD.bazel
- tests/realtikvtest/importintotest/parquet_test.go
Codecov Report
❌ Patch coverage — additional details and impacted files:
@@ Coverage Diff @@
## master #67908 +/- ##
================================================
+ Coverage 77.5894% 79.4259% +1.8364%
================================================
Files 1982 1995 +13
Lines 548964 551480 +2516
================================================
+ Hits 425938 438018 +12080
+ Misses 122221 111991 -10230
- Partials 805 1471 +666
Flags with carried forward coverage won't be shown. Click here to find out more.
/retest
@D3Hunter: PRs from untrusted users cannot be marked as trusted. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
🔍 Starting code review for this PR...
ingress-bot
left a comment
This review was generated by AI and should be verified by a human reviewer.
Manual follow-up is recommended before merge.
Summary
- Total findings: 11
- Inline comments: 11
- Summary-only findings (no inline anchor): 0
Findings (highest risk first)
🚨 [Blocker] (1)
- Spark legacy rebase is enabled without mixed-version rollout guard (pkg/lightning/mydump/parquet_parser.go:768, pkg/lightning/mydump/parquet_type_converter.go:405, pkg/dxf/importinto/job.go:62, pkg/dxf/importinto/proto.go:46)
⚠️ [Major] (5)
- Unknown Spark timezone is silently coerced to UTC instead of surfacing incompatibility (pkg/lightning/mydump/parquet_parser.go:337, pkg/lightning/mydump/parquet_parser_test.go:595)
- Legacy Spark timestamp rebasing falls back to UTC before parser location defaults are applied (pkg/lightning/mydump/parquet_parser.go:768, pkg/lightning/mydump/parquet_parser.go:455, pkg/lightning/mydump/loader.go:630)
- No regression test pins the Spark 3.0.x INT96-vs-datetime cutoff split (pkg/lightning/mydump/parquet_parser.go:94, pkg/lightning/mydump/parquet_parser.go:95, pkg/lightning/mydump/parquet_parser_test.go:321)
- WriteParquetFile no longer communicates its accepted option contract (pkg/lightning/mydump/parquet_writer.go:166)
- WriteParquetFile switched to untyped varargs and dropped compile-time option contracts (pkg/lightning/mydump/parquet_writer.go:166)
🟡 [Minor] (5)
- Spark rebase policy is now split across generated and handwritten tables (pkg/lightning/mydump/parquet_type_converter.go:43, pkg/lightning/mydump/spark_rebase_micros_generated.go:15)
- Legacy timestamp rebasing repeats timezone index lookup on every value (pkg/lightning/mydump/parquet_type_converter.go:301, pkg/lightning/mydump/parquet_type_converter.go:405, pkg/lightning/mydump/parquet_type_converter.go:483)
- INT96 conversion now truncates sub-microsecond precision instead of preserving canonical rounding (pkg/lightning/mydump/parquet_type_converter.go:501, pkg/types/time.go:183)
- Exported WriteParquetFile variadic type change breaks typed-slice callers (pkg/lightning/mydump/parquet_writer.go:166)
- Unsupported writer-option path returns without closing the created object writer (pkg/lightning/mydump/parquet_writer.go:166, pkg/lightning/mydump/parquet_writer.go:206)
♻️ Duplicate comments (1)
pkg/lightning/mydump/parquet_type_converter.go (1)
525-528: ⚠️ Potential issue | 🟡 Minor — Preserve INT96 sub-microsecond rounding behavior.
nanosOfDay/int64(time.Microsecond) truncates before types.FromGoTime can apply TiDB's usual nearest-microsecond rounding, so INT96 values with non-zero sub-microsecond nanos can import 1µs lower than the previous path. This was already raised in an earlier review thread.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/lightning/mydump/parquet_type_converter.go` around lines 525 - 528, The int96ToUnixMicros function currently truncates sub-microsecond precision by doing nanosOfDay/int64(time.Microsecond); instead, compute and return the timestamp in nanoseconds (totalNanoseconds := (julianDay-julianDayOfUnixEpoch)*int64(24*time.Hour) + nanosOfDay) so callers can convert to time.Time / use types.FromGoTime and let TiDB's nearest-microsecond rounding happen there; update all call sites of int96ToUnixMicros to accept/handle nanoseconds (or rename to int96ToUnixNanos) and only divide by int64(time.Microsecond) at the final conversion step where types.FromGoTime is used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/lightning/mydump/parquet_type_converter.go`:
- Around line 525-528: The int96ToUnixMicros function currently truncates
sub-microsecond precision by doing nanosOfDay/int64(time.Microsecond); instead,
compute and return the timestamp in nanoseconds (totalNanoseconds :=
(julianDay-julianDayOfUnixEpoch)*int64(24*time.Hour) + nanosOfDay) so callers
can convert to time.Time / use types.FromGoTime and let TiDB's
nearest-microsecond rounding happen there; update all call sites of
int96ToUnixMicros to accept/handle nanoseconds (or rename to int96ToUnixNanos)
and only divide by int64(time.Microsecond) at the final conversion step where
types.FromGoTime is used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 428367ed-3f64-4807-9013-1c7aef5b4e77
📒 Files selected for processing (4)
- pkg/lightning/mydump/parquet_parser.go
- pkg/lightning/mydump/parquet_parser_test.go
- pkg/lightning/mydump/parquet_type_converter.go
- pkg/lightning/mydump/parquet_writer.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/lightning/mydump/parquet_parser_test.go
		return nil, errors.Trace(err)
	}
}
case parquet.Types.Int96:
It's a bad choice to store legacy converted type instead of using logical type directly. I don't remember why I wrote this, may just following the old logic 😢. Perhaps we can refactor it later.
as logical type might be invalid?
tidb/pkg/lightning/mydump/parquet_parser.go
Lines 637 to 638 in d705928
Yes, logical type is not mandatory when writing files. But maybe we can convert "converted type" to "logical type".
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: joechenrh. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval in a comment.
[LGTM Timeline notifier] Timeline:
/cherry-pick release-nextgen-20251011
@D3Hunter: once the present PR merges, I will cherry-pick it on top of release-nextgen-20251011 in the new PR and assign it to you.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
|
/cherry-pick release-nextgen-202603 |
|
@D3Hunter: once the present PR merges, I will cherry-pick it on top of release-nextgen-202603 in the new PR and assign it to you. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
What problem does this PR solve?
Issue Number: close #67849
Problem Summary:
IMPORT INTO can read Spark-written Parquet files (Aurora snapshots are also written this way) whose footer marks ancient DATE/TIMESTAMP values as using Spark's legacy hybrid Julian/Gregorian calendar. The previous Parquet read path did not honor that Spark metadata, so ancient values could be imported on the wrong calendar axis, for example importing 0001-01-01 00:00:00 as 0000-12-30 00:00:00.
This PR teaches the Lightning/MyDump Parquet parser to detect Spark legacy datetime and INT96 footer metadata, including Spark version and timezone keys, and to apply Spark-compatible legacy Julian-to-Gregorian rebasing for DATE, TIMESTAMP_MILLIS, TIMESTAMP_MICROS, and INT96 values before converting them to TiDB datums.
It also adds Spark rebase switch-table data, parser/converter unit coverage for legacy and modern Spark Parquet metadata, and RealTiKV IMPORT INTO regression coverage using Spark legacy Parquet fixtures.
Check List
Tests
benchmark for the rebase method for handling datetime
We dump the expected data as Parquet in legacy mode, then import it with both the new and the old binary. For date/datetime, `new` now equals `expect`:

|          | new vs expect | old vs expect |
|----------|---------------|---------------|
| Date     | 0 mismatches  | 25 mismatches |
| Datetime | 0 mismatches  | 92 mismatches |

Below are the first 5 rows:
Date
Datetime
Validated locally:
- make bazel_prepare
- git diff --check upstream/master...HEAD
- ./tools/check/failpoint-go-test.sh pkg/lightning/mydump -run 'TestParquetVariousTypes/(spark_legacy_datetime_rebase|spark_legacy_date_switches|legacy_timestamp_rebase_utc|legacy_timestamp_rebase_non_utc|spark_legacy_timestamp_rebase_uses_spark_zone_tables|spark_legacy_timestamp_default_zone_exists_in_table|spark_legacy_timestamp_rebase_uses_utc_when_zone_table_is_missing|spark_legacy_timestamp_before_table_range_uses_hybrid_calendar_fallback)'

Not run locally:
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores