chore: Remove config option for native_iceberg_compat #4019
andygrove wants to merge 12 commits into apache:main
Thanks @andygrove 👀
mbutrovich left a comment
Thanks @andygrove, let's get that CI matrix and tech debt down.
comphead left a comment
LGTM. I was wondering whether we need it for some debug/benchmark use cases, but nothing comes to mind. So let it go.
Is there still a functionality gap between …
So #3720 and #3442. We should be able to resolve #3442 now that #4011 is merged.
#3443 is fixed in #4038, so that just leaves #3720, which is only that Comet is more lenient with type widening and/or throws a different exception than Spark. Not a correctness issue.
@parthchandra ok if we merge this one?
My general recommendation would be that we enable ignored tests before dropping … Here's Claude's summary of ignored tests:

1.
|
| Test Name | Diffs |
|---|---|
| join key with multiple references on the filtering plan | 4.0.1 |
| SPARK-43402: FileSourceScanExec supports push down data filter with scalar subquery | 4.0.1 |
| alter temporary view should follow current storeAnalyzedPlanForView config | 4.0.1 |
AdaptiveQueryExecSuite (#3442)
| Test Name | Diffs |
|---|---|
| static scan metrics | 3.4.3, 3.5.8, 4.0.1 |
FileBasedDataSourceSuite (#3321)
| Test Name | Diffs |
|---|---|
| Enabling/disabling ignoreMissingFiles using parquet (conditionally tagged only when format == "parquet") | 4.0.1 |
| Enabling/disabling ignoreCorruptFiles | 4.0.1 |
ParquetFilterSuite (3.4.3 only)
| Test Name | Diffs |
|---|---|
| filter pushdown - StringPredicate (tagged IgnoreCometNativeDataFusion in 3.4.3; IgnoreCometNativeScan in 3.5.8/4.0.1) | 3.4.3 |
ParquetSchemaSuite (#3720)
| Test Name | Diffs |
|---|---|
| SPARK-35640: read binary as timestamp should throw schema incompatible error | 3.4.3, 3.5.8, 4.0.1 |
| SPARK-35640: int as long should throw schema incompatible error | 3.4.3, 3.5.8 |
| SPARK-47447: read TimestampLTZ as TimestampNTZ | 4.0.1 |
| SPARK-36182: can't read TimestampLTZ as TimestampNTZ | 3.4.3, 3.5.8 |
| SPARK-34212 Parquet should read decimals correctly | 3.4.3, 3.5.8, 4.0.1 |
| row group skipping doesn't overflow when reading into larger type | 3.4.3, 3.5.8, 4.0.1 |
ParquetSchemaEvolutionSuite (#3720)
| Test Name | Diffs |
|---|---|
| schema mismatch failure error message for parquet vectorized reader | 3.4.3, 3.5.8, 4.0.1 |
| SPARK-45604: schema mismatch failure error on timestamp_ntz to array<timestamp_ntz> | 3.4.3, 3.5.8, 4.0.1 |
ParquetTypeWideningSuite (#3321)
| Test Name | Diffs |
|---|---|
| parquet widening conversion DateType -> TimestampNTZType (conditionally tagged) | 4.0.1 |
| unsupported parquet conversion $fromType -> $toType (multiple type combos) | 4.0.1 |
| unsupported parquet timestamp conversion $fromType ($outputTimestampType) -> $toType | 4.0.1 |
| parquet decimal precision change Decimal($fromPrecision, 2) -> Decimal($toPrecision, 2) | 4.0.1 |
| parquet decimal precision and scale change Decimal($fromPrecision, $fromScale) -> Decimal($toPrecision, $toScale) | 4.0.1 |
2. assume() — runtime skip
ParquetRowIndexSuite (#3886) — 4.0.1 only
| Test Name | Condition |
|---|---|
| invalid row index column type - ${conf.desc} | Skipped when COMET_NATIVE_SCAN_IMPL is SCAN_NATIVE_DATAFUSION or SCAN_AUTO. Comet throws RuntimeException instead of SparkException. |
CometExpressionSuite — Comet's own test suite
| Test Name | Condition |
|---|---|
| get_struct_field - select primitive fields | Skipped when scanImpl == SCAN_AUTO && Spark 4.0+ |
| get_struct_field - select subset of struct | Skipped when scanImpl == SCAN_AUTO && Spark 4.0+ |
| get_struct_field - read entire struct | Skipped when scanImpl == SCAN_AUTO && Spark 4.0+ |
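To make the interaction between these runtime skips and this PR concrete, here is a small Python model of the two `assume()` conditions listed in the Condition columns. It is a hypothetical sketch, not Comet code: the function name `is_skipped`, the plain string values, and reducing "Spark 4.0+" to a major-version check are all assumptions. It shows that, if the conditions stay as listed, the row-index test would be skipped under both scan impls that remain once `native_iceberg_compat` can no longer be selected.

```python
# Hypothetical model of the assume() conditions listed in the tables (not Comet code).
SCAN_AUTO = "auto"
SCAN_NATIVE_DATAFUSION = "native_datafusion"
SCAN_NATIVE_ICEBERG_COMPAT = "native_iceberg_compat"

ROW_INDEX_TEST = "invalid row index column type"


def is_skipped(test_name: str, scan_impl: str, spark_major: int) -> bool:
    """Return True if the listed assume() condition would skip this test."""
    if test_name == ROW_INDEX_TEST:
        # ParquetRowIndexSuite (4.0.1 only): skipped for native_datafusion or auto
        return scan_impl in (SCAN_NATIVE_DATAFUSION, SCAN_AUTO)
    if test_name.startswith("get_struct_field"):
        # CometExpressionSuite: skipped only for auto on Spark 4.0+
        return scan_impl == SCAN_AUTO and spark_major >= 4
    # Any other test is not covered by these assume() conditions
    return False


if __name__ == "__main__":
    # After this PR only "auto" and "native_datafusion" can be configured
    remaining = [SCAN_AUTO, SCAN_NATIVE_DATAFUSION]
    print(all(is_skipped(ROW_INDEX_TEST, impl, 4) for impl in remaining))  # True
    # native_iceberg_compat did not meet the row-index skip condition
    print(is_skipped(ROW_INDEX_TEST, SCAN_NATIVE_ICEBERG_COMPAT, 4))  # False
```

Per the table, the row-index skip reflects an exception-type difference (RuntimeException vs SparkException) rather than a wrong result.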
Summary by Tracking Issue
| Issue | Count | Description |
|---|---|---|
| #3321 | ~12 | Schema evolution, corrupt/missing files, AQE, type widening |
| #3720 | ~8 | Schema mismatch errors, decimal reads, row group skipping |
| #3442 | 1 | Static scan metrics with DPP |
| #3886 | 1 | Row index column type error type mismatch |
| (no issue) | 5 | Filter pushdown / accumulator tests (IgnoreCometNativeScan) |
| (no issue) | 3 | get_struct_field tests (auto + Spark 4.0+ only) |
More Claude analysis on schema mismatch: Claude recommends that we explicitly check that the following tests fail with a different message, rather than actually succeeding (because the results would be wrong):
One final word from Claude:
Alright, I have moved this to draft for now. I'll take another, more detailed look at #3720 soon. Thanks @parthchandra.
Which issue does this PR close?
Partially addresses #4020.
Rationale for this change
Removing support for `native_iceberg_compat` reduces the maintenance burden of the project. This PR makes it impossible for users to select `native_iceberg_compat` and stops running tests for that scan impl. Subsequent PRs will remove the implementation code.
What changes are included in this PR?
- `COMET_NATIVE_SCAN_IMPL` config now only allows `auto` or `native_datafusion`
- `CometScanRule` no longer uses the value of config option `COMET_NATIVE_SCAN_IMPL` and just uses the `native_datafusion` scan
- … `native_iceberg_compat` … `native_iceberg_compat` have been removed

How are these changes tested?
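The config change described in the list can be pictured with a short Python sketch. It is only an illustration under assumptions: the names `ALLOWED_SCAN_IMPLS`, `validate_scan_impl`, and `plan_scan` are hypothetical, and the real validation lives in Comet's Scala config code, whose exact behavior (for example case handling or where validation happens) is not shown here. The sketch captures the two points from the description: only `auto` and `native_datafusion` are accepted, and the scan rule no longer branches on the configured value.

```python
# Hypothetical sketch of the post-PR behavior; not Comet's actual Scala code.
ALLOWED_SCAN_IMPLS = ("auto", "native_datafusion")


def validate_scan_impl(value: str) -> str:
    """Reject any scan impl other than the two that remain after this PR."""
    if value not in ALLOWED_SCAN_IMPLS:
        raise ValueError(
            f"Invalid scan impl {value!r}; allowed values: {', '.join(ALLOWED_SCAN_IMPLS)}"
        )
    return value


def plan_scan(configured_impl: str) -> str:
    """Pick the scan implementation the way the PR describes CometScanRule doing it."""
    # The value is still validated here, but it no longer changes the outcome.
    validate_scan_impl(configured_impl)
    return "native_datafusion"


if __name__ == "__main__":
    print(plan_scan("auto"))  # native_datafusion
    try:
        validate_scan_impl("native_iceberg_compat")
    except ValueError as err:
        print(err)
```

In this model `auto` and `native_datafusion` lead to the same scan, matching the description that `CometScanRule` just uses the `native_datafusion` scan.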