[SPARK-55441][SQL] Types Framework - Phase 1c - Client Integration #54905
davidm-db wants to merge 19 commits into apache:master
Conversation
Resolved review threads:
sql/api/src/main/scala/org/apache/spark/sql/types/ops/ClientTypeOps.scala
sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimeTypeApiOps.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/ops/CatalystTypeOps.scala
.../src/main/scala/org/apache/spark/sql/connect/client/arrow/types/ops/TimeTypeConnectOps.scala
...nect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala
...t/common/src/main/scala/org/apache/spark/sql/connect/common/LiteralValueProtoConverter.scala
sql/api/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala
...onnect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowSerializer.scala
...nnect/common/src/main/scala/org/apache/spark/sql/connect/common/DataTypeProtoConverter.scala
```scala
        Map("typeInfo" -> s"${literalCase.name}(${literalCase.getNumber})"))
    }
    builder.build()
    ProtoTypeOps
```

there doesn't seem to be a way to nicely/cleanly reduce the diff here
cloud-fan left a comment:

Review
Prior state and problem: Adding a new data type required manually editing 13+ client-side infrastructure files with diverse dispatch patterns.
Design approach: Phase 1c extends the Types Framework with 4 new optional traits (ClientTypeOps, CatalystTypeOps, ProtoTypeOps, ConnectArrowTypeOps) that centralize client-facing operations per type. Each integration file wraps its original method body in a *Default helper and adds a framework dispatch. TimeType serves as the reference implementation.
Key design decisions:
- Feature flag gating: server-side ops check `typesFrameworkEnabled`; Connect client Arrow ops skip the flag (the client must handle whatever the server sends).
- Visibility strategy: `CatalystTypeOps` uses the type-safe `ArrowFieldWriter` (widened to `private[sql]`), while `ConnectArrowTypeOps` uses `Any` for all parameters/returns due to the `private[arrow]` visibility of the concrete types.
- Reverse lookup registration: forward dispatch goes through a single ops instance, but reverse lookups (Arrow→DataType, proto→DataType, value→proto) each need separate factory methods with per-type matching.

Implementation sketch: 13 integration points consistently use `XxxTypeOps(dt).map(_.method).getOrElse(default(...))`. Per-type ops classes: `TimeTypeApiOps` (sql/api), `TimeTypeOps` (catalyst), `TimeTypeConnectOps` (connect).
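The dispatch pattern can be illustrated with a self-contained sketch. All names here (`FormatOps`, `Integration`, `formatDefault`, the toy `DataType` hierarchy, and the inlined `typesFrameworkEnabled` flag) are hypothetical stand-ins, not the actual Spark classes:

```scala
// Minimal model of the framework dispatch pattern:
// XxxTypeOps(dt).map(_.method).getOrElse(methodDefault(...))

// Stand-in for Spark's DataType hierarchy (hypothetical).
sealed trait DataType
case object IntegerType extends DataType
case object TimeType extends DataType

// Optional per-type ops trait; only framework-managed types provide one.
trait FormatOps {
  def formatExternal(value: Any): String
}

object FormatOps {
  // Stand-in for SqlApiConf.get.typesFrameworkEnabled (assumed flag).
  val typesFrameworkEnabled: Boolean = true

  // Factory returns Some(ops) only for framework-managed types.
  def apply(dt: DataType): Option[FormatOps] = {
    if (!typesFrameworkEnabled) return None
    dt match {
      case TimeType => Some(new FormatOps {
        def formatExternal(value: Any): String = s"TIME '$value'"
      })
      case _ => None // all other types fall back to the default path
    }
  }
}

object Integration {
  // Original method body, extracted into a *Default helper as fallback.
  private def formatDefault(value: Any, dt: DataType): String = s"$value"

  // Integration point: framework dispatch first, default as fallback.
  def format(value: Any, dt: DataType): String =
    FormatOps(dt).map(_.formatExternal(value)).getOrElse(formatDefault(value, dt))
}
```

The key property is that `Integration.format` never needs editing when a new type is added; only the factory gains a match arm.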
```scala
 */
object ProtoTypeOps {

  def apply(dt: DataType): Option[ProtoTypeOps] = {
```
Adding a second framework-managed type requires 5 separate match-clause additions in this file (apply, toLiteralProtoForValue, toCatalystType, getScalaConverterForKind, getProtoDataTypeFromLiteral). Three of those dispatch by proto enum cases and can be consolidated.
Concrete suggestion — extract a shared opsForKindCase lookup and add a toCatalystTypeFromProto method to the trait:
```scala
// Add to ProtoTypeOps trait:
def toCatalystTypeFromProto(t: proto.DataType): DataType

// In companion object — single KindCase → ops mapping:
private def opsForKindCase(
    kindCase: proto.DataType.KindCase): Option[ProtoTypeOps] = kindCase match {
  case proto.DataType.KindCase.TIME => Some(new TimeTypeConnectOps(TimeType()))
  // Add new framework types here — single registration for all KindCase lookups
  case _ => None
}

// Then toCatalystType, getScalaConverterForKind, getProtoDataTypeFromLiteral all delegate:
def toCatalystType(t: proto.DataType): Option[DataType] = {
  if (!SqlApiConf.get.typesFrameworkEnabled) return None
  opsForKindCase(t.getKindCase).map(_.toCatalystTypeFromProto(t))
}

def getScalaConverterForKind(
    kindCase: proto.DataType.KindCase): Option[proto.Expression.Literal => Any] = {
  if (!SqlApiConf.get.typesFrameworkEnabled) return None
  opsForKindCase(kindCase).map(_.getScalaConverter)
}

def getProtoDataTypeFromLiteral(literal: proto.Expression.Literal): Option[proto.DataType] = {
  if (!SqlApiConf.get.typesFrameworkEnabled) return None
  // LiteralTypeCase maps 1:1 to KindCase for framework types
  opsForKindCase(literalCaseToKindCase(literal.getLiteralTypeCase))
    .map(_.getProtoDataTypeFromLiteral(literal))
}
```

This reduces per-type registration in this file from 5 match points to 3 (apply, opsForKindCase, toLiteralProtoForValue).
Applied your suggestion. Extracted a shared `opsForKindCase` lookup and added `toCatalystTypeFromProto` to the trait. `toCatalystType`, `getScalaConverterForKind`, and `getProtoDataTypeFromLiteral` all delegate through the single `opsForKindCase` registration point now. Per-type registration in this file is reduced from 5 match blocks to 3 (`apply`, `opsForKindCase`, `toLiteralProtoForValue`). Added a `literalCaseToKindCase` helper for the LiteralTypeCase-to-KindCase mapping.
```scala
// the arrow package cast to the expected types.

/** Creates an Arrow serializer for writing values to a vector. Returns a Serializer. */
def createArrowSerializer(vector: Any): Any
```
All three methods use `Any` for both parameters and return types — type mismatches between the ops implementation and the call site (e.g., casting to the wrong vector type) are only caught at runtime as `ClassCastException`. `CatalystTypeOps` avoided this by widening `ArrowFieldWriter`/`TimeWriter` to `private[sql]`. Could the Connect Arrow types (`Serializer`, `Deserializer`, `ArrowVectorReader`) similarly be widened to `private[connect]`?
Widened `FieldSerializer` to `private[connect]`, and `ArrowVectorReader` and `TimeVectorReader` to `private[connect]`. All three return types are now proper: `ArrowSerializer.Serializer`, `ArrowDeserializers.Deserializer[Any]`, `ArrowVectorReader`. Call sites no longer need `.asInstanceOf` casts on the return value.

For parameters, 1 of 3 is also proper (`createArrowVectorReader` takes `FieldVector`). The other 2 (`createArrowSerializer`, `createArrowDeserializer`) take `AnyRef` because the existing caller APIs (`serializerFor(encoder, v: AnyRef)`, `deserializerFor(encoder, data: AnyRef)`) use `AnyRef` — these signatures predate our work and use pattern matching to determine the concrete vector type. Using `AnyRef` avoids an unnecessary intermediate cast.

Net result: 6 `Any` positions reduced to 2 `AnyRef` parameters. The remaining 2 are constrained by the existing caller APIs, not by visibility.
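The tradeoff this thread discusses, `Any`-typed ops versus widened-visibility typed signatures, can be sketched with hypothetical stand-in classes (shown here for the parameter position; none of these are the actual Connect types):

```scala
// Hypothetical stand-ins for Arrow vector classes (not real Connect types).
final class IntVector { def put(i: Int): Unit = () }
final class TimeVector { def put(nanos: Long): Unit = () }

trait UntypedOps {
  // Any-typed: the implementation must cast internally, so passing the
  // wrong vector type only fails at runtime with a ClassCastException.
  def createSerializer(vector: Any): Long => Unit
}

trait TypedOps {
  // Widened visibility lets the concrete type appear in the signature,
  // so passing an IntVector here is a compile-time error instead.
  def createSerializer(vector: TimeVector): Long => Unit
}

object TimeUntypedOps extends UntypedOps {
  def createSerializer(vector: Any): Long => Unit = {
    val tv = vector.asInstanceOf[TimeVector] // throws if caller passed the wrong vector
    nanos => tv.put(nanos)
  }
}

object TimeTypedOps extends TypedOps {
  def createSerializer(vector: TimeVector): Long => Unit =
    nanos => vector.put(nanos)
}
```

With the untyped variant, `TimeUntypedOps.createSerializer(new IntVector)` compiles but throws at runtime; with the typed variant, the equivalent call does not compile at all.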
```scala
    binaryFormatter: BinaryFormatter): String = a match {
  case (null, _) => if (nested) "null" else "NULL"
  case (value, dt) =>
    ClientTypeOps(dt).map(_.formatExternal(value, nested)).getOrElse {
```
Two concerns with this dispatch:

1. Bypasses session formatters. `formatExternal` uses the ops instance's own `FractionTimeFormatter`, ignoring the `formatters: TimeFormatters` parameter. Currently both use the same `new FractionTimeFormatter()` so output matches, but if session-level time formatting becomes configurable, the framework path would silently ignore it.
2. Per-value allocation. `ClientTypeOps(dt)` → `TypeApiOps(dt)` creates a new `TimeTypeApiOps` (including a new `@transient lazy val timeFormatter`) on every call. For a result set with N TimeType values, that's N instances + N formatters, versus the original code, which reuses a single `formatters.time` for all values.

Consider passing the session formatter through the ops interface, or caching the ops instance per DataType.
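The second suggestion (caching the ops instance per DataType) could look roughly like this. `CachedOps`, `DType`, and the case objects are hypothetical stand-ins, and the sketch assumes the cache keys have value-based equality:

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for the DataType hierarchy.
sealed trait DType
case object TimeDType extends DType
case object IntDType extends DType

trait CachedOps {
  // Per-instance state that is expensive to build, e.g. a formatter.
  lazy val formatter: String => String = identity
}

object CachedOps {
  // One cached Option[ops] per DType, shared across all callers.
  private val cache = new ConcurrentHashMap[DType, Option[CachedOps]]()

  private def create(dt: DType): Option[CachedOps] = dt match {
    case TimeDType => Some(new CachedOps {})
    case _ => None // non-framework types cache a None
  }

  // computeIfAbsent guarantees at most one ops instance per key.
  def apply(dt: DType): Option[CachedOps] =
    cache.computeIfAbsent(dt, create(_))
}
```

With this shape, formatting N values of the same type allocates one ops instance (and one formatter) instead of N; parameterized types would key the cache on the full type instance.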
1. TimeType currently has two parallel implementations — the original hardcoded branches and the framework ops. The framework implementation is how a new type would be built from scratch (formatter owned by the ops class, no dependency on external formatter pipelines). This should be the correct design for new types going forward.
   - The concern about bypassing session formatters doesn't apply here — `FractionTimeFormatter` is fully stateless (no session configuration, no timezone, no legacy parser policy). Unlike `TimestampFormatter` (which takes `sessionLocalTimeZone`) or `DateFormatter` (which checks `legacyTimeParserPolicy`), the time formatter has zero session state. Both paths — the original `formatters.time.format(lt)` and the framework's `formatExternal` — create the same stateless `FractionTimeFormatter` and produce identical output.
   - For future types that DO need session-configurable formatting, the ops can read session state inside `formatExternal` at call time via `SqlApiConf.get`, matching how `HiveResult.getTimeFormatters` reads `SQLConf.get.sessionLocalTimeZone`. The framework design handles this without API changes.
2. Re allocation: tracked for Phase 1d — singleton caching of ops instances uniformly across all factory companions. This was already noted during work on other phases and I'll fix it all together in the upcoming phase.
```diff
-def getCommonJDBCType(dt: DataType): Option[JdbcType] = {
+// Uses .orElse (not .getOrElse) because this method returns Option[JdbcType].
+def getCommonJDBCType(dt: DataType): Option[JdbcType] =
+  ClientTypeOps(dt).map(ops => JdbcType(ops.jdbcTypeName, ops.getJdbcType))
```
When the framework is enabled, `getCommonJDBCType(TimeType)` now returns `Some(JdbcType("TIME", java.sql.Types.TIME))`. Before this PR, the default path had no TimeType case and returned `None`. This is a new capability (not a regression), but the PR description's "Does this PR introduce any user-facing change?" says No. Since the flag defaults to true in tests, this changes test-environment behavior.
Removed JDBC entirely from this PR. `getCommonJDBCType` is reverted to the original code (no framework dispatch), and the JDBC methods are removed from `ClientTypeOps`. The framework should be a transparent refactoring with no behavior change. JDBC requires types from `sql/core` that `ClientTypeOps` (in `sql/api`) can't reference, and partial support (DDL without getter/setter) is worse than no support. A separate `JdbcTypeOps` trait in `sql/core` is planned for when full JDBC support is ready.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/ops/CatalystTypeOps.scala
Surface Areas Detected

Findings:
1. Arrow NANOSECOND unit maps to MICROS_PRECISION — semantic mismatch
2. Per-row object allocation in HiveResult hot path
3. toLiteralProto hardcodes DEFAULT_PRECISION instead of using instance precision
4. ConnectArrowTypeOps uses Any throughout — type safety completely lost
5. Null safety gap in HiveResult framework dispatch
6. Redundant TimeType branches in *Default methods — no verification of equivalence
7. SerializerBuildHelper / DeserializerBuildHelper dispatch by enc.dataType ignoring encoder type
8. Undocumented JDBC behavior change for TimeType
9. Repeated new TimeTypeConnectOps(TimeType()) allocations across 5+ sites
10. No new tests despite 966 lines of new code
11. ClientTypeOps.apply does not check feature flag — inconsistent gating
12. createFieldWriterDefault unnecessarily private[sql]
13. ProtoTypeOps companion has 5 separate registration points per type

Summary: 13 total findings.
@dejankrak-db All findings should be either addressed or replied to now.

- Finding 1 (Arrow precision mismatch): Pre-existing behavior — the original
- Finding 2 (Per-row allocation in HiveResult): Valid concern. Singleton caching of ops instances across all factory companions is tracked for Phase 1d (Validation & Performance).
- Finding 3 (toLiteralProto DEFAULT_PRECISION): Not a bug. The "generic" literal path receives a
- Finding 4 (ConnectArrowTypeOps Any types): Already fixed in a previous iteration — widened
- Finding 5 (Null safety in HiveResult): The null case is the first match arm and always runs before framework dispatch. This ordering is structural (same as all other type cases in the match) — not something that can be accidentally refactored away without changing the entire method signature.
- Finding 6 (Redundant TimeType in defaults): Intentional — project guideline: "Don't remove existing TimeType code. Its explicit case branches remain as safety net and fallback when the feature flag is off." Equivalence testing is tracked for Phase 1d/Phase 2.
- Finding 7 (Serializer dispatch order vs. encoder type):
- Finding 8 (JDBC behavior change): Removed JDBC from this PR entirely —
- Finding 9 (Repeated allocations): Tracked for Phase 1d — singleton caching across all factory companions uniformly.
- Finding 10 (No new tests): Tracked for Phase 2 (Testing Framework). The feature flag is enabled by default in tests, so the entire existing test suite validates the framework path.
- Finding 11 (ClientTypeOps.apply flag check): By design —
- Finding 12 (createFieldWriterDefault visibility): Fixed — changed from
- Finding 13 (ProtoTypeOps 5 registration points): Already fixed while addressing one of Wenchen's comments — consolidated to 3 via a shared
if the final design is to have one interface per module, then shall we merge
@cloud-fan Merged the two ops classes into a single one. All other refactors we talked about, like optional functionalities, will be investigated and done as a follow-up, since they don't really fit the topic of this pull request.
```scala
 * TimeTypeApiOps for a reference implementation
 * @since 4.2.0
 */
trait ClientTypeOps { self: TypeApiOps =>
```
`TypeApiOps` and `ClientTypeOps` are both for the client side? Shall we merge them into one? I'm OK to leave additional work for follow-up PRs, but not to commit a draft version and rewrite it in a follow-up.
That's true. It's also true for `CatalystTypeOps` and `TypeOps`, which are both on the catalyst side. However, in both cases, one is optional and the other is mandatory. I thought of merging it like this to enable functionality, and following up with a refactor where I would introduce the concept of optional functions (returning Options) in the ops classes, since that is framework design rather than enabling functionality.

If you think we should do it here, I can; I was just explaining my reasoning.
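The split discussed in this thread (a mandatory base ops trait plus an optional client-facing mix-in with a self-type) can be sketched in isolation. Everything below except the two trait shapes is a hypothetical stand-in, not the actual Spark code:

```scala
// Mandatory base ops: every framework-managed type implements this.
trait TypeApiOps {
  def typeName: String
}

// Optional client-facing mix-in. The self-type guarantees that any
// ClientTypeOps instance is also a TypeApiOps, so it can reuse base members.
trait ClientTypeOps { self: TypeApiOps =>
  def formatExternal(value: Any): String = s"$typeName($value)"
}

// Reference implementation mixing both in (hypothetical type).
class TimeTypeApiOps extends TypeApiOps with ClientTypeOps {
  def typeName: String = "TIME"
}

object ClientTypeOps {
  // Returns Some only when the type's ops also support client operations.
  def apply(ops: TypeApiOps): Option[ClientTypeOps] = ops match {
    case c: ClientTypeOps => Some(c)
    case _ => None
  }
}
```

The self-type keeps the optional trait from being instantiated standalone, while the `Option`-returning factory is what lets "mandatory" and "optional" coexist without merging the traits.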
What changes were proposed in this pull request?
Extends the Types Framework (Phase 1a: #54223) with optional traits for client-facing infrastructure dispatch. Adds 4 new traits and integrates 12 files so that new framework-managed types get client support automatically — zero edits to infrastructure files.
New traits:

- `CatalystTypeOps` (catalyst) — serializer/deserializer expression building, Arrow field writers
- `ClientTypeOps` (sql/api) — Arrow conversion, Python interop, Hive formatting, Thrift type ID
- `ProtoTypeOps` (connect/common) — Spark Connect proto DataType and Literal conversions
- `ConnectArrowTypeOps` (connect/common) — Connect Arrow serde (no feature flag — the client must handle whatever the server sends)

TimeType serves as the reference implementation. One new per-type class (`TimeTypeConnectOps`) handles both Connect traits; the existing `TimeTypeApiOps` and `TimeTypeOps` gain mix-ins for `ClientTypeOps` and `CatalystTypeOps` respectively.

Integration pattern: Every integration point uses `XxxTypeOps(dt).map(_.method).getOrElse(methodDefault(...))` — framework dispatch first, original code extracted to a private `*Default` helper as fallback. This minimizes the diff on existing code (match bodies stay at their original indentation in the helper).

Trait method design: Methods that may need extra context for future complex types use overloading — a simple abstract version (most types override this) and a concrete extended version with additional parameters that delegates to the simple one by default. For example, `formatExternal(value)` for most types, with `formatExternal(value, nested)` available for types that need nesting-aware formatting.

Visibility changes: `ArrowFieldWriter` and `TimeWriter` widened from `private[arrow]` to `private[sql]` so `CatalystTypeOps` can declare/instantiate them. `ArrowSerializer.FieldSerializer`, `ArrowVectorReader`, and `TimeVectorReader` widened from `private[arrow]` to `private[connect]` so `ConnectArrowTypeOps` can use proper return types instead of `Any`.

Why are the changes needed?
Without this, adding a new data type requires manually editing 13+ client integration files with diverse patterns. With these traits, a new type implements the relevant ops methods and all client infrastructure dispatches through the framework automatically.
JDBC Data Source is intentionally excluded from this PR. JDBC type mapping (`JdbcUtils`) requires types from `sql/core` that `ClientTypeOps` (in `sql/api`) can't reference, and partial support (DDL without read/write) would be worse than no support. A separate `JdbcTypeOps` trait in `sql/core` is planned to cover all four JDBC methods (`getCommonJDBCType`, `getCatalystType`, `makeGetter`, `makeSetter`) as a complete unit.

Does this PR introduce any user-facing change?
No. The framework is gated behind `spark.sql.types.framework.enabled` (default `false` in production, `true` in tests). Existing behavior is unchanged — all original TimeType branches remain as fallback in the `*Default` methods.

How was this patch tested?
The feature flag is enabled by default in test environments (`Utils.isTesting`), so the entire existing test suite validates the framework code path. No new tests are added in this PR because the framework delegates to the same underlying logic that existing tests already cover.

In subsequent phases, the testing focus will be on:
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.6 (claude-opus-4-6)