[FEATURE] Support shared catalog config and per-table shuffle overrides in iceberg-source #6726

@lawofcycles

Description

Is your feature request related to a problem? Please describe.

When configuring multiple tables in iceberg-source, users must repeat the same catalog properties for each table entry even when the tables share the same catalog. Additionally, shuffle parameters (partitions, target_partition_size) cannot be tuned per table.

Describe the solution you'd like

  1. Allow a top-level catalog definition that applies to all tables by default. When a table specifies its own catalog, it fully replaces the top-level definition (no partial merge).
iceberg:
  catalog:
    type: rest
    uri: "http://iceberg-rest-catalog:8181"
    io-impl: "org.apache.iceberg.aws.s3.S3FileIO"
  tables:
    - table_name: "db.table_a"
      identifier_columns: ["id"]
    - table_name: "db.table_b"
      identifier_columns: ["id"]
      catalog:
        type: glue
        warehouse: "s3://other-bucket/warehouse"
        io-impl: "org.apache.iceberg.aws.s3.S3FileIO"
  2. Allow per-table overrides for shuffle parameters (partitions, target_partition_size). When a table specifies its own shuffle, it fully replaces the top-level shuffle parameters. Node-level settings like server_port and ssl remain top-level only and cannot be overridden per table.
iceberg:
  shuffle:
    partitions: 64
    target_partition_size: 64mb
  tables:
    - table_name: "db.large_table"
      shuffle:
        partitions: 256
        target_partition_size: 128mb
    - table_name: "db.small_table"
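
To make the intended "full replacement, no partial merge" semantics concrete, here is a rough sketch of how the effective per-table settings could be resolved. It is illustrative only and not the plugin's actual implementation: the function name `resolve_tables` and the constant `TABLE_SHUFFLE_KEYS` are hypothetical, and rejecting node-level keys in a table-level shuffle block with an error is an assumption (the proposal only says they are not overridable).

```python
from copy import deepcopy
from typing import Any

# Assumption: these are the only shuffle keys tunable per table,
# taken from the parameters named in this proposal.
TABLE_SHUFFLE_KEYS = {"partitions", "target_partition_size"}


def resolve_tables(source: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one effective config per table (hypothetical helper)."""
    top_catalog = source.get("catalog")
    # Only the tunable subset of the top-level shuffle is inherited by tables;
    # node-level keys such as server_port and ssl stay at the top level.
    top_shuffle = {
        key: value
        for key, value in (source.get("shuffle") or {}).items()
        if key in TABLE_SHUFFLE_KEYS
    }

    resolved = []
    for table in source.get("tables", []):
        table_shuffle = table.get("shuffle")
        if table_shuffle is not None:
            unexpected = set(table_shuffle) - TABLE_SHUFFLE_KEYS
            if unexpected:
                # Assumption: fail fast rather than silently ignore these keys.
                raise ValueError(
                    f"{table.get('table_name')}: not overridable per table: "
                    f"{sorted(unexpected)}"
                )

        effective = deepcopy(table)
        # A table-level block replaces the top-level block outright;
        # individual keys are never merged between the two.
        effective["catalog"] = deepcopy(
            table["catalog"] if "catalog" in table else top_catalog
        )
        effective["shuffle"] = deepcopy(
            table_shuffle if table_shuffle is not None else top_shuffle
        )
        resolved.append(effective)
    return resolved
```

Under this sketch, a table that sets only `shuffle.partitions: 256` would end up with no `target_partition_size` at all rather than inheriting the top-level `64mb`, which is the practical consequence of choosing replacement over merging.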

Additional context

Related PR: #6682 (source-layer shuffle implementation)

Metadata

Assignees

Labels

ease-of-use: Improving the ease-of-use for an existing feature
enhancement: New feature or request

Type

No type

Projects

Status

Unplanned

Relationships

None yet

Development

No branches or pull requests
