Ballista: Implement map-side of shuffle

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
As a small step towards implementing https://github.com/apache/arrow-datafusion/issues/63, a query stage should be able to apply hash partitioning to the data produced by its plan and write one IPC file per output partition. This would implement the map side of the shuffle operation.

**Describe the solution you'd like**
`QueryStageExec` should have an optional output partitioning scheme that accepts hash partitioning only. If it is specified, the batches produced by the query stage's plan would be hash-partitioned and each output partition written to its own file on disk.
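
To make the proposal concrete, here is a minimal sketch of the hash-partitioning step using only the Rust standard library. It is not Ballista's API: `Row`, `partition_for`, and `hash_partition` are hypothetical names, rows are plain tuples rather than Arrow record batches, and the hash function is `DefaultHasher`, which is an assumption and not necessarily what the real implementation would use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a row. A real implementation would
/// operate on Arrow record batches instead of tuples.
type Row = (String, i64);

/// Map a key to an output partition index in `0..num_partitions`.
fn partition_for<K: Hash>(key: &K, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

/// Split rows into `num_partitions` buckets by hashing the first column.
/// Rows with equal keys always land in the same bucket.
fn hash_partition(rows: &[Row], num_partitions: usize) -> Vec<Vec<Row>> {
    assert!(num_partitions > 0, "need at least one output partition");
    let mut buckets: Vec<Vec<Row>> = vec![Vec::new(); num_partitions];
    for row in rows {
        buckets[partition_for(&row.0, num_partitions)].push(row.clone());
    }
    buckets
}
```

In the map side described above, each bucket would then be written to its own IPC file, so that a downstream stage can read only the partition it is responsible for.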

**Describe alternatives you've considered**
None

**Additional context**
None


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ballista: Implement map-side of shuffle #456

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Ballista: Implement map-side of shuffle #456

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions