Commit 81faba3

Basic Extension Type Registry Implementation (#20312)
## Which issue does this PR close?

This is a PR based on #18552 that contains a basic implementation of an extension type registry. The driving use case is pretty-printing data frames with custom types.

- Closes #18223.

Ping @paleolimbot @adriangb if you're still interested.

### Most Important Changes to the Old PR

- We no longer use the Logical Type, as there is no real consensus on how DataFusion should allow "inline" references to extension types. As a consequence, the query-plan formatting use case from the old PR no longer works. Extension types can only be used where DataFusion has a reference to a registry (e.g., DataFrame pretty-printing).

  @paleolimbot I've called it `DFExtensionType` instead of `BoundExtensionType` to avoid having to explain "bind". If you think there is merit in the other term, let me know. Otherwise, I think this aligns with your proposal.
- Added a more complex example with a parameterized type to demonstrate the full capabilities of the API.
- No extension types are registered by default; users must opt in.

## Rationale for this change

- Allow customized behavior based on extension type metadata.

## What changes are included in this PR?

- Add an `ExtensionTypeRegistry`.
- Add `DFArrayFormatterFactory`, which creates custom pretty-printers when formatting data frames.
- Add an extension type registry to the `SessionState` / `SessionContext`.
- Add a full example of using the API.
- Add an implementation for the UUID canonical extension type.

## Are these changes tested?

- Yes, but only with two end-to-end tests:
  - One for pretty-printing UUID values.
  - One for pretty-printing in the example.

Happy to add more tests if this PR has a chance of being merged.

## Are there any user-facing changes?

Yes, the entire Extension Type API is new.
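The registry idea described above can be illustrated with a small, std-only sketch. It is not the API added by this commit: `ExtensionFormatter`, `UuidFormatter`, `Registry`, and `render_examples` are hypothetical names invented for illustration, and the real `ExtensionTypeRegistry` and `DFArrayFormatterFactory` may look quite different. The only facts assumed are that the canonical Arrow UUID extension type is named `arrow.uuid` and is stored as 16 bytes. The sketch shows the opt-in behavior: a value is pretty-printed only when a formatter is registered under its extension name, and otherwise falls back to printing the raw storage.

```rust
use std::collections::HashMap;
use std::fmt::Write;
use std::sync::Arc;

/// Hypothetical trait: turns the raw storage bytes of one value into text.
trait ExtensionFormatter: Send + Sync {
    /// Returns `None` if the storage does not match what the type expects.
    fn format(&self, storage: &[u8]) -> Option<String>;
}

/// Hypothetical formatter for the canonical `arrow.uuid` type (16 bytes).
struct UuidFormatter;

impl ExtensionFormatter for UuidFormatter {
    fn format(&self, storage: &[u8]) -> Option<String> {
        if storage.len() != 16 {
            return None;
        }
        let mut out = String::with_capacity(36);
        for (i, byte) in storage.iter().enumerate() {
            // Hyphens split the 16 bytes into the usual 8-4-4-4-12 hex groups.
            if matches!(i, 4 | 6 | 8 | 10) {
                out.push('-');
            }
            write!(out, "{byte:02x}").ok()?;
        }
        Some(out)
    }
}

/// Hypothetical registry: empty by default, so users must opt in.
#[derive(Default)]
struct Registry {
    formatters: HashMap<String, Arc<dyn ExtensionFormatter>>,
}

impl Registry {
    fn register(&mut self, name: &str, formatter: Arc<dyn ExtensionFormatter>) {
        self.formatters.insert(name.to_string(), formatter);
    }

    /// Uses the registered formatter if there is one; otherwise (or if the
    /// formatter rejects the storage) falls back to a plain hex dump.
    fn format(&self, extension_name: Option<&str>, storage: &[u8]) -> String {
        extension_name
            .and_then(|name| self.formatters.get(name))
            .and_then(|formatter| formatter.format(storage))
            .unwrap_or_else(|| storage.iter().map(|b| format!("{b:02x}")).collect())
    }
}

/// Formats one UUID value and one value without an extension type.
fn render_examples() -> (String, String) {
    let mut registry = Registry::default();
    registry.register("arrow.uuid", Arc::new(UuidFormatter));

    let uuid_bytes = [
        0x12, 0x3e, 0x45, 0x67, 0xe8, 0x9b, 0x12, 0xd3,
        0xa4, 0x56, 0x42, 0x66, 0x14, 0x17, 0x40, 0x00,
    ];
    let uuid = registry.format(Some("arrow.uuid"), &uuid_bytes);
    let raw = registry.format(None, &[0xde, 0xad]);
    (uuid, raw)
}
```

Calling `render_examples()` yields the hyphenated UUID text for the first value and the hex dump `dead` for the second. Looking the formatter up by name at formatting time mirrors the constraint noted in the description: custom behavior is only available where the caller holds a reference to a registry.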
1 parent a0869e9 commit 81faba3

26 files changed

Lines changed: 1209 additions & 20 deletions

Cargo.lock

Lines changed: 3 additions & 0 deletions

datafusion-examples/README.md

Lines changed: 10 additions & 0 deletions
@@ -126,6 +126,16 @@ cargo run --example dataframe -- dataframe
 | mem_pool_tracking | [`execution_monitoring/memory_pool_tracking.rs`](examples/execution_monitoring/memory_pool_tracking.rs) | Demonstrates memory tracking |
 | tracing | [`execution_monitoring/tracing.rs`](examples/execution_monitoring/tracing.rs) | Demonstrates tracing integration |
 
+## Extension Types Examples
+
+### Group: `extension_types`
+
+#### Category: Single Process
+
+| Subcommand  | File Path                                                                   | Description                          |
+| ----------- | --------------------------------------------------------------------------- | ------------------------------------ |
+| temperature | [`extension_types/temperature.rs`](examples/extension_types/temperature.rs) | Extension type for temperature data. |
+
 ## External Dependency Examples
 
 ### Group: `external_dependency`
Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! # Extension type usage examples
+//!
+//! These examples demonstrate the API for creating and using custom extension types.
+//!
+//! ## Usage
+//! ```bash
+//! cargo run --example extension_types -- [all|temperature]
+//! ```
+//!
+//! Each subcommand runs a corresponding example:
+//! - `all` — run all examples included in this module
+//!
+//! - `temperature`
+//!   (file: temperature.rs, desc: Extension type for temperature data.)
+
+mod temperature;
+
+use datafusion::error::{DataFusionError, Result};
+use strum::{IntoEnumIterator, VariantNames};
+use strum_macros::{Display, EnumIter, EnumString, VariantNames};
+
+#[derive(EnumIter, EnumString, Display, VariantNames)]
+#[strum(serialize_all = "snake_case")]
+enum ExampleKind {
+    All,
+    Temperature,
+}
+
+impl ExampleKind {
+    const EXAMPLE_NAME: &str = "extension_types";
+
+    fn runnable() -> impl Iterator<Item = ExampleKind> {
+        ExampleKind::iter().filter(|v| !matches!(v, ExampleKind::All))
+    }
+
+    async fn run(&self) -> Result<()> {
+        match self {
+            ExampleKind::All => {
+                for example in ExampleKind::runnable() {
+                    println!("Running example: {example}");
+                    Box::pin(example.run()).await?;
+                }
+            }
+            ExampleKind::Temperature => {
+                temperature::temperature_example().await?;
+            }
+        }
+        Ok(())
+    }
+}
+
+#[tokio::main]
+async fn main() -> Result<()> {
+    let usage = format!(
+        "Usage: cargo run --example {} -- [{}]",
+        ExampleKind::EXAMPLE_NAME,
+        ExampleKind::VARIANTS.join("|")
+    );
+
+    let example: ExampleKind = std::env::args()
+        .nth(1)
+        .unwrap_or_else(|| ExampleKind::All.to_string())
+        .parse()
+        .map_err(|_| DataFusionError::Execution(format!("Unknown example. {usage}")))?;
+
+    example.run().await
+}
