HOWTOs

How to update the version of Rust used in CI tests

Make a PR to update the rust-toolchain file in the root of the repository.

How to add a new scalar function

Below is a checklist of what you need to do to add a new scalar function to DataFusion:

Add the actual implementation of the function to a new module file within:
- here for arrays, maps and structs functions
- here for crypto functions
- here for datetime functions
- here for encoding functions
- here for math functions
- here for regex functions
- here for string functions
- here for unicode functions
- create a new module here for other functions.
New function modules (for example, a vector module) should use a Rust feature (for example, vector_expressions) so that DataFusion users can enable or disable the new module as desired.
Implement the function by implementing the ScalarUDFImpl trait for the function struct.
- See the advanced_udf.rs example for an example implementation
- Add tests for the new function
To connect the implementation of the function, add the following to the mod.rs file:
- a mod xyz; where xyz is the new module file
- a call to make_udf_function!(..);
- an item in export_functions!(..);
In sqllogictest/test_files, add new sqllogictest integration tests where the function is called through SQL against well known data and returns the expected result.
- Documentation for sqllogictest here
Add SQL reference documentation here
- An example of this being done can be seen here
- Run ./dev/update_function_docs.sh to update docs
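
To make the "function struct plus trait implementation" step concrete, here is a minimal, standard-library-only sketch of the general shape. It does not use DataFusion's real ScalarUDFImpl trait: SketchScalarUdf, SketchType, DoubleFunc, and the double_it name are all hypothetical stand-ins invented for illustration, and the real trait operates on Arrow data types and columnar values rather than slices of Option<f64>. See the advanced_udf.rs example for the actual API.

```rust
/// Hypothetical stand-in for a data type enum; NOT DataFusion's real type.
#[derive(Debug, Clone, PartialEq)]
enum SketchType {
    Int64,
    Float64,
}

/// Hypothetical, pared-down analogue of a scalar UDF trait.
/// The real `ScalarUDFImpl` has a different, richer signature.
trait SketchScalarUdf {
    /// Name the function is called by in SQL.
    fn name(&self) -> &str;
    /// Return type for the given argument types, or an error message.
    fn return_type(&self, arg_types: &[SketchType]) -> Result<SketchType, String>;
    /// Evaluate the function over one column of nullable input values.
    fn invoke(&self, column: &[Option<f64>]) -> Vec<Option<f64>>;
}

/// Example function struct: doubles each input and passes NULLs through.
struct DoubleFunc;

impl SketchScalarUdf for DoubleFunc {
    fn name(&self) -> &str {
        "double_it"
    }

    fn return_type(&self, arg_types: &[SketchType]) -> Result<SketchType, String> {
        match arg_types {
            [SketchType::Float64] => Ok(SketchType::Float64),
            other => Err(format!(
                "double_it expects one Float64 argument, got {:?}",
                other
            )),
        }
    }

    fn invoke(&self, column: &[Option<f64>]) -> Vec<Option<f64>> {
        // Map over the column; `None` (NULL) stays `None`.
        column.iter().map(|v| v.map(|x| x * 2.0)).collect()
    }
}
```

The point of the sketch is the division of labor: the struct carries the function's identity, while the trait methods answer "what is it called", "what does it return for these argument types", and "how is it evaluated over a column". Unit tests for a new function typically exercise each of those three questions.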

How to add a new aggregate function

Below is a checklist of what you need to do to add a new aggregate function to DataFusion:

Add the actual implementation of an Accumulator and AggregateExpr.
In datafusion/expr/src, add:
- a new variant to AggregateFunction
- a new entry to FromStr with the name of the function as called by SQL
- a new line in return_type with the expected return type of the function, given an incoming type
- a new line in signature with the signature of the function (number and types of its arguments)
- a new line in create_aggregate_expr mapping the built-in to the implementation
- tests for the function.
In sqllogictest/test_files, add new sqllogictest integration tests where the function is called through SQL against well known data and returns the expected result.
- Documentation for sqllogictest here
Add SQL reference documentation here
- An example of this being done can be seen here
- Run ./dev/update_function_docs.sh to update docs
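
For readers new to accumulators, the sketch below illustrates the general pattern an aggregate implementation follows: fold batches into running state, merge partial states from other partitions, then produce a final value. It uses only the standard library. SketchAccumulator and MeanAccumulator are hypothetical names invented for this illustration; DataFusion's real Accumulator trait works on Arrow arrays and scalar values and has a different signature.

```rust
/// Hypothetical stand-in for an accumulator interface; NOT the real
/// DataFusion `Accumulator` trait.
trait SketchAccumulator {
    /// Fold a batch of nullable input values into the running state.
    fn update_batch(&mut self, values: &[Option<f64>]);
    /// Combine this state with the partial state of another accumulator,
    /// for example one that processed a different partition.
    fn merge(&mut self, other: &Self);
    /// Produce the final aggregate; `None` if no non-null input was seen.
    fn evaluate(&self) -> Option<f64>;
}

/// Running state for an arithmetic mean: a sum and a count of non-null rows.
#[derive(Default)]
struct MeanAccumulator {
    sum: f64,
    count: u64,
}

impl SketchAccumulator for MeanAccumulator {
    fn update_batch(&mut self, values: &[Option<f64>]) {
        // `flatten` skips NULLs (`None`) and yields the remaining values.
        for v in values.iter().flatten() {
            self.sum += v;
            self.count += 1;
        }
    }

    fn merge(&mut self, other: &Self) {
        self.sum += other.sum;
        self.count += other.count;
    }

    fn evaluate(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}
```

Keeping the state as (sum, count) rather than a running mean is what makes the merge step exact: two partial states can be combined by simple addition before the final division.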

How to display plans graphically

The query plans represented by LogicalPlan nodes can be graphically rendered using Graphviz.

To do so, save the output of the display_graphviz function to a file:

use std::{fs::File, io::Write};

// Create plan somehow...
let mut output = File::create("/tmp/plan.dot")?;
write!(output, "{}", plan.display_graphviz())?;

Then, use the dot command line tool to render it into a file that can be displayed. For example, the following command creates a /tmp/plan.pdf file:

dot -Tpdf < /tmp/plan.dot > /tmp/plan.pdf
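
If you prefer to drive both steps from Rust, the following is a minimal sketch using only the standard library. The helper names write_dot and render_pdf are hypothetical, and the DOT text is passed in as a plain string; in practice it would be the formatted output of plan.display_graphviz(). The sketch assumes the Graphviz dot binary is installed and on the PATH.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Write DOT text (for example, a formatted plan) to `dot_path`.
fn write_dot(dot_text: &str, dot_path: &Path) -> io::Result<()> {
    fs::write(dot_path, dot_text)
}

/// Run `dot -Tpdf <dot_path> -o <pdf_path>`.
/// Returns `Ok(true)` if `dot` exited successfully, and an `Err` if the
/// binary could not be started (for example, Graphviz is not installed).
fn render_pdf(dot_path: &Path, pdf_path: &Path) -> io::Result<bool> {
    let status = Command::new("dot")
        .arg("-Tpdf")
        .arg(dot_path)
        .arg("-o")
        .arg(pdf_path)
        .status()?;
    Ok(status.success())
}
```

A caller would invoke write_dot with the plan text and a path such as /tmp/plan.dot, then render_pdf with that path and /tmp/plan.pdf, checking both results.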

How to format `.md` document

We are using prettier to format .md files.

You can either use npm i -g prettier to install it globally or use npx to run it as a standalone binary. Using npx requires a working Node.js environment. Upgrading to the latest prettier is recommended (by adding --upgrade to the npm command).

$ prettier --version
2.3.0

After you've confirmed your prettier version, you can format all the .md files:

prettier -w {datafusion,datafusion-cli,datafusion-examples,dev,docs}/**/*.md

How to format `.toml` files

We use taplo to format .toml files.

For Rust developers, you can install it via:

cargo install taplo-cli --locked

Refer to the Installation section for other ways to install it.

$ taplo --version
taplo 0.9.0

After you've confirmed your taplo version, you can format all the .toml files:

taplo fmt

How to update protobuf/gen dependencies

The prost/tonic code can be generated by running ./regen.sh, which in turn invokes the Rust binary located in gen.

This is necessary after modifying the protobuf definitions or altering the dependencies of gen, and requires a valid installation of protoc (see installation instructions for details).

./regen.sh

How to add/edit documentation for UDFs

Documentation for UDFs is generated from code (related GitHub issue). To generate the markdown, run ./dev/update_function_docs.sh.

This is necessary after adding a new UDF implementation or modifying an existing one in a way that requires a documentation update.

./dev/update_function_docs.sh
