- Make a PR to update the rust-toolchain file in the root of the repository:
Below is a checklist of what you need to do to add a new scalar function to DataFusion:
- Add the actual implementation of the function to a new module file within:
  - New function modules - for example a vector module, should use a rust feature (for example vector_expressions) to allow DataFusion users to enable or disable the new module as desired.
- The implementation of the function is done via implementing ScalarUDFImpl trait for the function struct.
  - See the advanced_udf.rs example for an example implementation
  - Add tests for the new function
- To connect the implementation of the function add to the mod.rs file:
  - a mod xyz; where xyz is the new module file
  - a call to make_udf_function!(..);
  - an item in export_functions!(..);
- In sqllogictest/test_files, add new sqllogictest integration tests where the function is called through SQL against well known data and returns the expected result.
  - Documentation for sqllogictest here
- Add SQL reference documentation here
  - An example of this being done can be seen here
  - Run ./dev/update_function_docs.sh to update docs
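As a rough illustration of the feature-gating point in the scalar function checklist above, the sketch below shows how a module can be compiled only when a Cargo feature is enabled. The body of the vector module, the dot helper, and enabled_modules are hypothetical names invented for this example; only the vector_expressions feature name comes from the checklist, and the feature would also need to be declared under [features] in the crate's Cargo.toml.

```rust
// Hypothetical module that is only compiled when the
// `vector_expressions` feature is enabled for the crate.
#[cfg(feature = "vector_expressions")]
pub mod vector {
    /// Dot product of two slices (illustrative helper, not a DataFusion API).
    /// Extra elements in the longer slice are ignored.
    pub fn dot(a: &[f64], b: &[f64]) -> f64 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
}

/// Names of the optional modules compiled into this build
/// (hypothetical helper for this sketch).
pub fn enabled_modules() -> Vec<&'static str> {
    let mut modules = Vec::new();
    // `cfg!` evaluates to a compile-time boolean for the feature flag.
    if cfg!(feature = "vector_expressions") {
        modules.push("vector");
    }
    modules
}
```

Built without the feature, enabled_modules() returns an empty list and the vector module is not compiled at all; building with --features vector_expressions includes it.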
Below is a checklist of what you need to do to add a new aggregate function to DataFusion:
- Add the actual implementation of an Accumulator and AggregateExpr:
- In datafusion/expr/src, add:
  - a new variant to AggregateFunction
  - a new entry to FromStr with the name of the function as called by SQL
  - a new line in return_type with the expected return type of the function, given an incoming type
  - a new line in signature with the signature of the function (number and types of its arguments)
  - a new line in create_aggregate_expr mapping the built-in to the implementation
  - tests to the function.
- In sqllogictest/test_files, add new sqllogictest integration tests where the function is called through SQL against well known data and returns the expected result.
  - Documentation for sqllogictest here
- Add SQL reference documentation here
  - An example of this being done can be seen here
  - Run ./dev/update_function_docs.sh to update docs
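To make the role of an accumulator in the aggregate checklist above more concrete, here is a minimal, standard-library-only sketch of a mean aggregate. SimpleAccumulator and MeanAccumulator are hypothetical stand-ins written for this example, not DataFusion's actual Accumulator trait, which operates on Arrow arrays and has different signatures; the sketch only assumes the general update / state / merge / evaluate shape.

```rust
/// Hypothetical, simplified accumulator interface for illustration only.
/// This is NOT DataFusion's `Accumulator` trait.
pub trait SimpleAccumulator {
    /// Fold a batch of input values into the running state; `None` is a SQL NULL.
    fn update(&mut self, values: &[Option<f64>]);
    /// Export the partial state so it can be combined with other partitions.
    fn state(&self) -> (f64, u64);
    /// Combine partial states produced by other accumulators.
    fn merge(&mut self, states: &[(f64, u64)]);
    /// Produce the final result; `None` when no non-null value was seen.
    fn evaluate(&self) -> Option<f64>;
}

/// Mean aggregate that tracks a running sum and a count of non-null values.
#[derive(Debug, Default)]
pub struct MeanAccumulator {
    sum: f64,
    count: u64,
}

impl SimpleAccumulator for MeanAccumulator {
    fn update(&mut self, values: &[Option<f64>]) {
        // `flatten` skips the `None` (NULL) entries.
        for v in values.iter().flatten() {
            self.sum += v;
            self.count += 1;
        }
    }

    fn state(&self) -> (f64, u64) {
        (self.sum, self.count)
    }

    fn merge(&mut self, states: &[(f64, u64)]) {
        for (sum, count) in states {
            self.sum += sum;
            self.count += count;
        }
    }

    fn evaluate(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}
```

Keeping the sum and count as separate state, rather than a running mean, is what lets partial results from different partitions be merged exactly before the final division.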
The query plans represented by LogicalPlan nodes can be graphically
rendered using Graphviz.
To do so, save the output of the display_graphviz function to a file:
use std::fs::File;
use std::io::Write;
// Create plan somehow...
let mut output = File::create("/tmp/plan.dot")?;
write!(output, "{}", plan.display_graphviz())?;
Then, use the dot command line tool to render it into a file that
can be displayed. For example, the following command creates a
/tmp/plan.pdf file:
dot -Tpdf < /tmp/plan.dot > /tmp/plan.pdf
We are using prettier to format .md files.
You can either use npm i -g prettier to install it globally or use npx to run it as a standalone binary. Using npx requires a working node environment. Upgrading to the latest prettier is recommended (by adding --upgrade to the npm command).
$ prettier --version
2.3.0
After you've confirmed your prettier version, you can format all the .md files:
prettier -w {datafusion,datafusion-cli,datafusion-examples,dev,docs}/**/*.md
We use taplo to format .toml files.
For Rust developers, you can install it via:
cargo install taplo-cli --locked
Refer to the Installation section for other ways to install it.
$ taplo --version
taplo 0.9.0
After you've confirmed your taplo version, you can format all the .toml files:
taplo fmt
The prost/tonic code can be generated by running ./regen.sh, which in turn invokes the Rust binary located in gen.
This is necessary after modifying the protobuf definitions or altering the dependencies of gen, and requires a valid installation of protoc (see installation instructions for details).
./regen.sh
Documentation for UDFs is generated from code (related github issue). To generate the markdown, run ./dev/update_function_docs.sh.
This is necessary after adding a new UDF implementation or modifying an existing one in a way that requires a documentation update.
./dev/update_function_docs.sh