This example executes a SQL query in a distributed context.
For this example to work, it's necessary to spawn some localhost workers with the localhost_worker.rs example:
This example queries a couple of test parquet we have for integration tests, and those files are stored using git lfs,
so pulling the first is necessary.
git lfs install
git lfs checkoutIn two different terminals spawn two workers
cargo run --example localhost_worker -- 8080cargo run --example localhost_worker -- 8081The positional numeric argument is the port in which each worker will listen to.
Now, DataFusion queries can be issued using these workers as part of the cluster.
cargo run --example localhost_run -- 'SELECT count(*), "MinTemp" FROM weather GROUP BY "MinTemp"' --cluster-ports 8080,8081The head stage (the one that outputs data to the user) will be executed locally in the same process as that cargo run
command, but further stages will be delegated to the workers running on ports 8080 and 8081.
Additionally, the --explain flag can be passed to render the distributed plan:
cargo run --example localhost_run -- 'SELECT count(*), "MinTemp" FROM weather GROUP BY "MinTemp"' --cluster-ports 8080,8081 --show-distributed-planTwo tables are available in this example:
flights_1m: Flight data with 1m rows
FL_DATE [INT32]
DEP_DELAY [INT32]
ARR_DELAY [INT32]
AIR_TIME [INT32]
DISTANCE [INT32]
DEP_TIME [FLOAT]
ARR_TIME [FLOAT]
weather: Small dataset of weather data
MinTemp [DOUBLE]
MaxTemp [DOUBLE]
Rainfall [DOUBLE]
Evaporation [DOUBLE]
Sunshine [BYTE_ARRAY]
WindGustDir [BYTE_ARRAY]
WindGustSpeed [BYTE_ARRAY]
WindDir9am [BYTE_ARRAY]
WindDir3pm [BYTE_ARRAY]
WindSpeed9am [BYTE_ARRAY]
WindSpeed3pm [INT64]
Humidity9am [INT64]
Humidity3pm [INT64]
Pressure9am [DOUBLE]
Pressure3pm [DOUBLE]
Cloud9am [INT64]
Cloud3pm [INT64]
Temp9am [DOUBLE]
Temp3pm [DOUBLE]
RainToday [BYTE_ARRAY]
RISK_MM [DOUBLE]
RainTomorrow [BYTE_ARRAY]
For an example of worker version filtering during rolling deployments, see localhost_versioned_worker.rs and localhost_versioned_run.rs.
cargo run --example localhost_versioned_worker -- 8080 --version v2cargo run --example localhost_versioned_worker -- 8081 --version v2cargo run --example localhost_versioned_run -- 'SELECT count(*), "MinTemp" FROM weather GROUP BY "MinTemp"' --cluster-ports 8080,8081 --version v2Workers that don't match the requested version are excluded from the cluster.