Skip to content

Add Ballista support to DataFusion CLI#889

Merged
andygrove merged 15 commits intoapache:masterfrom
andygrove:ballista-cli
Aug 16, 2021
Merged

Add Ballista support to DataFusion CLI#889
andygrove merged 15 commits intoapache:masterfrom
andygrove:ballista-cli

Conversation

@andygrove
Copy link
Copy Markdown
Member

@andygrove andygrove commented Aug 15, 2021

Which issue does this PR close?

Closes #886 .

$ ./target/release/datafusion-cli --host localhost --port 50050
DataFusion CLI v4.0.0-SNAPSHOT

> create external table lineitem stored as parquet location '/mnt/tpch/parquet-sf100-partitioned/lineitem';
0 rows in set. Query took 0.015 seconds.

> select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
        l_shipdate <= date '1998-09-02'
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;
+--------------+--------------+------------+--------------------+--------------------+-------------------+--------------------+--------------------+----------------------+-------------+
| l_returnflag | l_linestatus | sum_qty    | sum_base_price     | sum_disc_price     | sum_charge        | avg_qty            | avg_price          | avg_disc             | count_order |
+--------------+--------------+------------+--------------------+--------------------+-------------------+--------------------+--------------------+----------------------+-------------+
| A            | F            | 3775127758 | 5660776097194.456  | 5377736398183.934  | 5592847429515.928 | 25.499370423275426 | 38236.116984304936 | 0.05000224353092895  | 148047881   |
| N            | F            | 98553062   | 147771098385.9801  | 140384965965.03497 | 145999793032.7757 | 25.501556956882876 | 38237.19938880453  | 0.049985284338054006 | 3864590     |
| N            | O            | 7436302959 | 11150725648169.863 | 10593195276359.283 | 11016932215670.58 | 25.500009433521782 | 38237.227663621445 | 0.0499979183499098   | 291619616   |
| R            | F            | 3775724970 | 5661603032745.348  | 5378513563915.405  | 5593662252666.917 | 25.50006628406532  | 38236.697258453016 | 0.050001304339654135 | 148067261   |
+--------------+--------------+------------+--------------------+--------------------+-------------------+--------------------+--------------------+----------------------+-------------+
4 rows in set. Query took 8.677 seconds.

Rationale for this change

Enables a SQL repl for Ballista with minimal effort.

What changes are included in this PR?

datafusion-cli now has new host and port arguments

Are there any user-facing changes?

No

@andygrove andygrove changed the title DRAFT: Add Ballista support to DataFusion CLI Add Ballista support to DataFusion CLI Aug 15, 2021
@andygrove andygrove marked this pull request as ready for review August 15, 2021 19:33
@andygrove andygrove requested review from Dandandan, alamb and houqp August 15, 2021 19:33
@houqp
Copy link
Copy Markdown
Member

houqp commented Aug 15, 2021

looks like code blocks in readme needs to be marked with no_run

@houqp
Copy link
Copy Markdown
Member

houqp commented Aug 16, 2021

my bad, no_run still compiles the code, we would need to use ignore here :(

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the code and it looks good to me. I think we have reasonable CI test coverage for the datafusion-cli (as it is used in the python integration test suite that @jimexist started).

Thanks @andygrove

.map_err(|e| DataFusionError::Execution(format!("{:?}", e)))?;
Ok(Context::Remote(BallistaContext::remote(h, p, &config)))
}
_ => Ok(Context::Local(ExecutionContext::with_config(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 perfect

@andygrove andygrove merged commit d4e32d1 into apache:master Aug 16, 2021
@andygrove andygrove deleted the ballista-cli branch August 16, 2021 13:24
@jimexist
Copy link
Copy Markdown
Member

Sorry that I've been quite busy these days. Will review by end of the week and also see what else i can contribute to. This is exciting!

@houqp houqp added the enhancement New feature or request label Nov 6, 2021
H0TB0X420 pushed a commit to H0TB0X420/datafusion that referenced this pull request Oct 7, 2025
* make script shebangs more flexible

* move location of python tests

This change allows us to run the test suite against a release candidate
hosted on test.pypi.org.

```
pip install --extra-index-url https://test.pypi.org/simple/ datafusion==40.0.0

pytest --import-mode=importlib python/tests
```

* add documentation for testing against release candidate

* add dev script for cleaning maturin build artifacts

* update ruff lint config for new test location
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update datafusion-cli to support Ballista, or implement new ballista-cli

4 participants