Commit 1e883f4

adriangb authored and jayshrivastava committed
Preserve PhysicalExpr graph in proto round trip using Arc pointers as unique identifiers (apache#20037)
Replaces apache#18192 using the APIs in apache#19437. As in apache#18192, the end goal is to enable deduplication of `DynamicFilterPhysicalExpr` so that distributed query engines can get one step closer to using dynamic filters. Because it is actually simpler, this deduplication is applied to all `PhysicalExpr`s. An added benefit is that the original expression tree is preserved more faithfully (no new duplicate branches are added), which has the immediate effect of, for example, not duplicating large `InListExpr`s.
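As a rough illustration of using `Arc` pointers as identifiers, the sketch below derives a `u64` id from an `Arc`'s pointer address, the process id, and a per-run salt. It is not the code from this commit: `Expr`, `expr_id`, and `run_salt` are hypothetical names, and hashing with `DefaultHasher` is an assumption made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical stand-in for a physical expression node.
struct Expr {
    name: String,
}

/// Derive an id from the Arc's pointer address, the current process id,
/// and a salt chosen once per serialization run (all assumptions for this sketch).
fn expr_id(expr: &Arc<Expr>, run_salt: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    // Two clones of the same Arc share one allocation, so they share this address.
    (Arc::as_ptr(expr) as usize).hash(&mut hasher);
    std::process::id().hash(&mut hasher);
    run_salt.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let shared = Arc::new(Expr { name: "dynamic_filter".to_string() });
    let alias = Arc::clone(&shared);
    // Same contents, but a separate allocation.
    let twin = Arc::new(Expr { name: "dynamic_filter".to_string() });

    let run_salt = 42; // a real serializer would pick this per run
    assert_eq!(expr_id(&shared, run_salt), expr_id(&alias, run_salt));
    assert_ne!(expr_id(&shared, run_salt), expr_id(&twin, run_salt));
    println!("{} -> {}", shared.name, expr_id(&shared, run_salt));
}
```

Keying on the allocation rather than on the expression's contents is what lets two structurally equal but distinct expressions stay distinct, while every clone of one `Arc` maps to the same id.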
1 parent db5f47c commit 1e883f4

8 files changed

Lines changed: 1089 additions & 2 deletions

File tree

Cargo.lock

Lines changed: 1 addition & 0 deletions

datafusion/proto/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -69,6 +69,7 @@ datafusion-proto-common = { workspace = true }
 object_store = { workspace = true }
 pbjson = { workspace = true, optional = true }
 prost = { workspace = true }
+rand = { workspace = true }
 serde = { version = "1.0", optional = true }
 serde_json = { workspace = true, optional = true }

datafusion/proto/proto/datafusion.proto

Lines changed: 8 additions & 0 deletions
@@ -836,6 +836,14 @@ message PhysicalExprNode {
   // Was date_time_interval_expr
   reserved 17;
 
+  // Unique identifier for this expression to do deduplication during deserialization.
+  // When serializing, this is set to a unique identifier for each combination of
+  // expression, process and serialization run.
+  // When deserializing, if this ID has been seen before, the cached Arc is returned
+  // instead of creating a new one, enabling reconstruction of referential integrity
+  // across serde roundtrips.
+  optional uint64 expr_id = 30;
+
   oneof ExprType {
     // column references
     PhysicalColumn column = 1;
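
The deserialization behavior described in the `expr_id` comment can be sketched with a cache keyed by id. This is a minimal illustration under stated assumptions, not the decoder from this commit: `Node`, `Expr`, and `decode` are hypothetical names, and a plain `HashMap` stands in for whatever cache the real implementation uses.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical wire form of an expression, carrying the optional id.
#[derive(Clone)]
struct Node {
    expr_id: Option<u64>,
    label: String,
    children: Vec<Node>,
}

/// Hypothetical in-memory expression.
struct Expr {
    label: String,
    children: Vec<Arc<Expr>>,
}

/// Rebuild an expression tree, returning the cached Arc when an id repeats.
fn decode(node: &Node, cache: &mut HashMap<u64, Arc<Expr>>) -> Arc<Expr> {
    if let Some(id) = node.expr_id {
        if let Some(hit) = cache.get(&id) {
            // Seen before: hand back the same allocation instead of a copy.
            return Arc::clone(hit);
        }
    }
    let mut children = Vec::with_capacity(node.children.len());
    for child in &node.children {
        children.push(decode(child, cache));
    }
    let expr = Arc::new(Expr { label: node.label.clone(), children });
    if let Some(id) = node.expr_id {
        cache.insert(id, Arc::clone(&expr));
    }
    expr
}

fn main() {
    // The same filter appears twice on the wire with the same id.
    let filter = Node { expr_id: Some(7), label: "dynamic_filter".to_string(), children: vec![] };
    let root = Node {
        expr_id: Some(1),
        label: "and".to_string(),
        children: vec![filter.clone(), filter],
    };

    let mut cache = HashMap::new();
    let decoded = decode(&root, &mut cache);

    // Both children point at one allocation after decoding.
    assert!(Arc::ptr_eq(&decoded.children[0], &decoded.children[1]));
    println!("{} has {} children", decoded.label, decoded.children.len());
}
```

Nodes without an id are always rebuilt, which matches the field being `optional` in the message definition.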

datafusion/proto/src/generated/pbjson.rs

Lines changed: 22 additions & 0 deletions

datafusion/proto/src/generated/prost.rs

Lines changed: 8 additions & 0 deletions
