
feat: Add auto_explain mode #19316

Open
nuno-faria wants to merge 21 commits into apache:main from nuno-faria:auto_explain

Conversation

@nuno-faria
Contributor

Which issue does this PR close?

Rationale for this change

Allows users to check execution plans without needing to change the existing application.

The auto_explain mode can be enabled with the datafusion.explain.auto_explain config. In addition, there are two other configs:

  • datafusion.explain.auto_explain_output: sets the output location of the plans. Supports stdout, stderr, and a file path.
  • datafusion.explain.auto_explain_min_duration: only outputs plans whose duration is greater than this value (similar to Postgres' auto_explain.log_min_duration).

Example in datafusion-cli:

-- regular mode
> select 1;
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+
1 row(s) fetched.

-- with auto_explain enabled (the plan is not actually part of the result, it is sent to stdout)
> select 1;
+-------------------+------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                         |
+-------------------+------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[1 as Int64(1)], metrics=[output_rows=1, elapsed_compute=21.50µs, output_bytes=8.0 B, output_batches=1] |
|                   |   PlaceholderRowExec, metrics=[]                                                                                             |
|                   |                                                                                                                              |
+-------------------+------------------------------------------------------------------------------------------------------------------------------+
+----------+
| Int64(1) |
+----------+
| 1        |
+----------+
1 row(s) fetched.

What changes are included in this PR?

  • Extended the existing AnalyzeExec operator to support the auto_explain mode.
  • Added new explain configs.
  • Wrapped plans in an AnalyzeExec operator when auto_explain is enabled.
  • Added tests.

Are these changes tested?

Yes.

Are there any user-facing changes?

New feature, but it's completely optional.

@github-actions github-actions Bot added documentation Improvements or additions to documentation core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) common Related to common crate physical-plan Changes to the physical-plan crate labels Dec 14, 2025
Comment thread datafusion/physical-plan/src/analyze.rs Outdated
Comment on lines +127 to +128
self.cache =
Self::compute_properties(&self.input, Arc::clone(&self.input.schema()));
Contributor Author

Needs to be recomputed since the output changes.

Comment thread datafusion/physical-plan/src/analyze.rs Outdated
Comment on lines +256 to +263
if auto_explain {
    if duration.as_millis() >= auto_explain_min_duration as u128 {
        export_auto_explain(out, &auto_explain_output)?;
    }
    concat_batches(&inner_schema, &batches).map_err(DataFusionError::from)
} else {
    Ok(out)
}
Contributor Author

The auto_explain mode will return the input's batches instead of the analyze.

@nuno-faria
Contributor Author

let fd: &mut dyn Write = match output {
    "stdout" => &mut io::stdout(),
    "stderr" => &mut io::stderr(),
    _ => &mut OpenOptions::new().create(true).append(true).open(output)?,
Member

Does this need any kind of validation of the file location ?
Or it is left to the developer/admin to make sure it is a safe place ?

Member

Does this need some kind of synchronisation when a file path is used for the output ? Two or more DF sessions using the same config may try to write to the same file simultaneously.

Contributor Author

> Does this need any kind of validation of the file location ?
> Or it is left to the developer/admin to make sure it is a safe place ?

I think it's better to leave this to the user (either way, an error is returned).

Contributor Author

> Does this need some kind of synchronisation when a file path is used for the output ? Two or more DF sessions using the same config may try to write to the same file simultaneously.

I think the responsibility for this again falls on the user. Is it common to use multiple sessions over the same config?

Comment thread datafusion/physical-plan/src/analyze.rs
# test auto_explain

statement ok
set datafusion.explain.auto_explain_output = 'test_files/scratch/auto_explain.txt';
Member

Does something assert the contents of this output file ?
Does something remove this file at the end ?

Contributor Author

I originally tried to load the file into a table as CSV, since I think that is the only feasible way to check the contents, but because the file cannot be removed the result would always change. I mainly added these sqllogictests to check the "set ..." commands.

As for removing the file, I'm not sure it is possible. With that said, I don't think it is necessary since it's written to the sqllogictest temporary dir.

@github-actions github-actions Bot removed the documentation Improvements or additions to documentation label Feb 1, 2026
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Feb 1, 2026
@alamb alamb mentioned this pull request Mar 2, 2026
Contributor

@alamb alamb left a comment

Thanks @nuno-faria and @martin-g -- sorry for the delay in reviewing this PR

From my perspective, this is a valuable feature, but having the output go to stdout/stderr in the core library is not ideal.

Instead, it seems to me that auto explain belongs in the client applications themselves (e.g. datafusion-cli)

I tried to explain my thinking more here

@nuno-faria
Contributor Author

@alamb thanks for the review. I also share the concern about polluting stdout with plans; in my case I would point datafusion.explain.auto_explain_output at a file to avoid that when auto_explain is enabled. It was a way to simulate Postgres' auto_explain, but since DataFusion does not have a default log, the output had to be stdout/stderr (or fail if a file is not provided). As for exposing it only in datafusion-cli, that would be less useful in my case, as I'm using the library directly.

I will have to look at the Observer suggestion once I get the time.

@alamb
Contributor

alamb commented Mar 4, 2026

> pointed to some file to avoid this with the auto_explain enabled. It was a way to simulate the auto_explain of Postgres,

What about using log::info! or something (aka the existing logging hooks 🤔 )

@nuno-faria
Contributor Author

> What about using log::info! or something (aka the existing logging hooks 🤔 )

I added log::info (as well as log::error, log::warn, ...) instead of stdout/stderr.

I was also thinking of adding the query to the output, similar to Postgres' auto_explain. It would have to come from the unparsed logical plan since the original query is not available. However, some users might not need the query, while others might need info that is not output. So it might actually be better to leave this auto_explain feature for users to implement downstream as they wish.

@alamb
Contributor

alamb commented Mar 10, 2026

> What about using log::info! or something (aka the existing logging hooks 🤔 )

> I added log::info (as well as log::error, log::warn, ...) instead of stdout/stderr.

> I was also thinking of adding the query to the output, similar to Postgres' auto_explain. It would have to come from the unparsed logical plan since the original query is not available. However, maybe some users would not need the query, or maybe some would need other info that is not outputted. So maybe it might actually be better to leave this auto_explain feature for users to implement downstream as they wish.

I think if you wanted to make it easier to implement downstream maybe we would add some sort of API / callback thing (trait object?) that could get the info for auto_explains.

Then the default implementation could log with info and downstream users could override it if they wanted

@nuno-faria
Contributor Author

> I think if you wanted to make it easier to implement downstream maybe we would add some sort of API / callback thing (trait object?) that could get the info for auto_explains.

> Then the default implementation could log with info and downstream users could override it if they wanted

Sounds good. I'll try to tackle this soon.

@github-actions github-actions Bot added the execution Related to the execution crate label Mar 15, 2026
@github-actions github-actions Bot removed the execution Related to the execution crate label Mar 21, 2026
@nuno-faria
Contributor Author

@alamb I refactored the previous auto_explain mode to now use a new PlanObserver trait. It has a method to be called when the physical plan is built and another to be called when the plan completes, using the result of the analyze operator.

pub trait PlanObserver: Send + Sync + 'static + Debug {
    fn plan_created(
        &self,
        id: &str,
        logical_plan: &LogicalPlan,
        physical_plan: &Arc<dyn ExecutionPlan>,
    ) -> Result<()>;
    
    fn plan_executed(
        &self,
        id: &str,
        explain_result: RecordBatch,
        duration: Duration,
    ) -> Result<()>;
}

The AnalyzeExec operator can now also receive a callback to pass the result. I opted for this approach to avoid having the physical operators depend on the PlanObserver. This way we can reuse the code in the analyze operator to provide the already formatted output.

It can be used like this:

let plan_observer = DefaultPlanObserver::new("auto_explain.txt".to_owned(), 0);
let ctx = SessionContext::new().with_plan_observer(Arc::new(plan_observer));
ctx.sql("create table t (k int, v int)").await?.collect().await?;

// auto explain needs to be enabled
ctx.sql("set datafusion.explain.auto_explain = true").await?.collect().await?;

ctx.sql("select * from t where k = 1 or k = 2 order by v desc limit 5").await?.collect().await?;

The DefaultPlanObserver writes using the log crate or a file, and it looks like this:

QUERY: SELECT t.k, t.v FROM t WHERE ((t.k = 1) OR (t.k = 2)) ORDER BY t.v DESC NULLS FIRST LIMIT 5
DURATION: 0.689ms
EXPLAIN:
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | SortExec: TopK(fetch=5), expr=[v@1 DESC], preserve_partitioning=[false], metrics=[output_rows=0, elapsed_compute=13.40µs, output_bytes=0.0 B, output_batches=0, row_replacements=0] |
|                   |   FilterExec: k@0 = 1 OR k@0 = 2, metrics=[output_rows=0, elapsed_compute=1ns, output_bytes=0.0 B, output_batches=0, selectivity=N/A (0/0)]                                         |
|                   |     DataSourceExec: partitions=1, partition_sizes=[0], metrics=[]                                                                                                                   |
|                   |                                                                                                                                                                                     |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

(If the sql feature is not enabled, the SQL query is not written.)

Let me know what you think about the API.
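To show how a downstream user might plug in their own observer, here is a self-contained sketch against a simplified stand-in for the trait: string arguments replace the real LogicalPlan, ExecutionPlan, and RecordBatch parameters so it compiles on its own, and CollectingObserver is an illustrative name, not part of the PR's API:

```rust
use std::fmt::Debug;
use std::sync::Mutex;
use std::time::Duration;

/// Simplified stand-in for the PR's PlanObserver trait; the real methods
/// take &LogicalPlan, &Arc<dyn ExecutionPlan>, and a RecordBatch.
trait PlanObserver: Send + Sync + Debug {
    fn plan_created(&self, id: &str, logical_plan: &str) -> Result<(), String>;
    fn plan_executed(
        &self,
        id: &str,
        annotated_plan: &str,
        duration: Duration,
    ) -> Result<(), String>;
}

/// Collects observations in memory instead of logging, e.g. for tests
/// or for shipping plans to a metrics backend.
#[derive(Debug, Default)]
struct CollectingObserver {
    events: Mutex<Vec<String>>,
}

impl PlanObserver for CollectingObserver {
    fn plan_created(&self, id: &str, logical_plan: &str) -> Result<(), String> {
        self.events
            .lock()
            .unwrap()
            .push(format!("created {id}: {logical_plan}"));
        Ok(())
    }

    fn plan_executed(
        &self,
        id: &str,
        annotated_plan: &str,
        duration: Duration,
    ) -> Result<(), String> {
        self.events
            .lock()
            .unwrap()
            .push(format!("executed {id} in {duration:?}: {annotated_plan}"));
        Ok(())
    }
}

fn main() {
    let obs = CollectingObserver::default();
    obs.plan_created("q1", "Projection: Int64(1)").unwrap();
    obs.plan_executed("q1", "ProjectionExec ...", Duration::from_millis(1))
        .unwrap();
    assert_eq!(obs.events.lock().unwrap().len(), 2);
    println!("observer recorded {} events", obs.events.lock().unwrap().len());
}
```

With the real trait, such an observer would be registered via SessionContext::with_plan_observer as shown in the usage snippet above.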

Comment thread datafusion/core/src/execution/plan_observer.rs
Contributor

@alamb alamb left a comment

Thanks @nuno-faria -- I went over this one again. I think it is getting close

Comment thread datafusion/core/src/execution/plan_observer.rs
pub struct DefaultPlanObserver {
    output: String,
    min_duration_ms: usize,
    /// stores a SQL representation of the logical plan, if the `sql` feature is enabled.
Contributor

Codex points out that on the error path this map never gets cleaned up -- so a bunch of errors will potentially cause this map to grow without bound. Maybe something we could clean up as a follow on PR (file a ticket to fix, etc)

Contributor Author

Good call, I completely missed that. I think we could use a queue that removes queries from that map once it reaches some limit.
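One way to cap that map along the lines of the queue idea is to pair it with a FIFO of ids and evict the oldest entry once a limit is reached. A minimal standalone sketch (BoundedPlanMap and its methods are hypothetical names, not the PR's code):

```rust
use std::collections::{HashMap, VecDeque};

/// Stores per-query SQL text keyed by plan id, capped at `limit` entries.
/// When full, the oldest id is evicted, so error paths that never call
/// `take` cannot grow the map without bound.
struct BoundedPlanMap {
    limit: usize,
    order: VecDeque<String>,
    entries: HashMap<String, String>,
}

impl BoundedPlanMap {
    fn new(limit: usize) -> Self {
        Self {
            limit,
            order: VecDeque::new(),
            entries: HashMap::new(),
        }
    }

    fn insert(&mut self, id: String, sql: String) {
        if self.entries.len() >= self.limit {
            // Evict the oldest entry to stay within the limit.
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.order.push_back(id.clone());
        self.entries.insert(id, sql);
    }

    /// Removing on the success path keeps the map small in the common case.
    fn take(&mut self, id: &str) -> Option<String> {
        self.order.retain(|x| x != id);
        self.entries.remove(id)
    }
}

fn main() {
    let mut map = BoundedPlanMap::new(2);
    map.insert("q1".into(), "select 1".into());
    map.insert("q2".into(), "select 2".into());
    map.insert("q3".into(), "select 3".into()); // evicts q1
    assert!(map.take("q1").is_none());
    assert_eq!(map.take("q3").as_deref(), Some("select 3"));
    println!("eviction ok");
}
```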

/// - `log::info`
/// - `log::debug`
/// - `log::trace`
/// - a file path: creates the file if it does not exist, or appends to it if it does.
Contributor

I think it would make more sense to have an explicit enum or something here especially as this can do file I/O

I worry that if someone accidentally passes in log:error (one :) that will write to a file named log:error

What do you think about using something like

enum Output {
  LogError, 
  LogWarn, 
...
  LogToFile(String),
}

?
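Sketching that suggestion as compilable Rust (the exact variant set is an assumption, and `emit` is a hypothetical dispatch helper, not the PR's code), the point is that a file destination must be constructed explicitly, so a typo like "log:error" can no longer silently become a file name:

```rust
/// Destinations for auto_explain output. The file case is an explicit
/// variant, unlike string parsing where a typo falls through to a path.
#[derive(Debug)]
enum Output {
    LogError,
    LogWarn,
    LogInfo,
    LogToFile(String),
}

impl Output {
    /// Hypothetical dispatch: route the text to the chosen destination.
    fn emit(&self, text: &str) -> std::io::Result<()> {
        use std::io::Write;
        match self {
            Output::LogError => eprintln!("ERROR: {text}"),
            Output::LogWarn => eprintln!("WARN: {text}"),
            Output::LogInfo => println!("INFO: {text}"),
            Output::LogToFile(path) => {
                let mut f = std::fs::OpenOptions::new()
                    .create(true)
                    .append(true)
                    .open(path)?;
                writeln!(f, "{text}")?;
            }
        }
        Ok(())
    }
}

fn main() -> std::io::Result<()> {
    // File output must be requested explicitly by the caller:
    let out = Output::LogToFile("auto_explain.txt".to_owned());
    assert!(matches!(out, Output::LogToFile(_)));
    Output::LogInfo.emit("plan ready")
}
```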

Contributor Author

Added.

function_factory,
cache_factory,
prepared_plans: HashMap::new(),
plan_observer: Some(Arc::new(DefaultPlanObserver::default())),
Contributor

This should only be set when auto_explain is enabled, right?

Contributor Author

Ideally yes, but then as far as I'm aware the user wouldn't be able to turn on the default auto_explain without also having to first set a plan_observer. But if it is better that way I can set it to None, let me know.

self.optimize_physical_plan(plan, session_state, |_, _| {})
let mut plan = self.optimize_physical_plan(plan, session_state, |_, _| {})?;

// setup the auto explain mode if necessary
Contributor

I wonder if this would be cleaner to add to handle_explain_or_analyze (or put it in its own method)?

Contributor Author

I moved this to the existing setup_auto_explain method so that everything is there. I'm not sure if it would fit in the handle_explain_or_analyze method since at that point the Analyze is still not in the plan.

Comment thread datafusion/physical-plan/src/analyze.rs Outdated
return_inner: bool,
}

/// Optionally used by the `AnalyzeExec` operator to callback it with the result.
Contributor

We won't be able to easily add new things to this callback

It might also be useful to point out somewhere that setting return_inner is effectively going to buffer the entire query output into RAM (which could be quite large)

I wonder if it would make sense to define a trait

trait AnalyzeObserver {
 ...
}

And then instead of

    /// If Some, passes the output of the analyze once it completes, as well as the duration.
    callback: Option<AnalyzeCallback>,

Do something like

    /// If Some, passes the output of the analyze once it completes, as well as the duration.
    callback: Option<Arc<dyn AnalyzeObserver>>,

That way if we want to add more methods (like adding something which sees the plan metrics) it would be easier to add without an API change.

Contributor Author

I replaced the callback with the following trait:

pub trait AnalyzeObserver: Debug + Send + Sync {
    /// Provides the EXPLAIN ANALYZE output (annotated plan) and the total duration.
    fn analyze_result_callback(
        &self,
        result: RecordBatch,
        duration: std::time::Duration,
    ) -> Result<()>;
}

fn plan_executed(
    &self,
    id: &str,
    explain_result: RecordBatch,
Contributor

is this result the actual output batches?

Or is it the annotated explain plan?

Contributor Author

The annotated plan. I changed the variable name to annotated_plan to make it clearer.

@alamb
Contributor

alamb commented Apr 8, 2026

Shoot -- I lost track of this one. I put it on my short list for review


Development

Successfully merging this pull request may close these issues.

Add auto_explain mode

3 participants