Commit 2bc0c11

🔀 MERGE: Improve Notebook Execution (#236)
1. Standardise auto/cache execution

   Both now call the same underlying function (from jupyter-cache) and act the same. This improves auto, by making it output error reports and not raising an exception on an error.

   Additional config has also been added: `execution_allow_errors` and `execution_in_temp`. Like for timeout, `allow_errors` can also be set in the notebook `metadata.execution.allow_errors`

   This presents one breaking change, in that `auto` will now by default execute in a temporary folder as the cwd. (we could set temp to False by default, but I think this is safer?)

2. For both methods, executions data is captured into:

   ```python
   env.nb_execution_data[env.docname] = {
       "mtime": datetime.datetime.utcnow().isoformat(),
       "runtime": runtime,
       "method": execution_method,
       "succeeded": succeeded,
   }
   ```

   and a directive `nb-exec-table` has been added, to create a table of these results.
2 parents f98fa54 + d186389 commit 2bc0c11

27 files changed

Lines changed: 634 additions & 95 deletions

docs/use/execute.md

Lines changed: 78 additions & 52 deletions
Original file line number | Diff line number | Diff line change
@@ -16,104 +16,100 @@ kernelspec:
1616
# Executing and caching your content
1717

1818
MyST-NB can automatically run and cache notebooks contained in your project using [jupyter-cache].
19-
Notebooks can either be run each time the documentation is built, or cached
20-
locally so that re-runs occur only when code cells have changed.
19+
Notebooks can either be run each time the documentation is built, or cached locally so that re-runs occur only when code cells have changed.
2120

22-
Cacheing behavior is controlled with configuration in your `conf.py` file. See
23-
the sections below for each configuration option and its effect.
21+
Caching behaviour is controlled with configuration in your `conf.py` file.
22+
See the sections below for each configuration option and its effect.
2423

2524
(execute/config)=
2625

2726
## Triggering notebook execution
2827

29-
To trigger the execution of notebook pages, use the following configuration in `conf.py`
28+
To trigger the execution of notebook pages, use the following configuration in `conf.py`:
3029

31-
```
30+
```python
3231
jupyter_execute_notebooks = "auto"
3332
```
3433

35-
By default, this will only execute notebooks that are missing at least one output. If
36-
a notebook has *all* of its outputs populated, then it will not be executed.
34+
By default, this will only execute notebooks that are missing at least one output.
35+
If a notebook has *all* of its outputs populated, then it will not be executed.
3736

38-
**To force the execution of all notebooks, regardless of their outputs**, change the
39-
above configuration value to:
37+
**To force the execution of all notebooks, regardless of their outputs**, change the above configuration value to:
4038

41-
```
39+
```python
4240
jupyter_execute_notebooks = "force"
4341
```
4442

45-
**To cache execution outputs with [jupyter-cache]**, change the above configuration
46-
value to:
43+
**To cache execution outputs with [jupyter-cache]**, change the above configuration value to:
4744

48-
```
45+
```python
4946
jupyter_execute_notebooks = "cache"
5047
```
5148

5249
See {ref}`execute/cache` for more information.
5350

54-
**To turn off notebook execution**, change the
55-
above configuration value to:
51+
**To turn off notebook execution**, change the above configuration value to:
5652

57-
```
53+
```python
5854
jupyter_execute_notebooks = "off"
5955
```
6056

61-
**To exclude certain file patterns from execution**, use the following
62-
configuration:
57+
**To exclude certain file patterns from execution**, use the following configuration:
6358

64-
```
59+
```python
6560
execution_excludepatterns = ['list', 'of', '*patterns']
6661
```
6762

68-
Any file that matches one of the items in `execution_excludepatterns` will not be
69-
executed.
63+
Any file that matches one of the items in `execution_excludepatterns` will not be executed.
7064

7165
(execute/cache)=
7266
## Caching the notebook execution
7367

74-
As mentioned above, you can **cache the results of executing a notebook page** by setting
68+
As mentioned above, you can **cache the results of executing a notebook page** by setting:
7569

76-
```
70+
```python
7771
jupyter_execute_notebooks = "cache"
7872
```
7973

80-
in your conf.py file. In this case, when a page is executed, its outputs
81-
will be stored in a local database. This allows you to be sure that the
82-
outputs in your documentation are up-to-date, while saving time avoiding
83-
unnecessary re-execution. It also allows you to store your `.ipynb` files in
84-
your `git` repository *without their outputs*, but still leverage a cache to
85-
save time when building your site.
74+
in your `conf.py` file.
75+
76+
In this case, when a page is executed, its outputs will be stored in a local database.
77+
This allows you to be sure that the outputs in your documentation are up-to-date, while saving time by avoiding unnecessary re-execution.
78+
It also allows you to store your `.ipynb` files (or their `.md` equivalent) in your `git` repository *without their outputs*, but still leverage a cache to save time when building your site.
8679

8780
When you re-build your site, the following will happen:
8881

89-
* Notebooks that have not seen changes to their **code cells** since the last build
90-
will not be re-executed. Instead, their outputs will be pulled from the cache
91-
and inserted into your site.
92-
* Notebooks that **have any change to their code cells** will be re-executed, and the
93-
cache will be updated with the new outputs.
82+
* Notebooks that have not seen changes to their **code cells** or **metadata** since the last build will not be re-executed.
83+
Instead, their outputs will be pulled from the cache and inserted into your site.
84+
* Notebooks that **have any change to their code cells** will be re-executed, and the cache will be updated with the new outputs.
9485

95-
By default, the cache will be placed in the parent of your build folder. Generally,
96-
this is in `_build/.jupyter_cache`.
86+
By default, the cache will be placed in the parent of your build folder.
87+
Generally, this is in `_build/.jupyter_cache`.
9788

9889
You may also specify a path to the location of a jupyter cache you'd like to use:
9990

100-
```
101-
jupyter_cache = path/to/mycache
91+
```python
92+
jupyter_cache = "path/to/mycache"
10293
```
10394

104-
The path should point to an **empty folder**, or a folder where a
105-
**jupyter cache already exists**.
95+
The path should point to an **empty folder**, or a folder where a **jupyter cache already exists**.
10696

10797
[jupyter-cache]: https://github.com/executablebooks/jupyter-cache "the Jupyter Cache Project"
10898

99+
## Executing in temporary folders
100+
101+
By default, the current working directory (cwd) that a notebook runs in will be its parent directory.
102+
However, you can set `execution_in_temp=True` in your `conf.py`, to change this behaviour such that, for each execution, a temporary directory will be created and used as the cwd.
103+
109104
(execute/timeout)=
110105
## Execution Timeout
111106

112107
The execution of notebooks is managed by {doc}`nbclient <nbclient:client>`.
113108

114-
The `execution_timeout` sphinx option defines the maximum time (in seconds) each notebook cell is allowed to run, if the execution takes longer an exception will be raised.
109+
The `execution_timeout` sphinx option defines the maximum time (in seconds) each notebook cell is allowed to run.
110+
If the execution takes longer, an exception will be raised.
115111
The default is 30 s, so in cases of long-running cells you may want to specify a higher value.
116-
The timeout option can also be set to None or -1 to remove any restriction on execution time.
112+
The timeout option can also be set to `None` or `-1` to remove any restriction on execution time.
117113

118114
This global value can also be overridden per notebook by adding this to your notebook's metadata:
119115

@@ -126,19 +122,32 @@ This global value can also be overridden per notebook by adding this to you note
126122
}
127123
```
128124

129-
## Execution FAQs
125+
(execute/allow_errors)=
126+
## Dealing with code that raises errors
130127

131-
### How can I include code that raises errors?
128+
In some cases, you may want to intentionally show code that doesn't work (e.g., to show the error message).
129+
You can achieve this at three levels:
132130

133-
In some cases, you may want to intentionally show code that doesn't work (e.g., to show
134-
the error message). To do this, add a `raises-exception` tag to your code cell. This
135-
can be done via a Jupyter interface, or via the `{code-cell}` directive like so:
131+
Globally, by setting `execution_allow_errors=True` in your `conf.py`.
136132

137-
````
133+
Per notebook (overrides global), by adding this to your notebook's metadata:
134+
135+
```json
136+
{
137+
"metadata": {
138+
"execution": {
139+
"allow_errors": true
140+
}
141+
}
}
142+
```
143+
144+
Per cell, by adding a `raises-exception` tag to your code cell.
145+
This can be done via a Jupyter interface, or via the `{code-cell}` directive like so:
146+
147+
````md
138148
```{code-cell}
139-
---
140-
tags: [raises-exception]
141-
---
149+
:tags: [raises-exception]
150+
142151
print(thisvariabledoesntexist)
143152
```
144153
````
@@ -151,3 +160,20 @@ tags: [raises-exception]
151160
---
152161
print(thisvariabledoesntexist)
153162
```
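
The precedence between the global setting and the notebook metadata can be pictured with a small lookup. This is an illustrative sketch only, not MyST-NB's implementation: `resolve_execution_option` is a hypothetical helper, and `nb_metadata` stands for the notebook's top-level `metadata` dictionary. The per-cell `raises-exception` tag is not modelled here.

```python
def resolve_execution_option(nb_metadata, key, global_value):
    """Return the notebook-level value for `key` if set, else the global one."""
    execution = nb_metadata.get("execution", {})
    return execution.get(key, global_value)


# Notebook metadata overriding a global `execution_allow_errors = False`.
nb_metadata = {"execution": {"allow_errors": True}}
print(resolve_execution_option(nb_metadata, "allow_errors", False))  # True

# No notebook-level value, so the global timeout is used.
print(resolve_execution_option({}, "timeout", 30))  # 30
```

The same lookup shape would apply to the `timeout` key, since both options are described as being overridable under `metadata.execution`.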
163+
164+
(execute/statistics)=
165+
## Execution Statistics
166+
167+
As notebooks are executed, certain statistics are stored in a dictionary (`{docname:data}`), and saved on the [sphinx environment object](https://www.sphinx-doc.org/en/master/extdev/envapi.html#sphinx.environment.BuildEnvironment) as `env.nb_execution_data`.
168+
169+
You can access this in a post-transform in your own sphinx extensions, or use the built-in `nb-exec-table` directive:
170+
171+
````md
172+
```{nb-exec-table}
173+
```
174+
````
175+
176+
which produces:
177+
178+
```{nb-exec-table}
179+
```

docs/use/start.md

Lines changed: 8 additions & 1 deletion
@@ -52,10 +52,17 @@ MyST-NB then adds some additional configuration, specific to notebooks:
5252
* - `jupyter_execute_notebooks`
5353
- "auto"
5454
- The logic for executing notebooks, [see here](execute/config) for details.
55+
* - `execution_in_temp`
56+
- `False`
57+
- If `True`, a temporary directory is created and used as the current working directory (cwd); if `False`, the notebook's parent directory is the cwd.
58+
* - `execution_allow_errors`
59+
- `False`
60+
- If `False`, execution stops when a code cell raises an error; if `True`, all cells are always run.
61+
This can also be overridden by metadata in a notebook, [see here](execute/allow_errors) for details.
5562
* - `execution_timeout`
5663
- 30
5764
- The maximum time (in seconds) each notebook cell is allowed to run.
58-
This can be overridden by metadata in a notebook, [see here](execute/timeout) for detail.
65+
This can also be overridden by metadata in a notebook, [see here](execute/timeout) for details.
5966
* - `execution_show_tb`
6067
- `False`
6168
- Show failed notebook tracebacks in stdout (in addition to writing to file).
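
Taken together, the execution options in this table could appear in a `conf.py` as follows. The values are illustrative choices, not recommendations, and the exclude pattern is a made-up example.

```python
# conf.py (fragment): notebook execution settings for MyST-NB
jupyter_execute_notebooks = "cache"  # one of "auto", "force", "cache", "off"
execution_in_temp = True  # run each notebook with a temporary directory as cwd
execution_allow_errors = False  # stop at the first cell that raises an error
execution_timeout = 60  # seconds per cell; None or -1 removes the limit
execution_show_tb = False  # also print failed notebook tracebacks to stdout
execution_excludepatterns = ["drafts/*"]  # hypothetical pattern
```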

myst_nb/__init__.py

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@
1616
JupyterCell,
1717
)
1818

19-
from .cache import update_execution_cache
19+
from .execution import update_execution_cache
2020
from .parser import (
2121
NotebookParser,
2222
CellNode,
@@ -35,6 +35,7 @@
3535
PasteInlineNode,
3636
)
3737
from .nb_glue.transform import PasteNodesToDocutils
38+
from .exec_table import setup_exec_table
3839

3940
LOGGER = logging.getLogger(__name__)
4041

@@ -104,6 +105,8 @@ def visit_element_html(self, node):
104105
app.add_config_value("execution_excludepatterns", [], "env")
105106
app.add_config_value("jupyter_execute_notebooks", "auto", "env")
106107
app.add_config_value("execution_timeout", 30, "env")
108+
app.add_config_value("execution_allow_errors", False, "env")
109+
app.add_config_value("execution_in_temp", False, "env")
107110
# show traceback in stdout (in addition to writing to file)
108111
# this is useful in e.g. RTD where one cannot inspect a file
109112
app.add_config_value("execution_show_tb", False, "")
@@ -130,6 +133,9 @@ def visit_element_html(self, node):
130133
app.add_domain(NbGlueDomain)
131134
app.add_directive("code-cell", CodeCell)
132135

136+
# execution statistics table
137+
setup_exec_table(app)
138+
133139
# TODO need to deal with key clashes in NbGlueDomain.merge_domaindata
134140
# before this is parallel_read_safe
135141
return {"version": __version__, "parallel_read_safe": False}
@@ -178,6 +184,8 @@ def set_valid_execution_paths(app):
178184
for suffix, parser_type in app.config["source_suffix"].items()
179185
if parser_type in ("myst-nb",)
180186
}
187+
if not hasattr(app.env, "nb_execution_data"):
188+
app.env.nb_execution_data = {}
181189

182190

183191
def add_exclude_patterns(app, config):

myst_nb/exec_table.py

Lines changed: 94 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,94 @@
1+
"""A directive to create a table of executed notebooks, and related statistics."""
2+
from datetime import datetime
3+
4+
from docutils import nodes
5+
from sphinx.transforms.post_transforms import SphinxPostTransform
6+
from sphinx.util.docutils import SphinxDirective
7+
8+
9+
def setup_exec_table(app):
10+
"""execution statistics table."""
11+
app.add_node(ExecutionStatsNode)
12+
app.add_directive("nb-exec-table", ExecutionStatsTable)
13+
app.add_post_transform(ExecutionStatsPostTransform)
14+
15+
16+
class ExecutionStatsNode(nodes.General, nodes.Element):
17+
"""A placeholder node, for adding a notebook execution statistics table."""
18+
19+
20+
class ExecutionStatsTable(SphinxDirective):
21+
"""Add a notebook execution statistics table."""
22+
23+
has_content = True
24+
final_argument_whitespace = True
25+
26+
def run(self):
27+
28+
return [ExecutionStatsNode()]
29+
30+
31+
class ExecutionStatsPostTransform(SphinxPostTransform):
32+
"""Replace the placeholder node with the final table."""
33+
34+
default_priority = 400
35+
36+
def run(self, **kwargs) -> None:
37+
for node in self.document.traverse(ExecutionStatsNode):
38+
node.replace_self(make_stat_table(self.env.nb_execution_data))
39+
40+
41+
def make_stat_table(nb_execution_data):
42+
43+
key2header = {
44+
"mtime": "Modified",
45+
"method": "Method",
46+
"runtime": "Run Time (s)",
47+
"succeeded": "Status",
48+
}
49+
50+
key2transform = {
51+
"mtime": lambda x: datetime.fromtimestamp(x).strftime("%Y-%m-%d %H:%M")
52+
if x
53+
else "",
54+
"method": str,
55+
"runtime": lambda x: "-" if x is None else str(round(x, 2)),
56+
"succeeded": lambda x: "✅" if x is True else "❌",
57+
}
58+
59+
# top-level element
60+
table = nodes.table()
61+
table["classes"] += ["colwidths-auto"]
62+
# self.set_source_info(table)
63+
64+
# column settings element
65+
ncols = len(key2header) + 1
66+
tgroup = nodes.tgroup(cols=ncols)
67+
table += tgroup
68+
colwidths = [round(100 / ncols, 2)] * ncols
69+
for colwidth in colwidths:
70+
colspec = nodes.colspec(colwidth=colwidth)
71+
tgroup += colspec
72+
73+
# header
74+
thead = nodes.thead()
75+
tgroup += thead
76+
row = nodes.row()
77+
thead += row
78+
79+
for name in ["Document"] + list(key2header.values()):
80+
row.append(nodes.entry("", nodes.paragraph(text=name)))
81+
82+
# body
83+
tbody = nodes.tbody()
84+
tgroup += tbody
85+
86+
for doc, data in nb_execution_data.items():
87+
row = nodes.row()
88+
tbody += row
89+
row.append(nodes.entry("", nodes.paragraph(text=doc)))
90+
for name in key2header.keys():
91+
text = key2transform[name](data[name])
92+
row.append(nodes.entry("", nodes.paragraph(text=text)))
93+
94+
return table
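
To see what each cell of the table will contain without building the docutils nodes, the per-column formatting can be sketched in isolation. This is a minimal illustration and not part of the commit: `format_row` is a hypothetical helper, and the sample record assumes `mtime` is a POSIX timestamp, which is what `datetime.fromtimestamp` expects.

```python
from datetime import datetime

# Same per-column formatting rules as `key2transform` in make_stat_table.
KEY2TRANSFORM = {
    "mtime": lambda x: datetime.fromtimestamp(x).strftime("%Y-%m-%d %H:%M") if x else "",
    "method": str,
    "runtime": lambda x: "-" if x is None else str(round(x, 2)),
    "succeeded": lambda x: "✅" if x is True else "❌",
}


def format_row(docname, data):
    """Return the text of each table cell for one document (hypothetical helper)."""
    return [docname] + [KEY2TRANSFORM[key](data[key]) for key in KEY2TRANSFORM]


# Sample record; the values are made up for illustration.
record = {"mtime": 1_600_000_000.0, "method": "cache", "runtime": 1.23456, "succeeded": True}
print(format_row("use/execute", record))
```

The formatted `mtime` depends on the local timezone, so only the other columns are stable across machines.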
