Native JupyterLite Support #20882

@jmchilton

Visualization Limitations

The visualization is amazing but it assumes you’re starting from a single dataset - the notebook. It would be nice to launch something like a notebook from any dataset or from scratch. It would also be nice to host the notebooks directly, outside the concept of datasets, so we can provide revisions. A working save button and launching from more flexible contexts would provide a richer experience.

There is also a reproducibility argument here - I think we should want the base environment of the notebooks to evolve over time as more Python packages become Pyodide compatible. We should be able to save the static runtime environments for older notebook executions and allow them to be served going forward. I think we need to support multiple runtimes with different capabilities.
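As a rough illustration of what "multiple runtimes" could mean in practice, here is a minimal sketch of a registry that resolves a notebook revision to the frozen runtime it was saved against and falls back to the current default for new notebooks. Every name here (`Runtime`, `RuntimeRegistry`, the IDs, versions, and URLs) is hypothetical and not an existing Galaxy or JupyterLite API.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Runtime:
    """One frozen, statically served JupyterLite/Pyodide build (hypothetical model)."""
    runtime_id: str            # e.g. "2025.1" - an assumed naming scheme
    pyodide_version: str       # Pyodide version baked into this build
    base_url: str              # where the static build would be served from
    packages: FrozenSet[str]   # packages bundled by default in this build


class RuntimeRegistry:
    """Keeps old runtimes available while new notebooks get the default one."""

    def __init__(self, default_id: str) -> None:
        self._runtimes: Dict[str, Runtime] = {}
        self._default_id = default_id

    def register(self, runtime: Runtime) -> None:
        self._runtimes[runtime.runtime_id] = runtime

    def resolve(self, pinned_id: Optional[str] = None) -> Runtime:
        # A saved revision carries the runtime it ran with; a new notebook
        # (pinned_id is None) gets whatever the current default is.
        return self._runtimes[pinned_id or self._default_id]


registry = RuntimeRegistry(default_id="2025.1")
registry.register(Runtime("2024.2", "0.0.1", "https://example.org/lite/2024.2/",
                          frozenset({"numpy"})))
registry.register(Runtime("2025.1", "0.0.2", "https://example.org/lite/2025.1/",
                          frozenset({"numpy", "pandas"})))

old = registry.resolve("2024.2")   # re-opening an old revision
new = registry.resolve()           # starting a fresh notebook
```

The point of the sketch is only that a notebook revision stores a runtime identifier, so an old execution can still be served from its original static build after the default moves on.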

The reproducibility argument has a traceability component also. gxy.put just causes a dataset to appear in the history without context about how it got there. There is no artifact to trace it back to. If we could at least have the notebook revision that created it - that would be something. The way Jupyter works doesn’t really guarantee reproducibility here but it is more traceable and is a good step in the right direction I think. If instead of using API calls directly - we used annotations about inputs and outputs - we could get that notebook experience with a much richer concept of reproducibility and true traceability.
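To make the idea of annotated inputs and outputs a little more concrete, here is a minimal sketch, assuming a notebook revision declares the datasets it reads and writes up front. None of these classes or fields exist in Galaxy today; they are placeholders for whatever schema is eventually chosen.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NotebookRevision:
    """A saved notebook revision with declared inputs and outputs (hypothetical)."""
    notebook_id: str
    revision: int
    runtime_id: str              # which frozen runtime the revision ran against
    inputs: Tuple[str, ...]      # names of history datasets the notebook reads
    outputs: Tuple[str, ...]     # names of datasets the notebook writes


@dataclass(frozen=True)
class Provenance:
    """What a history dataset would need to record to be traceable."""
    dataset: str
    notebook_id: str
    revision: int
    runtime_id: str
    derived_from: Tuple[str, ...]


def provenance_for(rev: NotebookRevision) -> List[Provenance]:
    # Every declared output points back at the exact revision and runtime
    # that produced it, plus the declared inputs it was derived from.
    return [
        Provenance(out, rev.notebook_id, rev.revision, rev.runtime_id, rev.inputs)
        for out in rev.outputs
    ]


rev = NotebookRevision("nb-1", 3, "2025.1",
                       inputs=("counts.tsv",), outputs=("summary.csv", "plot.png"))
records = provenance_for(rev)
```

With declarations like these, a dataset written from a notebook is no longer an orphan in the history: it carries enough context for workflow extraction to treat the notebook as a step with known inputs and outputs.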

To me the most exciting level of reproducibility Galaxy can provide is to make things “automatically reusable” and that typically means “workflow extraction”. Extracting a workflow from a history is a challenging problem and there are a lot of existing headaches in the process - but it should be what drives us in terms of design. If we can get that right - we have reached real reusability. Achieving annotated inputs and outputs with JupyterLite to tie them to histories would be huge in this regard. We could build workflow nodes around this that allow notebooks in the middle of Galaxy workflows to mirror the ad hoc computation that they allow.

Past reproducibility and back to usability - I think if we had these as annotated entities and first class workflow nodes - we could allow Notebooks to generate portions of workflow reports or to serve as an alternative to the workflow report.

Managing Multiple Runtimes (Why)

Claude claims there are no existing solutions to manage multiple JupyterLite runtimes.

Based on my search and knowledge of the JupyterLite ecosystem, there are currently no existing tools that manage multiple JupyterLite runtimes in the sophisticated way Galaxy would need. Here's what exists vs. what's missing:

Closest Existing Solutions

JupyterHub + Multiple Kernels:

  • Can provide different Python environments, but not browser-based
  • Requires server infrastructure
  • Doesn't solve the Pyodide versioning challenge

Binder with repo2docker:

  • Can build different environments from repository configurations
  • Each environment is a separate deployment
  • Not designed for runtime switching within a single interface

Custom JupyterLite Builds:

  • Organizations create their own JupyterLite builds with custom packages
  • Still single-runtime per deployment
  • Manual process for updates

Managing Multiple Runtimes (How)

From: https://jupyterlite.readthedocs.io/en/stable/howto/pyodide/packages.html#bundling-additional-packages-by-default

At the moment the most reasonable way to make additional packages available by default when starting the Python kernel is to add new packages to the Pyodide distribution.

Once you have added the new package(s) and rebuilt Pyodide, you can [configure JupyterLite to use a custom Pyodide distribution](https://jupyterlite.readthedocs.io/en/stable/howto/pyodide/pyodide.html).

Loading custom Python code - Pyodide
Hosting Pyodide
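A minimal sketch of how per-runtime configuration could be generated once several Pyodide distributions are hosted. It assumes the runtime setting described in the linked JupyterLite how-to, i.e. a `pyodideUrl` entry under the `@jupyterlite/pyodide-kernel-extension:kernel` plugin in `jupyter-lite.json`; the key names are written from memory and should be checked against those docs. The runtime IDs and URLs are made up.

```python
import json
from typing import Dict

# Plugin ID as I understand it from the JupyterLite docs (an assumption to verify).
KERNEL_PLUGIN = "@jupyterlite/pyodide-kernel-extension:kernel"


def runtime_config(pyodide_js_url: str) -> dict:
    """Build the jupyter-lite.json content pointing a site at one Pyodide build."""
    return {
        "jupyter-lite-schema-version": 0,
        "jupyter-config-data": {
            "litePluginSettings": {
                KERNEL_PLUGIN: {"pyodideUrl": pyodide_js_url},
            },
        },
    }


# Hypothetical hosted distributions, one per frozen runtime.
HOSTED: Dict[str, str] = {
    "2024.2": "https://example.org/pyodide/2024.2/pyodide.js",
    "2025.1": "https://example.org/pyodide/2025.1/pyodide.js",
}

# One config document per runtime; each would sit next to its static site.
configs = {rid: json.dumps(runtime_config(url), indent=2)
           for rid, url in HOSTED.items()}
```

Each JupyterLite deployment is still single-runtime here; the "multiple runtimes" part comes from Galaxy choosing which static deployment to open for a given notebook revision.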

UI

TODO

Database Schema

TODO

Challenges

I asked an AI to outline potential challenges - here are the most serious and potentially blocking ones IMO.

User Experience

  • Choice overload: Users may be confused by having to select runtimes, especially if many are available.
  • Consistency: Different runtimes may behave slightly differently, leading to inconsistent results or support questions.

My (@jmchilton) response to this is we could dive straight into a default notebook and just keep the other environments around for reproducibility and as advanced options for specific applications.

Maintenance Burden

  • Pyodide rebuilds: Maintaining multiple custom Pyodide distributions requires regular updates as upstream evolves.
  • Version drift: Ensuring compatibility across notebook revisions, Galaxy, and runtime versions could be resource-intensive.

My (@jmchilton) response to this is we could serve all this static content on CVMFS and deal with this pressure as a community. My guess is we'd only add a version per release or so and we can manage that I think.

Security

  • Notebook trust model: Executing arbitrary user code in the browser is mostly sandboxed, but storing and re-executing notebooks in Galaxy may expose new attack vectors.
  • Runtime supply chain: Hosting and distributing custom Pyodide builds introduces risks if binaries are tampered with or misconfigured.
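One common mitigation for the supply chain concern is to pin a checksum for every published runtime artifact and verify it before serving or mirroring. This is a minimal sketch using only the Python standard library; the idea of a pinned-digest manifest is an assumption here, not something Galaxy or JupyterLite provides.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """True only if the artifact matches the digest pinned at publish time."""
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(sha256_hex(data), expected_sha256.lower())


# Stand-in bytes for a runtime tarball; a real check would read the file.
artifact = b"pretend this is a pyodide distribution tarball"
pinned = sha256_hex(artifact)            # recorded when the runtime is published

ok = verify_artifact(artifact, pinned)                   # untouched artifact
tampered = verify_artifact(artifact + b"!", pinned)      # modified artifact
```

A checksum only detects tampering after publication; it does not say anything about whether the original build was trustworthy, so it complements rather than replaces a controlled build process.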

Future-Proofing

  • Rapid Pyodide changes: Pyodide is under active development; APIs and build systems change frequently.
  • Beyond Pyodide: Locking the design too closely to Pyodide might make it harder to adopt future WebAssembly-based kernels or alternative runtimes.
