
docs: Insert relevant "Learn more" links in gallery examples#4002

Open
joelostblom wants to merge 7 commits into main from docs/gallery-links

@joelostblom
Contributor

Closes #4001; see that issue for more background and motivation.

This PR adds direct backlinks from each gallery example to the parts of the user docs that explicitly reference it. These backlinks will also work with the proposed file-reading capabilities of the .. altair-plot:: directive in vega/sphinxext-altair#14.

In addition to links from the pages that reference an example directly, I've also included some heuristics for potentially relevant pages, such as always linking to the docs of any mark or transform used in the example. There are surely more heuristics worth adding than the ones I thought of, but this could at least be a good starting point (the interactivity links in particular could become more fine-grained).

It will be difficult to evaluate how effective this is at shepherding readers to other relevant pages in the docs, but personally I would find it useful when I want to learn more about a specific example. It is also minimally intrusive, since it sits at the end of each page, and it makes it easier for readers to know where to find more information:

[two screenshots of the new links at the bottom of gallery example pages]

Changes in sphinxext/altairgallery.py
- Added backlink collection on doctree-read:
  - scans pending_xref targets for gallery_*
  - scans altair_plot nodes for code_source_file and maps the file stem to the gallery example name
  - captures the nearest section anchor and section title
- Added rendering on doctree-resolved for gallery example pages:
  - injects a "Referenced In" section
  - links to exact heading anchors when available
  - label format: Doc Title - Section Title (or just the doc title when the two are the same)
- Added a purge handler to keep the env state clean between rebuilds.
- Added heuristic link generation from the gallery example code (the code text from populate_examples()):
  - keeps explicit backlinks first, then appends an "Additional related documentation:" heading followed by the heuristic bullet list
  - deduplicates heuristic links that are already present among the explicit backlinks
- Added supporting helpers:
  - _example_code_map()
  - _doc_ref(...)
  - _heuristic_links_for_example(...)
- Also fixed a missing import re.
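To make the bookkeeping above concrete, here is a minimal plain-Python sketch of the state the extension has to keep, independent of Sphinx itself. It is not the code in this PR: Backlink, BacklinkStore, example_from_xref and example_from_source_file are hypothetical names. In the real extension this state would presumably live on the build environment, be filled from doctree-read, cleared from an env-purge-doc handler, and rendered on doctree-resolved.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePosixPath


def example_from_xref(target: str) -> str | None:
    # "gallery_bar_chart_sorted" -> "bar_chart_sorted"; other labels are ignored
    prefix = "gallery_"
    return target[len(prefix):] if target.startswith(prefix) else None


def example_from_source_file(path: str) -> str:
    # ".../selection_detail.py" -> "selection_detail" (the file stem)
    return PurePosixPath(path).stem


@dataclass(frozen=True)
class Backlink:
    """One place in the user docs that references a gallery example."""

    docname: str        # referencing page, e.g. "user_guide/encodings/index"
    anchor: str         # id of the nearest section, "" if none was found
    doc_title: str      # title of the referencing page
    section_title: str  # title of the nearest enclosing section

    def target(self) -> str:
        # Point at the exact heading anchor when one is available
        return f"{self.docname}#{self.anchor}" if self.anchor else self.docname

    def label(self) -> str:
        # "Doc Title - Section Title", or just the doc title when they match
        if not self.section_title or self.section_title == self.doc_title:
            return self.doc_title
        return f"{self.doc_title} - {self.section_title}"


@dataclass
class BacklinkStore:
    # gallery example name -> backlinks that point at it
    links: dict[str, list[Backlink]] = field(default_factory=dict)

    def add(self, example: str, link: Backlink) -> None:
        bucket = self.links.setdefault(example, [])
        if link not in bucket:  # the same reference is recorded only once
            bucket.append(link)

    def purge(self, docname: str) -> None:
        # Forget everything a re-read document contributed, so stale
        # backlinks do not survive an incremental rebuild
        self.links = {
            example: [b for b in bucket if b.docname != docname]
            for example, bucket in self.links.items()
        }

    def related(self, example: str, heuristic: list[str]) -> list[str]:
        # Explicit backlinks first, then heuristic targets not already listed
        explicit = [b.target() for b in self.links.get(example, [])]
        return explicit + [t for t in heuristic if t not in explicit]
```

Keeping the purge keyed on the referencing docname means an edited page only drops and re-adds its own backlinks, rather than forcing a full rebuild to clear them.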
Heuristics implemented
- Marks/transforms:
  - Detects .mark_<name>(...) and links to user_guide/marks/<name>
  - Detects .transform_<name>(...) and links to user_guide/transform/<name>
  - If any transform is detected, also links:
    - user_guide/transform/index#accessing-transformed-data
- Links to encodings index headings when the corresponding feature is detected:
  - Aggregation or binning → user_guide/encodings/index#encoding-aggregates
  - Explicit type → user_guide/encodings/index#encoding-data-types
  - Special characters (escaped \:, \., \[, \]) → ...#escaping-special-characters-in-column-names
  - Sorting → ...#sort-option
  - alt.datum / alt.value → ...#datum-and-value
- Interactivity:
  - parameter/selection/interactive patterns → user_guide/interactions/index
- Faceting/concatenation/repeat:
  - links user_guide/compound_charts
- Temporal axis:
  - links both:
    - user_guide/times_and_dates
    - user_guide/transform/timeunit
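The heuristics above boil down to a table of source-code patterns mapped to doc targets. A rough sketch of that idea follows; the regexes are simplified guesses written for illustration, not the patterns used in _heuristic_links_for_example(...), and heuristic_links and RULES are hypothetical names. Detecting concatenation via the | and & operators is deliberately left out, since a regex cannot tell those apart from other uses.

```python
from __future__ import annotations

import re

ENC = "user_guide/encodings/index"

# Hypothetical, simplified patterns: (regex, doc targets to add on a match)
RULES = [
    (r"\b(?:sum|mean|median|count|min|max)\(|\bbin\s*=|\balt\.Bin\(", [f"{ENC}#encoding-aggregates"]),
    (r":[QNOT]\b|\btype\s*=", [f"{ENC}#encoding-data-types"]),
    (r"\\[:.\[\]]", [f"{ENC}#escaping-special-characters-in-column-names"]),
    (r"\bsort\s*=", [f"{ENC}#sort-option"]),
    (r"\balt\.(?:datum|value)\b", [f"{ENC}#datum-and-value"]),
    (r"\.interactive\(|\.add_params\(|\balt\.(?:param|selection_\w+)\(", ["user_guide/interactions/index"]),
    (r"\.(?:facet|repeat)\(|\balt\.(?:hconcat|vconcat|concat)\(", ["user_guide/compound_charts"]),
    (r":T\b", ["user_guide/times_and_dates", "user_guide/transform/timeunit"]),
]


def heuristic_links(code: str) -> list[str]:
    """Return doc targets suggested by patterns in an example's source code."""
    links: list[str] = []

    def add(target: str) -> None:
        if target not in links:  # keep first-seen order without duplicates
            links.append(target)

    # .mark_<name>(...) -> user_guide/marks/<name>
    for name in re.findall(r"\.mark_(\w+)\(", code):
        add(f"user_guide/marks/{name}")
    # .transform_<name>(...) -> user_guide/transform/<name>
    transforms = re.findall(r"\.transform_(\w+)\(", code)
    for name in transforms:
        add(f"user_guide/transform/{name}")
    if transforms:
        add("user_guide/transform/index#accessing-transformed-data")
    for pattern, targets in RULES:
        if re.search(pattern, code):
            for target in targets:
                add(target)
    return links
```

For example, a chart built with .mark_bar() and an x="sum(yield):Q" encoding sorted with sort="-x" would get the bar mark page plus the aggregation, data-type and sort headings of the encodings index.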
Validation
  - Ran a full forced rebuild:
  - ALTAIR_GALLERY_GENERATE=0 ALTAIR_AUTOSUMMARY_GENERATE=0 uv run --extra doc --with-editable ../sphinxext-altair sphinx-build -E -a -b html doc doc/_build/html
- Confirmed output contains heuristic links on example pages (e.g. doc/_build/html/gallery/bar_chart_sorted.html).
Note
- Heuristics are intentionally pattern-based (quick and maintainable), so they may occasionally miss relevant links or include irrelevant ones for edge-case code styles. If wanted, I can next tune the thresholds, ordering, and labels for a cleaner UX (e.g., grouping by “Marks / Transforms / Encodings”).
@dsmedia
Member

dsmedia commented Apr 16, 2026

Nice work @joelostblom, and I fully support the discoverability motivation you outlined in #4001.

Since you're already working in populate_examples() and computing both the example metadata and the doc links: would you be open to extending this PR to also persist that to a JSON file (e.g., _data/examples.json)?

Right now everything is ephemeral — computed during the Sphinx build, rendered into HTML, discarded. Persisting it would make it available to external consumers as a reusable artifact, similar to how VL already publishes site/_data/examples.json.

Two fields in particular would be valuable beyond the Sphinx build:

related_docs: the user-guide links your PR already computes (both the explicit backrefs and the heuristic links). Persisting these means any consumer (vega-datasets, search tools, LLMs) can map examples to relevant documentation without reimplementing the detection logic.

datasets: which vega-datasets resources each example uses. vega/vega-datasets#776 builds a cross-gallery examples registry mapping ~400 Vega, VL, and Altair examples to datasets. For Vega and VL, the generator walks the JSON specs mechanically. For Altair, it currently scrapes the repo via the GitHub Trees API, parses docstrings with regex, and extracts dataset references by pattern-matching data.cars() / data.cars.url against the source text: ~180 lines of fragile code. datasets is the one field the generator can't reliably derive from outside. The regex catches today's three patterns, but for examples that call data.cars() (which returns a DataFrame), the compiled spec inlines the data as values, so there is no way to recover the dataset name from it.

I'm imagining something like:

```json
[
  {
    "name": "selection_detail",
    "title": "Selection Detail",
    "description": "Shows how to use a selection to filter a detail view.",
    "categories": ["interactive"],
    "path": "tests/examples_methods_syntax/selection_detail.py",
    "related_docs": [
      "user_guide/interactions/index",
      "user_guide/encodings/index#datum-and-value"
    ],
    "datasets": ["cars"]
  }
]
```

The base fields (name, title, description, categories, path) are what populate_examples() already knows, and related_docs is what this PR already computes. datasets would be a new addition that only altair can provide authoritatively — it requires resolving which data.X() calls each example makes, which is new logic.

I know adding dataset extraction might be scope creep for this PR. But even just persisting what you've already built here (related_docs + base metadata) to a JSON file would lay the exact foundation we need. datasets could come in a follow-up.
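Persisting what is already computed could be as small as the sketch below. It assumes the example metadata is available as dicts with the base fields shown in the JSON sketch above and that related_docs is a mapping from example name to a list of doc targets; write_examples_index is a hypothetical name, and where it would be called from in the build is left open.

```python
from __future__ import annotations

import json
from pathlib import Path

BASE_FIELDS = ("name", "title", "description", "categories", "path")


def write_examples_index(examples, related_docs, out_path) -> list[dict]:
    """Write one JSON record per gallery example, sorted by name."""
    records = []
    for example in sorted(examples, key=lambda e: e["name"]):
        record = {key: example[key] for key in BASE_FIELDS}
        record["related_docs"] = list(related_docs.get(example["name"], []))
        if "datasets" in example:  # optional until dataset extraction lands
            record["datasets"] = sorted(example["datasets"])
        records.append(record)
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(records, indent=2) + "\n", encoding="utf-8")
    return records
```

Sorting by name and writing with a fixed indent keeps the file deterministic across builds, so changes to the index show up as small, reviewable diffs.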

This connects to your comment on vega-datasets#776 about consuming the gallery examples registry from altair's sphinx extension — a persisted index would work in both directions. On the vega-datasets side I'm thinking about publishing a Table Schema contract that defines a standard shape for gallery example metadata across the vega org — so each library's index can be validated in CI and consumed uniformly. cc @domoritz since this touches VL's existing examples.json pattern and the vega-datasets schema work.

@joelostblom
Contributor Author

Thanks for the detailed comment @dsmedia! I think it is a great idea to expand on this and formalize the links from the gallery examples into a standardized format that can also be used by the gallery discovery features you are working on (and by other initiatives like the ones you mentioned). I will have a go at adapting this PR to persist the metadata it currently creates and to also include the datasets used in each example. I also made some changes to the sphinxext-altair extension as part of this effort, but I think they would only need some tweaking to work with a persisted JSON file instead.


Development

Successfully merging this pull request may close these issues.

Link from each gallery example to the relevant part of the user docs
