This document describes how ODDOL currently works at the code level.
For product overview, see README.md.
ODDOL is motivated by a gap between open dataset publication and reproducible learning artifacts.
- Datasets are published openly, but analysis workflows are often under-documented.
- Published conclusions and underlying data-processing steps are difficult for learners to connect.
- Online learners need traceable links between datasets, methods, tools, and results.
For the full conceptual narrative (restored from the earlier architecture document), see:
design/problem-overview.md
ODDOL is a client-only SvelteKit application. It does not include an ODDOL backend API.
- UI and logic run in the browser.
- Data is fetched directly from external providers:
- Wikidata (SPARQL)
- OpenAlex (REST)
- Zenodo (REST)
- DataCite (REST)
- Analysis runs locally with DuckDB-WASM.
- Query caching is local (memory + IndexedDB).
- `src/routes/+page.svelte`: federated search entry page.
- `src/routes/analyze/+page.svelte`: local SQL analysis, file/URL loading, charting.
- `src/routes/describe/+page.svelte`: analysis documentation export (Markdown, JSON-LD, BibTeX).
- `src/routes/contribute/+page.svelte`: contribution workflow (Wikidata-oriented).
- `src/routes/+layout.svelte`: shell layout, navigation, footer.
`src/lib/federation/engine.ts`
- Selects compatible sources for selected entity types.
- Executes source queries in parallel.
- Merges and ranks results.

`src/lib/federation/entity-resolver.ts`
- Deduplicates entities across sources (identifier-based).

`src/lib/federation/rate-limiter.ts`
- Applies per-source request limits and retry behavior.

`src/lib/sources/*.ts`
- Source-specific clients and query translation: `wikidata.ts`, `openalex.ts`, `zenodo.ts`, `datacite.ts`
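The identifier-based deduplication done by `entity-resolver.ts` can be sketched as follows. The types and the merge policy (union source lists, prefer already-merged identifier values) are illustrative assumptions, not the repository's exact implementation.

```typescript
// Sketch of identifier-based deduplication across federated sources.
type Identifiers = { doi?: string; wikidata?: string; openalex?: string };

interface Entity {
  id: string;
  identifiers: Identifiers;
  sources: string[];
}

// Two results refer to the same record if any shared identifier key matches.
function sameEntity(a: Identifiers, b: Identifiers): boolean {
  return (Object.keys(a) as (keyof Identifiers)[]).some(
    (k) => a[k] !== undefined && a[k] === b[k]
  );
}

// Fold a multi-source result list into unique entities, merging source lists
// and identifier sets as duplicates are found.
function dedupe(entities: Entity[]): Entity[] {
  const merged: Entity[] = [];
  for (const e of entities) {
    const match = merged.find((m) => sameEntity(m.identifiers, e.identifiers));
    if (match) {
      match.sources = [...new Set([...match.sources, ...e.sources])];
      match.identifiers = { ...e.identifiers, ...match.identifiers };
    } else {
      merged.push({ ...e, sources: [...e.sources] });
    }
  }
  return merged;
}
```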
`src/lib/stores/search.ts`
- Search state, filters, pagination, and cache lookup.

`src/lib/cache/query-cache.ts`
- Query cache with memory and IndexedDB layers.
- Default TTL is 15 minutes.

`src/lib/analysis/duckdb-engine.ts`
- DuckDB-WASM lifecycle, data loading, SQL execution, schema/statistics.

`src/lib/stores/analysis.ts`
- Analysis page state and orchestration.

`src/lib/visualization/*`
- Chart generation from query output.
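The memory layer of a TTL cache like `query-cache.ts` can be sketched as below. The IndexedDB layer is omitted, and the class name and lazy-eviction policy are assumptions; only the 15-minute default TTL comes from this document.

```typescript
// Minimal sketch of the in-memory layer of a TTL query cache.
const DEFAULT_TTL_MS = 15 * 60 * 1000; // 15 minutes, the documented default

interface CacheEntry<T> {
  value: T;
  expiresAt: number;
}

class MemoryQueryCache<T> {
  private entries = new Map<string, CacheEntry<T>>();

  set(hash: string, value: T, ttlMs: number = DEFAULT_TTL_MS): void {
    this.entries.set(hash, { value, expiresAt: Date.now() + ttlMs });
  }

  // Returns undefined on miss or expiry; expired entries are evicted lazily.
  get(hash: string): T | undefined {
    const entry = this.entries.get(hash);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(hash);
      return undefined;
    }
    return entry.value;
  }
}
```

On a cache failure the store can simply fall through to a live query, which matches the degradation behavior described later in this document.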
- User submits query and filters in `src/routes/+page.svelte`.
- `searchStore.search()` builds a `FederatedQuery`.
- Query hash is computed and the cache is checked.
- On cache miss, the federation engine queries selected/compatible sources.
- Results are merged, deduplicated, ranked, and paginated.
- Unified entities are rendered as result cards.
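The "query hash" step above can be sketched as a key-order-independent serialization followed by a small hash. This is a hypothetical scheme for illustration; the actual hashing in the codebase may differ.

```typescript
// Serialize with sorted keys so equivalent queries hash identically.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (value && typeof value === "object") {
    const keys = Object.keys(value as Record<string, unknown>).sort();
    return `{${keys
      .map((k) => `${JSON.stringify(k)}:${stableStringify((value as any)[k])}`)
      .join(",")}}`;
  }
  return JSON.stringify(value);
}

// FNV-1a: a small non-cryptographic hash, adequate as a cache key.
function queryHash(query: object): string {
  const s = stableStringify(query);
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}
```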
- Clicking `Analyze` on a result card sends context in URL params (`id`, `q`, `title`, `doi`, `sources`).
- Analyze page loads entity details via the federation engine when `id` is present.
- Entity metadata is converted to a table row and loaded into DuckDB.
- Initial query (`SELECT * FROM entity_data`) runs automatically.
Input options:
- Sample data (in-app).
- Upload a `.csv` or `.json` file (parsed in browser).
- Load from URL through DuckDB readers (`csv`/`json`/`parquet` inference).
Execution:
- SQL is run in DuckDB-WASM through the analysis store.
- Results can be exported to CSV.
- Chart config is built from selected X/Y columns and rendered in the browser.
- Query history is stored in `localStorage`.
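A capped, de-duplicated history helper for the `localStorage` step could look like the sketch below. The `oddol:sql-history` key appears later in this document; the 50-entry cap and the injectable `StringStore` (used so the logic can run outside a browser) are assumptions.

```typescript
// Illustrative query-history helper over a Storage-like interface.
interface StringStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const HISTORY_KEY = "oddol:sql-history";
const HISTORY_MAX = 50; // assumed cap

function pushHistory(store: StringStore, sql: string): string[] {
  const raw = store.getItem(HISTORY_KEY);
  const history: string[] = raw ? JSON.parse(raw) : [];
  // Most recent first; collapse duplicates, then enforce the cap.
  const next = [sql, ...history.filter((q) => q !== sql)].slice(0, HISTORY_MAX);
  store.setItem(HISTORY_KEY, JSON.stringify(next));
  return next;
}
```

In the browser, `window.localStorage` satisfies `StringStore` directly.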
- Analyze page builds a draft payload (title, source references, SQL query, notes).
- Draft is stored in `sessionStorage`.
- Describe page reads the draft and pre-fills form fields.
- User exports documentation in:
  - Markdown
  - PROV-like JSON-LD
  - BibTeX
ODDOL normalizes source responses into `UnifiedEntity` (`src/lib/types/index.ts`), including:
- `id` (source-qualified)
- `identifiers` (`doi`, `wikidata`, `openalex`, `zenodo`, `datacite`, etc.)
- `type` (`dataset`, `publication`, `software`, `author`, `organization`)
- `title`, `description`, `creators`, `publisher`, `license`
- `sources[]` with `sourceId`, `sourceUrl`, and a retrieval timestamp
This allows source-agnostic rendering and downstream analysis.
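An approximate TypeScript shape for the fields listed above might look like this. The actual declaration in `src/lib/types/index.ts` may carry more fields, and the `displayIdentifier` helper is a hypothetical example of source-agnostic rendering.

```typescript
// Approximate shape of the unified entity model described above.
type EntityType = "dataset" | "publication" | "software" | "author" | "organization";

interface EntitySource {
  sourceId: string;
  sourceUrl: string;
  retrievedAt: string; // ISO timestamp of retrieval
}

interface UnifiedEntity {
  id: string; // source-qualified, e.g. "zenodo:123"
  identifiers: Partial<Record<"doi" | "wikidata" | "openalex" | "zenodo" | "datacite", string>>;
  type: EntityType;
  title: string;
  description?: string;
  creators?: string[];
  publisher?: string;
  license?: string;
  sources: EntitySource[];
}

// Hypothetical helper: pick a display identifier in priority order,
// regardless of which source produced the entity.
function displayIdentifier(e: UnifiedEntity): string {
  return e.identifiers.doi ?? e.identifiers.wikidata ?? e.id;
}
```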
ODDOL currently uses browser-side storage only:
- Query cache:
  - Memory map + IndexedDB (`oddol-cache`)
  - Default TTL: 15 minutes
- Preferences store in IndexedDB (no TTL semantics in the implementation)
- `localStorage`:
  - SQL query history (`oddol:sql-history`)
- `sessionStorage`:
  - Analyze -> Describe draft (`oddol:describe-draft`)
No ODDOL-managed server storage is implemented in this repository.
- No first-party analytics SDK is integrated.
- Queries are sent directly to third-party APIs; those providers may log requests.
- Google Fonts are loaded from the Google CDN in `src/app.html`.
- Query transparency is partial:
  - Generated SPARQL is visible in the SPARQL builder.
  - A full REST request inspector UI is not implemented.
- Source query failures are captured as partial results in federation search.
- The rate-limiting and retry wrapper reduces transient API failures.
- Cache failures degrade to live querying.
- DuckDB initialization/data loading errors surface through analysis store error state.
- No authentication/authorization layer.
- No backend orchestration for cross-provider logging/auditing.
- No complete API request trace view for all source calls.
- Some type-check issues exist in unrelated modules (tracked in current branch state).
This section documents how ODDOL transforms a unified `FederatedQuery` into source-specific requests.

Common query object (from `src/lib/types/index.ts`):
- `text: string`
- `filters: QueryFilter[]`
- `sources: string[]`
- `entityTypes: EntityType[]`
- `pagination: { limit, offset?, cursor? }`
- Routing is handled by `src/lib/federation/engine.ts`.
- If sources are explicitly selected, ODDOL keeps only sources that:
  - have an available client
  - support at least one selected entity type
- Query execution is parallelized per selected source and merged afterward.
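The routing rule above can be sketched as a filter over a source-capability table. The table contents here are illustrative assumptions, not the engine's actual capability data.

```typescript
// Sketch of source routing: keep only sources with an available client
// that support at least one requested entity type.
type EntityType = "dataset" | "publication" | "software";

const SOURCE_TYPES: Record<string, EntityType[]> = {
  // Illustrative capability table; the real engine's data may differ.
  wikidata: ["dataset", "publication", "software"],
  openalex: ["publication", "dataset"],
  zenodo: ["dataset", "software"],
  datacite: ["dataset", "publication", "software"],
};

function routeSources(selected: string[], entityTypes: EntityType[]): string[] {
  return selected.filter((s) => {
    const supported = SOURCE_TYPES[s]; // undefined => no available client
    return supported !== undefined && supported.some((t) => entityTypes.includes(t));
  });
}
```

The surviving sources are then queried in parallel (e.g. via `Promise.allSettled`) so one slow or failing provider cannot block the others.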
- Transport: SPARQL POST to the endpoint.
- Entity type mapping:
  - dataset -> `Q1172284`
  - publication -> `Q591041`
  - software -> `Q7397`
- Text search is translated to a label `CONTAINS(...)` filter.
- Filters supported by the current implementation: `topic`, `license`, `publisher`, `format`
- Pagination mapping: `limit` -> `LIMIT`, `offset` -> `OFFSET`
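A hypothetical sketch of this SPARQL translation follows: the type maps to a Wikidata class, text becomes a label `CONTAINS` filter, and pagination maps to `LIMIT`/`OFFSET`. The exact query shape produced by `wikidata.ts` may differ.

```typescript
// Entity-type-to-QID mapping from the table above.
const TYPE_QIDS: Record<string, string> = {
  dataset: "Q1172284",
  publication: "Q591041",
  software: "Q7397",
};

function buildSparql(text: string, type: string, limit: number, offset: number): string {
  const qid = TYPE_QIDS[type];
  return [
    "SELECT ?item ?itemLabel WHERE {",
    `  ?item wdt:P31/wdt:P279* wd:${qid} .`,
    "  ?item rdfs:label ?itemLabel .",
    `  FILTER(CONTAINS(LCASE(?itemLabel), LCASE("${text}")))`,
    "}",
    `LIMIT ${limit} OFFSET ${offset}`,
  ].join("\n");
}
```

A real client would also escape quotes in `text` before interpolating it into the query string.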
- Transport: REST GET to `/works`.
- Query mapping:
  - `text` -> `search`
  - pagination limit -> `per_page`
  - cursor -> `cursor`
  - always includes the `mailto` parameter for polite pool usage
- Filters supported:
- year range, type, open access, DOI presence, concept, institution, author
- Sorting:
- with text: relevance descending
- without text: publication date descending
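The OpenAlex mapping above can be sketched as a URL builder. The contact address is a placeholder, and the sort values shown are assumptions about how "relevance descending" and "publication date descending" are expressed.

```typescript
// Sketch of the /works request mapping described above.
function openAlexUrl(text: string, perPage: number, cursor?: string): string {
  const params = new URLSearchParams();
  if (text) params.set("search", text);
  params.set("per_page", String(perPage));
  if (cursor) params.set("cursor", cursor);
  params.set("mailto", "contact@example.org"); // placeholder polite-pool address
  // With a text query: relevance descending; otherwise: newest first.
  params.set("sort", text ? "relevance_score:desc" : "publication_date:desc");
  return `https://api.openalex.org/works?${params.toString()}`;
}
```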
- Transport: REST GET to `/records`.
- Query mapping:
  - text and filter parts are combined in `q`
  - entity type translated to `resource_type.type`
  - size/page pagination (`size`, `page`)
- Filters supported:
- access, community, keyword, license
- Sorting: `mostrecent`
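Composing the Zenodo `q` parameter from text plus filter clauses might look like the sketch below. The `AND`-joined clause syntax is an assumption about how `zenodo.ts` combines parts.

```typescript
// Sketch of the /records request mapping: text and filters combined in q,
// entity type as resource_type.type, size/page pagination, mostrecent sort.
function zenodoUrl(
  text: string,
  type: string | undefined,
  filters: Record<string, string>,
  size: number,
  page: number
): string {
  const parts: string[] = [];
  if (text) parts.push(text);
  if (type) parts.push(`resource_type.type:${type}`);
  for (const [k, v] of Object.entries(filters)) parts.push(`${k}:"${v}"`);
  const params = new URLSearchParams({
    q: parts.join(" AND "),
    size: String(size),
    page: String(page),
    sort: "mostrecent",
  });
  return `https://zenodo.org/api/records?${params.toString()}`;
}
```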
- Transport: REST GET to `/dois`.
- Query mapping:
  - `text` -> `query`
  - entity types -> `resource-type-id`
  - page size/number -> `page[size]`, `page[number]`
- Filters supported:
- year, publisher, client, affiliation
- Sorting: `-created`
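The DataCite mapping is notable for its JSON:API-style bracketed page keys; a sketch of the parameter assembly, with the wrapper function itself being illustrative:

```typescript
// Sketch of the /dois request mapping: query text, resource-type-id,
// page[size]/page[number] pagination, and -created (newest first) sort.
function dataCiteParams(
  text: string,
  resourceTypeId: string | undefined,
  size: number,
  pageNumber: number
): URLSearchParams {
  const params = new URLSearchParams();
  if (text) params.set("query", text);
  if (resourceTypeId) params.set("resource-type-id", resourceTypeId);
  params.set("page[size]", String(size)); // brackets are URL-encoded on serialization
  params.set("page[number]", String(pageNumber));
  params.set("sort", "-created");
  return params;
}
```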
Each source client maps raw responses into `UnifiedEntity`:
- Required practical fields for rendering: `id`, `type`, `title`, `identifiers`, `sources`
- Optional enrichments: `description`, `creators`, `publisher`, `license`, `created`, `metadata`
- If one or more sources fail, search returns partial results with an error list.
- UI currently surfaces source failure summary text from `searchStore`.
State fields:
- `query`, `entityTypes`, `sources`, `filters`
- `results`, `totalCount`
- `isLoading`, `error`
- `hasMore`, `currentPage`
State transitions:
- `setQuery`/`setEntityTypes`/`setSources`/`setFilters` mutate filter state.
- `search()`:
  - sets loading true, clears error, resets page to 1
  - checks cache by query hash
  - on hit: populates results from cache
  - on miss: executes the federated query and caches the result
- `loadMore()`:
  - executes the next page query
  - appends results
- `clear()` resets to the initial state.
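The `loadMore()` transition can be sketched as a pure state update with an injected page fetcher (so the pagination logic is visible without network access). The state shape mirrors the fields listed above; the fetcher signature is an assumption.

```typescript
// Sketch of loadMore(): fetch the next page, append results, and
// recompute hasMore from the running total.
interface SearchState {
  results: string[];
  totalCount: number;
  currentPage: number;
  hasMore: boolean;
}

type PageFetcher = (page: number) => { results: string[]; totalCount: number };

function loadMore(state: SearchState, fetchPage: PageFetcher): SearchState {
  const next = fetchPage(state.currentPage + 1);
  const results = [...state.results, ...next.results];
  return {
    results,
    totalCount: next.totalCount,
    currentPage: state.currentPage + 1,
    hasMore: results.length < next.totalCount,
  };
}
```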
State fields:
- selected entity and table context: `selectedEntity`, `loadedTables`, `currentTable`, `schema`
- data/query outputs: `preview`, `queryResult`, `columnStats`
- execution controls: `isLoading`, `error`, `sqlQuery`
State transitions:
- `initialize()` prepares the DuckDB engine.
- `loadData()`/`loadFromUrl()`:
  - create/replace the table
  - refresh schema and preview
- `executeQuery()` runs SQL and stores the result/error.
- Utility actions: stats, histogram/group-by helpers, drop table, clear state.
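A group-by helper of the kind listed above only assembles SQL; execution happens in DuckDB-WASM. This sketch is illustrative (the store's actual helper names and quoting strategy may differ), and a real implementation would validate identifiers rather than just double-quoting them.

```typescript
// Sketch of a group-by SQL helper for column value distributions.
function groupBySql(table: string, column: string, limit = 20): string {
  return (
    `SELECT "${column}", COUNT(*) AS n ` +
    `FROM "${table}" GROUP BY "${column}" ` +
    `ORDER BY n DESC LIMIT ${limit}`
  );
}
```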
Some UI state is intentionally kept at route level:
- Analyze route:
  - chart configuration
  - SQL validation message
  - query history (`localStorage`)
  - describe handoff draft (`sessionStorage`)
- Describe route:
- form fields and local export payload composition
ODDOL currently captures provenance-relevant signals from:
- source metadata embedded in each `UnifiedEntity.sources[]`
- active SQL query text in Analyze
- analysis description fields in Describe
Describe route exports:
- Markdown narrative report
- JSON-LD graph with PROV-style terms
- BibTeX citation stub
JSON-LD graph contains:
- analysis entity (`prov:Entity`, `schema:CreativeWork`)
- activity (`prov:Activity`, `schema:Action`)
- agent (`prov:Agent`, `schema:Person`)
- source entities (`prov:Entity`) derived from listed data sources
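Assembling such a graph can be sketched as below. The node ids, property choices, and context URLs are illustrative; the Describe route's real export may structure the graph differently.

```typescript
// Sketch of a PROV-style JSON-LD graph with the node types listed above.
function provGraph(title: string, sql: string, author: string, sourceIds: string[]) {
  const sources = sourceIds.map((id, i) => ({
    "@id": `_:source${i}`,
    "@type": ["prov:Entity"],
    "schema:identifier": id,
  }));
  return {
    "@context": {
      prov: "http://www.w3.org/ns/prov#",
      schema: "https://schema.org/",
    },
    "@graph": [
      { "@id": "_:analysis", "@type": ["prov:Entity", "schema:CreativeWork"], "schema:name": title },
      {
        "@id": "_:activity",
        "@type": ["prov:Activity", "schema:Action"],
        "schema:description": sql, // the SQL text captured from Analyze
        "prov:used": sources.map((s) => ({ "@id": s["@id"] })),
      },
      { "@id": "_:agent", "@type": ["prov:Agent", "schema:Person"], "schema:name": author },
      ...sources,
    ],
  };
}
```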
- Provenance capture is semi-manual (user-entered and context-prefilled).
- No immutable run ID or signed execution trace.
- No full automatic capture of every REST request URL/parameter.