owid-catalog

A Pythonic library for working with OWID data.

The owid-catalog library is the foundation of Our World in Data's data management system. It provides:

Data APIs: Access OWID's published data through unified client interfaces
Data Structures: Enhanced pandas DataFrames with rich metadata support

Installation

pip install owid-catalog

Quick Examples

Accessing OWID Data

from owid.catalog import fetch, search

# Search for charts (default)
charts = search("population")
tb = charts[0].fetch()

# Fetch data from OWID Chart at ourworldindata.org/grapher/life-expectancy
tb = fetch("life-expectancy")

# Search for tables
tables = search("population", kind="table", namespace="un")
tb = tables[0].fetch()

# Search indicators (using semantic search)
search("renewable energy", kind="indicator")
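
The string passed to `fetch()` above is the chart's slug, that is, the last path segment of its grapher URL. As a rough illustration of that mapping, the sketch below extracts a slug using only the standard library. `slug_from_url` is a hypothetical helper written for this example; it is not the library's `parse_chart_slug()`, whose exact behaviour may differ.

```python
from urllib.parse import urlparse


def slug_from_url(url_or_slug: str) -> str:
    """Return the chart slug from a grapher URL, or the input if it is already a slug.

    Hypothetical helper for illustration; not part of owid-catalog.
    """
    # Bare slugs such as "life-expectancy" contain no slash: strip any query and return.
    if "/" not in url_or_slug:
        return url_or_slug.split("?", 1)[0]

    # urlparse only fills in .path reliably when a scheme is present.
    if "://" not in url_or_slug:
        url_or_slug = "https://" + url_or_slug

    # Query parameters end up in .query, so .path holds only the path segments.
    path = urlparse(url_or_slug).path
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError(f"No slug found in {url_or_slug!r}")
    # The slug is the last non-empty path segment.
    return segments[-1]


slug = slug_from_url("ourworldindata.org/grapher/life-expectancy")
```

The resulting `slug` is the kind of value passed to `fetch()` in the example above.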

Working with Data Structures

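import pandas as pd

from owid.catalog import Table
from owid.catalog import processing as pr

# Tables are pandas DataFrames with metadata (placeholder values, for illustration only)
df = pd.DataFrame({"country": ["A", "B"], "year": [1990, 2010], "population": [100, 200]})
tb = Table(df, metadata={"short_name": "population"})

# Metadata propagates through operations
tb_filtered = tb[tb["year"] > 2000]  # Keeps metadata

# tb1 and tb2 stand for any two Tables that share a "country" column
tb1, tb2 = tb[["country", "population"]], tb[["country", "year"]]
tb_merged = pr.merge(tb1, tb2, on="country")  # Merges metadata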
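
To make the idea of metadata propagation concrete, here is a minimal pure-Python sketch. `MiniTable` and this `merge` are hypothetical stand-ins invented for the example; they do not reproduce how `Table` or `processing.merge` are implemented, and the rule used for combining metadata (keep only the keys both inputs agree on) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class MiniTable:
    """Toy stand-in for a table: rows plus a metadata dict (hypothetical)."""

    rows: list[dict[str, Any]]
    metadata: dict[str, Any] = field(default_factory=dict)

    def where(self, predicate: Callable[[dict[str, Any]], bool]) -> "MiniTable":
        # Filtering builds a new object, so metadata is copied across explicitly.
        kept = [row for row in self.rows if predicate(row)]
        return MiniTable(kept, dict(self.metadata))


def merge(left: MiniTable, right: MiniTable, on: str) -> MiniTable:
    """Inner-join two MiniTables on one key and combine their metadata."""
    joined = [
        {**l_row, **r_row}
        for l_row in left.rows
        for r_row in right.rows
        if l_row[on] == r_row[on]
    ]
    # Assumed rule: keep metadata entries that both inputs agree on.
    shared = {
        key: value
        for key, value in left.metadata.items()
        if key in right.metadata and right.metadata[key] == value
    }
    return MiniTable(joined, shared)


tb = MiniTable(
    [{"country": "A", "year": 1990}, {"country": "B", "year": 2010}],
    {"short_name": "population"},
)
recent = tb.where(lambda row: row["year"] > 2000)
```

Here `recent` contains only the 2010 row but carries its own copy of the `short_name` metadata.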

Documentation

For detailed documentation, see:

API Reference: ChartsAPI, IndicatorsAPI, TablesAPI
Data Structures: Dataset, Table, Variable, metadata handling
Full Documentation: Complete library documentation

Architecture

graph TB
etl -->|reads| snapshot[upstream datasets]
etl -->|generates| s3[data catalog]
catalog[owid-catalog] -->|queries| s3

This library is part of OWID's ETL project, which contains recipes for all datasets we publish.

Development

You need Python 3.10+, uv and make installed. Clone the repo, then you can simply run:

# run all unit tests and CI checks
make test

# watch for changes, then run all checks
make watch

Changelog

`v1.0.1`

ResponseSet ergonomics
- Remove deprecated ResponseSet.results property (use .items instead)
- Add .to_dict() method for serializing results to plain dicts (useful for AI/LLM context windows)
- Add all_fields parameter to .to_frame() to temporarily override display mode without mutating instance state

`v1.0.0`

New unified Client API
- owid.catalog.Client as single entry point with ChartsAPI, IndicatorsAPI, TablesAPI
- Quick access via search() and fetch() convenience functions
- Rich result types: ChartResult, IndicatorResult, TableResult with ResponseSet container
Charts API
- Fetch chart data by slug, URL, or slug with query params
- Parse chart slugs from grapher/explorer URLs via parse_chart_slug()
- Explorer best-effort fetching with graceful error handling
- set_ui_advanced() / set_ui_basic() for display configuration
Tables API
- Search catalog by table, namespace, version, dataset, and channel
- Fetch tables directly by catalog path
- Embedded catalog index with local caching
Indicators API
- Semantic search via search.owid.io vector embeddings
- Sort by relevance (similarity + popularity blend) or similarity only
- fetch() for single-column indicator or fetch_table() for the full table
Search & discovery
- Fuzzy, exact, contains, and regex matching modes
- .latest() filtering to keep only newest versions
- Popularity scores (0.0-1.0) from analytics views, results sorted by popularity
- refresh_index parameter to force catalog index reload
Data structures integration
- All fetch() methods return owid.catalog.Table with full metadata
- CatalogPath helper for parsing catalog paths
- Lazy loading with load_data=False for deferred data access
Library reorganization
- Restructured into owid.catalog.core (data structures) and owid.catalog.api (remote access)
- catalog.find() deprecated in favor of Client().tables.search() (backwards compat maintained)
- Legacy code moved to owid.catalog.api.legacy
- New dependencies: pydantic v2.0+
Private data support
- Private datasets served from separate R2 bucket
- API can fetch private data from private bucket
Performance
- Vectorized operations replacing iterrows() in TablesAPI
- Embedded catalog index loading (removed ETLCatalog dependency)
- Modularized search into helper methods
Other
- Thumbnail display in ResponseSet for chart results
- JSON output format support
- Comprehensive exception handling: ChartNotFoundError, LicenseError
- API URLs immutable with Pydantic Field(frozen=True)
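
The `.latest()` filtering and catalog-path parsing listed under v1.0.0 can be pictured with a short standalone sketch. It assumes catalog paths have the form `channel/namespace/version/dataset/table` and that versions are ISO dates; both are assumptions for this example. `CatalogEntry`, `parse_path`, and `keep_latest` are hypothetical names rather than the library's `CatalogPath` API, and the paths used are placeholders.

```python
from typing import NamedTuple


class CatalogEntry(NamedTuple):
    """One catalog path split into its parts (assumed layout)."""

    channel: str
    namespace: str
    version: str
    dataset: str
    table: str


def parse_path(path: str) -> CatalogEntry:
    parts = path.strip("/").split("/")
    if len(parts) != 5:
        raise ValueError(f"Expected 5 path segments, got {len(parts)}: {path!r}")
    return CatalogEntry(*parts)


def keep_latest(paths: list[str]) -> list[str]:
    """Keep only the newest version of each (channel, namespace, dataset, table)."""
    newest: dict[tuple[str, str, str, str], CatalogEntry] = {}
    for path in paths:
        entry = parse_path(path)
        key = (entry.channel, entry.namespace, entry.dataset, entry.table)
        # ISO dates (YYYY-MM-DD) sort chronologically when compared as strings.
        if key not in newest or entry.version > newest[key].version:
            newest[key] = entry
    return ["/".join(entry) for entry in newest.values()]


latest = keep_latest(
    [
        "garden/demo/2022-01-01/example_dataset/example_table",
        "garden/demo/2023-06-01/example_dataset/example_table",
        "meadow/demo/2021-05-05/example_dataset/example_table",
    ]
)
```

With these placeholder paths, `latest` keeps the 2023 `garden` entry and the single `meadow` entry.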

See previous versions

`v0.4.5`

Allow both table and dataset parameters in find() (they can now be used together)
Migrate from pyright to ty type checker for improved type checking

`v0.4.4`

Enhanced find() with better search capabilities:
- Case-insensitive search by default (use case=True for case-sensitive)
- Regex support enabled by default for table and dataset parameters
- New fuzzy search with fuzzy=True - typo-tolerant matching sorted by relevance
- Configurable fuzzy threshold (0-100) to control match strictness
New dependency: rapidfuzz for fuzzy string matching
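
As an illustration of threshold-based fuzzy matching on a 0-100 scale, the sketch below ranks candidate names by similarity to a query. The library uses rapidfuzz; this example swaps in the standard library's `difflib` to stay dependency-free, so the scores will not match rapidfuzz's. `fuzzy_find` is a hypothetical function, not part of `find()`.

```python
from difflib import SequenceMatcher


def fuzzy_find(
    query: str, names: list[str], threshold: float = 70.0
) -> list[tuple[str, float]]:
    """Return (name, score) pairs scoring at least `threshold`, best match first."""
    q = query.lower()  # case-insensitive comparison
    scored = []
    for name in names:
        # ratio() is in [0, 1]; scale it to the 0-100 range used for the threshold.
        score = 100.0 * SequenceMatcher(None, q, name.lower()).ratio()
        if score >= threshold:
            scored.append((name, score))
    # Sort by relevance: highest score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


matches = fuzzy_find(
    "popluation", ["population", "Population_density", "gdp"], threshold=60
)
```

Raising `threshold` makes matching stricter; lowering it tolerates more typos at the cost of more false positives.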

`v0.4.3`

Fixed minor bugs

`v0.4.0`

Highlights
- Support for Python 3.10-3.13 (was 3.11-3.13)
- Drop support for Python 3.9 (breaking change)
Others
- Deprecate Walden.
- Dependencies: Change rdata for pyreadr.
- Support: indicator dimensions.
- Support: MDIMs.
- Switched from Poetry to UV package manager.
- New decorator @keep_metadata to propagate metadata in pandas functions.
Fixes: Table.apply, groupby.apply, metadata propagation, type hinting, etc.

`v0.3.11`

Add support for Python 3.12 in pyproject.toml

`v0.3.10`

Add experimental chart data API in owid.catalog.charts

`v0.3.9`

Switch from isort, black, and flake8 to ruff

`v0.3.8`

Pin dataclasses-json==0.5.8 to fix error with python3.9

`v0.3.7`

Fix bugs.
Improve metadata propagation.
Improve metadata YAML file handling, to have common definitions.
Remove DatasetMeta.origins.

`v0.3.6`

Fixed tons of bugs
processing.py module with pandas-like functions that propagate metadata
Support for Dynamic YAML files
Support for R2 alongside S3

`v0.3.5`

Remove catalog.frames; use owid-repack package instead
Relax dependency constraints
Add optional channel argument to DatasetMeta
Stop supporting metadata in Parquet format, load JSON sidecar instead
Fix errors when creating new Table columns

`v0.3.4`

Bump pyarrow dependency to enable Python 3.11 support

`v0.3.3`

Add more arguments to Table.__init__ that are often used in ETL
Add Dataset.update_metadata function for updating metadata from YAML file
Python 3.11 support via update of pyarrow dependency

`v0.3.2`

Fix a bug in Catalog.__getitem__()
Replace mypy type checker by pyright

`v0.3.1`

Sort imports with isort
Change black line length to 120
Add grapher channel
Support path-based indexing into catalogs

`v0.3.0`

Update OWID_CATALOG_VERSION to 3
Support multiple formats per table
Support reading and writing parquet files with embedded metadata
Optional repack argument when adding tables to dataset
Underscore |
Get version field from DatasetMeta init
Resolve collisions of underscore_table function
Convert version to str and load json dimensions

`v0.2.9`

Allow multiple channels in catalog.find function

`v0.2.8`

Update OWID_CATALOG_VERSION to 2

`v0.2.7`

Split datasets into channels (garden, meadow, open_numbers, ...) and make garden default one
Add .find_latest method to Catalog

`v0.2.6`

Add flag is_public for public/private datasets
Enforce snake_case for table, dataset and variable short names
Add fields published_by and published_at to Source
- Added a list of supported and unsupported operations on columns
- Updated pyarrow

`v0.2.5`

Fix ability to load remote CSV tables

`v0.2.4`

Update the default catalog URL to use a CDN

`v0.2.3`

Fix methods for finding and loading data from a LocalCatalog

`v0.2.2`

Repack frames to compact dtypes on Table.to_feather()

`v0.2.1`

Fix key typo used in version check

`v0.2.0`

Copy dataset metadata into tables, to make tables more traceable
Add API versioning, and a requirement to update if your version of this library is too old

`v0.1.1`

Add support for Python 3.8

`v0.1.0`

Initial release, including searching and fetching data from a remote catalog
