A Pythonic library for working with OWID data.
The owid-catalog library is the foundation of Our World in Data's data management system. It provides:
- Data APIs: Access OWID's published data through unified client interfaces
- Data Structures: Enhanced pandas DataFrames with rich metadata support
pip install owid-catalog

from owid.catalog import fetch, search
# Search for charts (default)
charts = search("population")
tb = charts[0].fetch()
# Fetch data from OWID Chart at ourworldindata.org/grapher/life-expectancy
tb = fetch("life-expectancy")
# Search for tables
tables = search("population", kind="table", namespace="un")
tb = tables[0].fetch()
# Search indicators (using semantic search)
search("renewable energy", kind="indicator")

from owid.catalog import Table
from owid.catalog import processing as pr
# Tables are pandas DataFrames with metadata
tb = Table(df, metadata={"short_name": "population"})
# Metadata propagates through operations
tb_filtered = tb[tb["year"] > 2000] # Keeps metadata
tb_merged = pr.merge(tb1, tb2, on="country") # Merges metadata

For detailed documentation, see:
- API Reference: ChartsAPI, IndicatorsAPI, TablesAPI
- Data Structures: Dataset, Table, Variable, metadata handling
- Full Documentation: Complete library documentation
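To make the idea of "metadata propagates through operations" concrete, here is a minimal, library-free sketch. `MiniTable` and its `filter` method are hypothetical names invented for this illustration; they are not the library's `Table`, which is pandas-based, and the row values are placeholders rather than real data.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class MiniTable:
    """Hypothetical stand-in for a table that carries metadata (not owid.catalog.Table)."""

    rows: list[dict[str, Any]]
    metadata: dict[str, Any] = field(default_factory=dict)

    def filter(self, predicate: Callable[[dict[str, Any]], bool]) -> "MiniTable":
        # Keep only the matching rows and copy the metadata onto the result,
        # so the filtered table still describes where its data came from.
        kept = [row for row in self.rows if predicate(row)]
        return MiniTable(kept, dict(self.metadata))


# Placeholder rows for illustration only, not real figures.
tb = MiniTable(
    rows=[
        {"country": "A", "year": 1990, "population": 10},
        {"country": "A", "year": 2010, "population": 20},
    ],
    metadata={"short_name": "population"},
)

tb_filtered = tb.filter(lambda row: row["year"] > 2000)
print(len(tb_filtered.rows), tb_filtered.metadata["short_name"])
```

The design point is that every operation returns a new object with the metadata attached, rather than a bare container that has lost it.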
graph TB
etl -->|reads| snapshot[upstream datasets]
etl -->|generates| s3[data catalog]
catalog[owid-catalog] -->|queries| s3
This library is part of OWID's ETL project, which contains recipes for all datasets we publish.
You need Python 3.10+, uv and make installed. Clone the repo, then you can simply run:
# run all unit tests and CI checks
make test
# watch for changes, then run all checks
make watch
- ResponseSet ergonomics
  - Remove deprecated ResponseSet.results property (use .items instead)
  - Add .to_dict() method for serializing results to plain dicts (useful for AI/LLM context windows)
  - Add all_fields parameter to .to_frame() to temporarily override display mode without mutating instance state
- New unified Client API
  - owid.catalog.Client as single entry point with ChartsAPI, IndicatorsAPI, TablesAPI
  - Quick access via search() and fetch() convenience functions
  - Rich result types: ChartResult, IndicatorResult, TableResult with ResponseSet container
- Charts API
  - Fetch chart data by slug, URL, or slug with query params
  - Parse chart slugs from grapher/explorer URLs via parse_chart_slug()
  - Explorer best-effort fetching with graceful error handling
  - set_ui_advanced() / set_ui_basic() for display configuration
- Tables API
  - Search catalog by table, namespace, version, dataset, and channel
  - Fetch tables directly by catalog path
  - Embedded catalog index with local caching
- Indicators API
  - Semantic search via search.owid.io vector embeddings
  - Sort by relevance (similarity + popularity blend) or similarity only
  - fetch() for single-column indicator or fetch_table() for the full table
- Search & discovery
  - Fuzzy, exact, contains, and regex matching modes
  - .latest() filtering to keep only newest versions
  - Popularity scores (0.0-1.0) from analytics views, results sorted by popularity
  - refresh_index parameter to force catalog index reload
- Data structures integration
  - All fetch() methods return owid.catalog.Table with full metadata
  - CatalogPath helper for parsing catalog paths
  - Lazy loading with load_data=False for deferred data access
- Library reorganization
  - Restructured into owid.catalog.core (data structures) and owid.catalog.api (remote access)
  - catalog.find() deprecated in favor of Client().tables.search() (backwards compat maintained)
  - Legacy code moved to owid.catalog.api.legacy
  - New dependencies: pydantic v2.0+
- Private data support
  - Private datasets served from separate R2 bucket
  - API can fetch private data from private bucket
- Performance
  - Vectorized operations replacing iterrows() in TablesAPI
  - Embedded catalog index loading (removed ETLCatalog dependency)
  - Modularized search into helper methods
- Other
  - Thumbnail display in ResponseSet for chart results
  - JSON output format support
  - Comprehensive exception handling: ChartNotFoundError, LicenseError
  - API URLs immutable with Pydantic Field(frozen=True)
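Two items from the release notes above, parsing catalog paths and keeping only the newest version of each table, can be sketched together. This is an illustration under stated assumptions, not the library's `CatalogPath` or `.latest()` implementation: it assumes paths follow a `channel/namespace/version/dataset/table` layout and that versions are ISO dates, and `ParsedPath`, `parse_path`, and `keep_latest` are hypothetical names. The example paths are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedPath:
    """Hypothetical parsed catalog path; the five-field layout is an assumption."""

    channel: str
    namespace: str
    version: str
    dataset: str
    table: str


def parse_path(path: str) -> ParsedPath:
    # Assumed layout: channel/namespace/version/dataset/table
    parts = path.strip("/").split("/")
    if len(parts) != 5:
        raise ValueError(f"expected 5 path segments, got {len(parts)}: {path!r}")
    return ParsedPath(*parts)


def keep_latest(paths: list[str]) -> list[str]:
    # Group by everything except the version, then keep the highest version
    # per group. ISO dates (YYYY-MM-DD) order correctly as plain strings.
    newest: dict[tuple[str, str, str, str], ParsedPath] = {}
    for path in paths:
        p = parse_path(path)
        key = (p.channel, p.namespace, p.dataset, p.table)
        if key not in newest or p.version > newest[key].version:
            newest[key] = p
    return [
        "/".join((p.channel, p.namespace, p.version, p.dataset, p.table))
        for p in newest.values()
    ]


# Made-up paths for illustration only.
paths = [
    "garden/example_ns/2023-01-01/example_dataset/example_table",
    "garden/example_ns/2024-06-30/example_dataset/example_table",
    "garden/example_ns/2022-05-05/other_dataset/other_table",
]
print(keep_latest(paths))
```

Versions that are not ISO dates (for example a literal label instead of a date) would need a dedicated comparison rule rather than string ordering.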
See previous versions
- Allow both table and dataset parameters in find() (they can now be used together)
- Migrate from pyright to ty type checker for improved type checking
- Enhanced find() with better search capabilities:
  - Case-insensitive search by default (use case=True for case-sensitive)
  - Regex support enabled by default for table and dataset parameters
  - New fuzzy search with fuzzy=True - typo-tolerant matching sorted by relevance
  - Configurable fuzzy threshold (0-100) to control match strictness
- New dependency: rapidfuzz for fuzzy string matching
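The threshold-based fuzzy matching described above can be illustrated with a small sketch. It deliberately swaps in the standard library's `difflib.SequenceMatcher` instead of rapidfuzz, so the scores will differ from the library's; `fuzzy_find` and its default threshold are hypothetical and are not the signature of `find()`.

```python
from difflib import SequenceMatcher


def fuzzy_find(query: str, names: list[str], threshold: float = 80.0) -> list[tuple[str, float]]:
    """Return (name, score) pairs scoring at least `threshold`, best match first."""
    scored = []
    for name in names:
        # ratio() lies in [0, 1]; scale it to 0-100 to mirror the threshold range.
        score = 100.0 * SequenceMatcher(None, query.lower(), name.lower()).ratio()
        if score >= threshold:
            scored.append((name, score))
    # Sort by descending score so the most relevant match comes first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


names = ["population", "population_density", "life_expectancy"]
matches = fuzzy_find("populaton", names, threshold=80)  # note the typo in the query
print([name for name, _ in matches])
```

Raising the threshold makes matching stricter (fewer, closer matches); lowering it tolerates more typos at the cost of noisier results.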
- Fixed minor bugs
- Highlights
  - Support for Python 3.10-3.13 (was 3.11-3.13)
  - Drop support for Python 3.9 (breaking change)
- Others
  - Deprecate Walden.
  - Dependencies: Change rdata for pyreadr.
  - Support: indicator dimensions.
  - Support: MDIMs.
  - Switched from Poetry to UV package manager.
  - New decorator @keep_metadata to propagate metadata in pandas functions.
  - Fixes: Table.apply, groupby.apply, metadata propagation, type hinting, etc.
- Add support for Python 3.12 in pyproject.toml
- Add experimental chart data API in owid.catalog.charts
- Switch from isort & black & flake8 to ruff
- Pin dataclasses-json==0.5.8 to fix error with Python 3.9
- Fix bugs.
- Improve metadata propagation.
- Improve metadata YAML file handling, to have common definitions.
- Remove DatasetMeta.origins.
- Fixed tons of bugs
- processing.py module with pandas-like functions that propagate metadata
- Support for Dynamic YAML files
- Support for R2 alongside S3
- Remove catalog.frames; use owid-repack package instead
- Relax dependency constraints
- Add optional channel argument to DatasetMeta
- Stop supporting metadata in Parquet format, load JSON sidecar instead
- Fix errors when creating new Table columns
- Bump pyarrow dependency to enable Python 3.11 support
- Add more arguments to Table.__init__ that are often used in ETL
- Add Dataset.update_metadata function for updating metadata from YAML file
- Python 3.11 support via update of pyarrow dependency
- Fix a bug in Catalog.__getitem__()
- Replace mypy type checker with pyright
- Sort imports with isort
- Change black line length to 120
- Add grapher channel
- Support path-based indexing into catalogs
- Update OWID_CATALOG_VERSION to 3
- Support multiple formats per table
- Support reading and writing parquet files with embedded metadata
- Optional repack argument when adding tables to dataset
- Underscore |
- Get version field from DatasetMeta init
- Resolve collisions of underscore_table function
- Convert version to str and load JSON dimensions
- Allow multiple channels in catalog.find function
- Update OWID_CATALOG_VERSION to 2
- Split datasets into channels (garden, meadow, open_numbers, ...) and make garden the default one
- Add .find_latest method to Catalog
- Add flag is_public for public/private datasets
- Enforce snake_case for table, dataset and variable short names
- Add fields published_by and published_at to Source
- Added a list of supported and unsupported operations on columns
- Updated pyarrow
- Fix ability to load remote CSV tables
- Update the default catalog URL to use a CDN
- Fix methods for finding and loading data from a LocalCatalog
- Repack frames to compact dtypes on Table.to_feather()
- Fix key typo used in version check
- Copy dataset metadata into tables, to make tables more traceable
- Add API versioning, and a requirement to update if your version of this library is too old
- Add support for Python 3.8
- Initial release, including searching and fetching data from a remote catalog