| name | huggingface-datasets |
|---|---|
| description | Use this skill for Hugging Face Dataset Viewer API workflows that fetch subset/split metadata, paginate rows, search text, apply filters, download parquet URLs, and read size or statistics. |
Use this skill to execute read-only Dataset Viewer API calls for dataset exploration and extraction.
- Optionally validate dataset availability with `/is-valid`.
- Resolve `config` + `split` with `/splits`.
- Preview with `/first-rows`.
- Paginate content with `/rows` using `offset` and `length` (max 100).
- Use `/search` for text matching and `/filter` for row predicates.
- Retrieve parquet links via `/parquet` and totals/metadata via `/size` and `/statistics`.
- Base URL: `https://datasets-server.huggingface.co`
- Default API method: `GET`
- Query params should be URL-encoded.
- `offset` is 0-based; `length` max is usually `100` for row-like endpoints.
- Gated/private datasets require `Authorization: Bearer <HF_TOKEN>`.
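The conventions above can be sketched as a single assembled request; the dataset name and query string below are illustrative, and `python3` is used only for percent-encoding:

```shell
# Sketch: build an encoded Dataset Viewer request URL.
# stanfordnlp/imdb and the query text are example values.
BASE="https://datasets-server.huggingface.co"
QUERY="great movie"
# percent-encode the free-text query value
ENC=$(python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1]))' "$QUERY")
URL="$BASE/search?dataset=stanfordnlp/imdb&config=plain_text&split=train&query=$ENC&offset=0&length=10"
echo "$URL"
# Gated/private datasets additionally need the token header:
#   curl -s -H "Authorization: Bearer $HF_TOKEN" "$URL"
```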
- Validate dataset: `/is-valid?dataset=<namespace/repo>`
- List subsets and splits: `/splits?dataset=<namespace/repo>`
- Preview first rows: `/first-rows?dataset=<namespace/repo>&config=<config>&split=<split>`
- Paginate rows: `/rows?dataset=<namespace/repo>&config=<config>&split=<split>&offset=<int>&length=<int>`
- Search text: `/search?dataset=<namespace/repo>&config=<config>&split=<split>&query=<text>&offset=<int>&length=<int>`
- Filter with predicates: `/filter?dataset=<namespace/repo>&config=<config>&split=<split>&where=<predicate>&orderby=<sort>&offset=<int>&length=<int>`
- List parquet shards: `/parquet?dataset=<namespace/repo>`
- Get size totals: `/size?dataset=<namespace/repo>`
- Get column statistics: `/statistics?dataset=<namespace/repo>&config=<config>&split=<split>`
- Get Croissant metadata (if available): `/croissant?dataset=<namespace/repo>`
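As a quick sketch of consuming these responses, `/splits` returns a `splits` array of `{dataset, config, split}` objects; the heredoc below stubs that shape rather than calling the live API:

```shell
# Sketch: extract config/split pairs from a /splits response.
# Stubbed JSON; a live call would be:
#   curl -s "https://datasets-server.huggingface.co/splits?dataset=stanfordnlp/imdb"
PAIRS=$(cat <<'EOF' | jq -r '.splits[] | "\(.config)/\(.split)"'
{"splits":[{"dataset":"stanfordnlp/imdb","config":"plain_text","split":"train"},{"dataset":"stanfordnlp/imdb","config":"plain_text","split":"test"}]}
EOF
)
echo "$PAIRS"
```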
Pagination pattern:

```shell
curl "https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=0&length=100"
curl "https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=100&length=100"
```

When pagination is partial, use response fields such as `num_rows_total`, `num_rows_per_page`, and `partial` to drive continuation logic.
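That continuation logic reduces to plain offset arithmetic; in the sketch below `num_rows_total` is stubbed instead of being read from a live response:

```shell
# Sketch: compute the page offsets needed to walk a split.
TOTAL=250    # stand-in for .num_rows_total from the first /rows response
LENGTH=100   # page size; the usual max for row-like endpoints
OFFSET=0
OFFSETS=""
while [ "$OFFSET" -lt "$TOTAL" ]; do
  OFFSETS="$OFFSETS $OFFSET"   # each value parameterizes one /rows call
  OFFSET=$((OFFSET + LENGTH))
done
echo "offsets:$OFFSETS"
# → offsets: 0 100 200
```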
Search/filter notes:
- `/search` matches string columns (full-text style behavior is internal to the API).
- `/filter` requires predicate syntax in `where` and optional sort in `orderby`.
- Keep filtering and searches read-only and side-effect free.
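Since predicates travel in the query string, they must be percent-encoded; the SQL-style double-quoting of column names below is an assumption to check against the API docs:

```shell
# Sketch: encode a /filter predicate and sort clause.
# Column-quoting style is assumed, not confirmed.
WHERE='"label"=0'
ORDERBY='"text" ASC'
enc() { python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1]))' "$1"; }
echo "/filter?dataset=stanfordnlp/imdb&config=plain_text&split=train&where=$(enc "$WHERE")&orderby=$(enc "$ORDERBY")&offset=0&length=10"
```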
For CLI-based parquet URL discovery or SQL, use the hf-cli skill with `hf datasets parquet` and `hf datasets sql`.
Use one of these flows depending on dependency constraints.
Zero local dependencies (Hub UI):

- Create dataset repo in browser: https://huggingface.co/new-dataset
- Upload parquet files in the repo "Files and versions" page.
- Verify shards appear in the Dataset Viewer: `curl -s "https://datasets-server.huggingface.co/parquet?dataset=<namespace>/<repo>"`

Low-dependency CLI flow (`npx @huggingface/hub` / hfjs):
- Set auth token: `export HF_TOKEN=<your_hf_token>`
- Upload a parquet folder to a dataset repo (auto-creates the repo if missing): `npx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data`
- Upload as a private repo on creation: `npx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data --private`

After upload, call `/parquet` to discover `<config>/<split>/<shard>` values for querying.
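To sketch that discovery step, `/parquet` returns a `parquet_files` array with `config`, `split`, `url`, and `filename` fields; the response below is stubbed rather than fetched:

```shell
# Sketch: list config/split/shard names from a /parquet response.
# Stubbed JSON; live data comes from
#   curl -s "https://datasets-server.huggingface.co/parquet?dataset=<namespace>/<repo>"
SHARDS=$(cat <<'EOF' | jq -r '.parquet_files[] | "\(.config)/\(.split)/\(.filename)"'
{"parquet_files":[{"dataset":"ns/repo","config":"default","split":"train","url":"https://huggingface.co/datasets/ns/repo/resolve/refs%2Fconvert%2Fparquet/default/train/0000.parquet","filename":"0000.parquet","size":1024}]}
EOF
)
echo "$SHARDS"
# → default/train/0000.parquet
```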
The Hub supports raw agent session traces from Claude Code, Codex, and Pi Agent. Upload them to Hugging Face Datasets as original JSONL files and the Hub can auto-detect the trace format, tag the dataset as Traces, and enable the trace viewer for browsing sessions, turns, tool calls, and model responses. Common local session directories:
- Claude Code: `~/.claude/projects`
- Codex: `~/.codex/sessions`
- Pi: `~/.pi/agent/sessions`
Default to private dataset repos because traces can contain prompts, file paths, tool outputs, secrets, or PII. Preserve the raw `.jsonl` files and nest them by project/cwd instead of uploading every session at the dataset root.
```shell
hf repos create <namespace>/<repo> --type dataset --private --exist-ok
hf upload <namespace>/<repo> ~/.codex/sessions codex/<project-or-cwd> --type dataset
```