A Go service that tracks ~75,000 Maven Central namespaces (2-8 levels deep) to measure how AI coding tools affect library creation, quality, and community growth. 10 interactive charts with hierarchical drill-down and filter toggles. Data from repo1.maven.org, deps.dev, OSV, Sonatype Central Portal, and GitHub.
Read the insights — analysis of Maven Central trends in the age of AI.
All charts use ECharts with a dark theme, AI tool milestone annotations (Copilot GA, GPT-4, Claude releases, Cursor, etc.), and click-to-detail drill-down panels.
| # | Chart | Route | What it shows |
|---|---|---|---|
| 1 | New Groups Per Month | /new-groups-per-month |
New namespaces each month with one-and-done overlay. Toggle: All / Truly New / Extensions. Hierarchical drill-down |
| 2 | New Groups By Prefix | /publishes-per-month |
Stacked area breaking down new groups by top-level prefix (com, org, io, dev, ai, etc.). Top 10 + Other |
| 3 | License Trends | /license-trends |
License distribution per month. Top 8 licenses stacked, click any segment to filter |
| 4 | Artifacts Per Month | /artifact-trends |
New artifacts across all groups with a linear trend line. Outliers capped at 500 |
| 5 | Versions Per Month | /version-trends |
Version counts from deps.dev enrichment with trend line. Outliers capped at 500 |
| 6 | Known CVEs | /cve-trends |
Groups with vulnerabilities by severity (Critical/High/Moderate/Low) + % affected trend |
| 7 | Source Repo Presence | /source-repos |
% of groups linking to a source repo over time, with trend line |
| 8 | Popularity Distribution | /popularity |
Log-scale histogram of dependent counts + top 25 most-depended-on groups |
| 9 | Group Size Distribution | /size-distribution |
Artifact count buckets (1 / 2-5 / 6-50 / 50+) stacked per month |
| 10 | Solo vs Team Contributors | /contributors |
Single-committer vs team projects over time, new-to-GitHub authors. Requires GITHUB_TOKEN |
| Endpoint | Description |
|---|---|
GET /api/new-groups |
New groups per month |
GET /api/new-groups/details?month=2025-07 |
Groups created in a specific month |
GET /api/scan-progress |
Live scan and enrichment progress |
GET /api/groups-by-prefix |
New groups by prefix per month |
GET /api/license-trends |
License distribution per month |
GET /api/one-and-done |
Single-version vs multi-version per month |
GET /api/growth |
New artifacts per month |
GET /api/version-trends |
Version counts per month |
GET /api/cve-trends |
CVE stats per month |
GET /api/source-repos |
Source repo presence per month |
GET /api/popularity |
Popularity distribution + top groups |
GET /api/size-distribution |
Group size distribution by month |
GET /api/contributors |
Solo vs team contributor stats per month |
GET /api/group-popularity?namespace=com.foo |
On-demand popularity for a namespace |
GET /api/new-artifacts-today |
New artifacts today (Solr) |
GET /health |
Health check |
brew install go
# Run directly
make run
# Or build and run in background
make build
nohup ./bin/server > server.log 2>&1 &
# With GitHub contributor enrichment (optional)
GITHUB_TOKEN=$(gh auth token) ./bin/serverThe server starts on :8080 (override with PORT). Set LOG_LEVEL=debug for verbose logging.
On first run, the server discovers ~75,000 Maven Central namespaces through a two-phase scan:
- Initial scan: Enumerates 26 top-level prefixes on repo1, discovering two-level groups. Pure namespace groups (like
org.apache) are registered even without direct artifacts - Deep scan: Recurses into each group to find 3-8 level namespaces (e.g.
com.fasterxml.jackson.core), looping until convergence. Skips entries with hyphens (artifact names, not namespace components). Completion is persisted so restarts skip it
Subsequent runs resume from stored data. Run make export to generate a static site for GitHub Pages.
make test # Go unit tests
go test ./... -cover # with coverage
npx playwright test # E2E tests (server must be running)
npx playwright test --headed # with visible browserCoverage: config 100%, middleware 100%, store 84%, handler 19% (handler includes background goroutines that call external APIs).
SQLite database at data/maven.db (WAL mode, single connection).
| Table | Purpose |
|---|---|
groups |
~75,000 Maven Central namespaces with enrichment data (license, CVEs, contributors, popularity) |
monthly_activity |
Version publish counts by actual publish month (from deps.dev) |
contributors |
Per-group contributor list from GitHub |
scan_progress |
Completed prefix scans for resume support |
Two goroutines start on boot:
- Group scan — Enumerate 26 prefixes on repo1, discover groups, call deps.dev for first-publish dates
- Enrichment pipeline — Runs sequentially after scan completes:
- Deep scan: Discover 3+ level groupIds by recursing into existing groups
- deps.dev detail: Fetch license, source repo, total version count (100ms throttle)
- OSV CVEs: Batch-query api.osv.dev for ALL artifacts per group (100 per batch request)
- Central Portal: Fetch popularity metrics (3s throttle, 60s backoff on 429)
- GitHub: Fetch contributor lists and account ages (1.2s throttle, requires
GITHUB_TOKEN)
| Source | What we get | Auth | Rate limits |
|---|---|---|---|
| repo1.maven.org | Namespace/artifact enumeration | None | None observed |
| api.deps.dev | Version history, licenses, source repos | None | None observed |
| api.osv.dev | CVE counts and severity | None | None observed |
| central.sonatype.com | Dependent counts, app counts | None | 429 after ~100 reqs |
| api.github.com | Contributors, account ages | GITHUB_TOKEN |
5,000/hr |
Maven Central organises packages as groupId:artifactId where groupId uses reverse-domain notation (e.g. com.google.cloud). repo1 exposes this as a directory tree: com/google/cloud/.
The scanner:
- Lists 26 top-level prefixes (
ai,com,dev,io,org, etc.) - For each prefix, lists subdirectories to get two-level groups (
com.google,org.apache) - Deep scan: For each two-level group, checks children — if a child directory contains more directories (not version numbers), it's a deeper namespace and gets registered (e.g.
com.google.cloud,org.apache.commons) - For each group, calls deps.dev to find the first artifact's earliest publish date
- Heuristic: entries containing hyphens are skipped during deep scan (artifact names like
spring-boot-starter, not namespace components likeboot)
The current month is excluded from all charts unless it's the 30th or 31st, to avoid misleadingly low bars for incomplete months.
Export a fully interactive static site with baked-in data:
make export # generates docs/site/All 10 charts work offline — API data is pre-exported as JSON files. Deploy to GitHub Pages via the Deploy to GitHub Pages workflow (Actions > Run workflow), or serve locally:
cd docs/site && python3 -m http.server 8081| Source | Verdict |
|---|---|
Solr Search API (search.maven.org) |
Index stale for most groups after June 2025, aggressive rate limiting |
| Central Portal Components | Latest version date only, can't query historical ranges |
| Central Portal Versions | Sort by publishedDate is broken (random order) |
| mvnrepository.com | HTTP 403 for all API calls |
| libraries.io | Maven data quality too low, requires API key |
| deps.dev + repo1 | Chosen approach — repo1 for enumeration, deps.dev for timestamps |

