Skip to content

Optional partial indexes to make ?unspent matching fast on non-pruned instances #206

@Javieracost

Description

@Javieracost

What is your idea? Provide a use case.

Add optional partial indexes … WHERE spent_at IS NULL, mirroring the existing match indexes so GET /matches/{pattern}?unspent is index-backed on non-pruned instances. Opt-in (e.g. an --index-unspent flag), off by default.

Use case: we run a non-pruned Kupo that serves two needs at once:

  • Historical input resolution: resolving already-spent UTxOs by tx/slot, which needs the full spent history retained.
  • A live "current UTxO set" view: ?unspent queries over specific scripts/credentials.

The second is fast for low-churn patterns but pathological for busy ones — e.g. the unspent UTxOs at a busy script's payment credential, where that credential has ~4,000,000 rows total but only ~10,000 currently unspent.

Why is it a good idea?

It gives prune-like speed for current-UTxO queries while keeping full history — otherwise mutually exclusive (see the alternatives below).

Measured on mainnet (~325 GB, ~350M rows). EXPLAIN QUERY PLAN for WHERE payment_credential = ? AND spent_at IS NULL today:

SEARCH inputs USING INDEX inputsByPaymentCredential (payment_credential=?)

It seeks all ~4M rows at the credential and filters spent_at IS NULL row-by-row → ~52 GiB read, ~170 s (the credential-wildcard form also tends to drop the connection). spent_at IS NULL matches the whole current UTxO set.

With a partial index:

CREATE INDEX inputsByPaymentCredentialUnspent
  ON inputs(payment_credential COLLATE NOCASE) WHERE spent_at IS NULL;

(the COLLATE NOCASE has to match the column collation or the planner won't choose it), the same query becomes a direct seek to only the unspent rows:

SEARCH inputs USING INDEX inputsByPaymentCredentialUnspent (payment_credential=?)

→ ~170 s → ~1.3 s for us.

The cost is modest and opt-in:

  • Disk proportional to the unspent set, not history: ~1 GB for us (≈11 M unspent rows) vs the 325 GB table.
  • Write path: one index insert on UTxO creation, one delete on spend — the same class of churn Kupo already pays maintaining inputsBySpentAt, just one more index.
  • Off by default, so the "non-pruned + heavy ?unspent" niche pays for it and nobody else does.

What is the current alternative and why is it not good enough?

  • --prune-utxo makes ?unspent trivially fast, but only by discarding the spent set, which our historical-resolution need depends on. The two are mutually exclusive, so it isn't an option for users of both use cases.
  • Creating the index out-of-band on the SQLite file (what we do now) works, but it lives outside Kupo's migrations: lost on a resync and liable to collide with a future schema change. Hence asking for native support.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions