Reindexing is getting stalled #30513

@dainiusjocas

Describe the bug
When reindexing is triggered and one or more of the synthetic fields in the indexing scripts invoke embed, progress seems to stall.

To Reproduce
Steps to reproduce the behavior:

  1. On an existing index with a field like:
field chunks type array<string> {}
  2. Add a synthetic field:
field colbert type tensor<int8>(context{}, token{}, v[16]) {
   indexing: input chunks | embed colbert context | attribute
}
  3. Trigger reindexing.
  4. After some initial progress (see the screenshot below), reindexing progress stops.
  5. Sometimes the reindexing fails with this status:
{
  "enabled": true,
  "clusters": {
    "realm": {
      "pending": {},
      "ready": {
        "realm": {
          "readyMillis": 1709661753702,
          "speed": 1.0,
          "cause": "reindexing for an unknown reason",
          "startedMillis": 1709664660006,
          "endedMillis": 1709701153065,
          "message": "PROCESSING_FAILURE: ReturnCode(PROCESSING_FAILURE, [from content node 1] Time is up.)",
          "progress": 0.0,
          "state": "failed"
        }
      }
    }
  }
}
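
To make the failure easier to read, here is a minimal Python sketch that parses a status payload like the one above and computes how long the run lasted before it ended. The payload is copied from this report and trimmed to the relevant fields; the `summarize` helper is a hypothetical name introduced only for illustration and is not part of Vespa. For the values above, the run lasted about 10 hours and 8 minutes while reporting a progress of 0.0.

```python
import json
from datetime import timedelta

# Status payload copied from the report above, trimmed to the relevant fields.
STATUS_JSON = """
{
  "enabled": true,
  "clusters": {
    "realm": {
      "pending": {},
      "ready": {
        "realm": {
          "readyMillis": 1709661753702,
          "startedMillis": 1709664660006,
          "endedMillis": 1709701153065,
          "progress": 0.0,
          "state": "failed"
        }
      }
    }
  }
}
"""


def summarize(status: dict) -> list[dict]:
    """Return one row per (cluster, document type) found under 'ready'.

    Hypothetical helper for illustration only; not a Vespa API.
    """
    rows = []
    for cluster, body in status.get("clusters", {}).items():
        for doc_type, entry in body.get("ready", {}).items():
            started = entry.get("startedMillis")
            ended = entry.get("endedMillis")
            elapsed = None
            if started is not None and ended is not None:
                # Both timestamps are epoch milliseconds.
                elapsed = timedelta(milliseconds=ended - started)
            rows.append(
                {
                    "cluster": cluster,
                    "type": doc_type,
                    "state": entry.get("state"),
                    "progress": entry.get("progress"),
                    "elapsed": elapsed,
                }
            )
    return rows


rows = summarize(json.loads(STATUS_JSON))
for row in rows:
    print(row["cluster"], row["type"], row["state"], row["progress"], row["elapsed"])
```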

Expected behavior
I understand that inference on CPU takes time and that embedding arrays of strings is not the best idea.
It would be great to have more control over reindexing:

  • Set a per-document timeout.
  • Reindexing is visiting, so it would be great to be able to select only a subset of documents to process, at the cost of undefined ranking.

Also, more visibility into progress would be nice, for example a count of documents reindexed so far.
Furthermore, it would be great if recalculating embeddings for synthetic fields could be skipped when the input has not changed, for example by comparing hashes.
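
To illustrate the hash idea, here is a rough Python sketch of what such a check could look like. This is not how Vespa works today and not a proposed implementation: `content_hash`, `needs_reembedding`, `reindex_document`, and the stored `chunks_hash` value are all hypothetical names, and `embed` stands in for whatever embedder is configured.

```python
import hashlib
from typing import Callable, Optional


def content_hash(chunks: list[str]) -> str:
    """Hash the embedder input so unchanged input can be detected later."""
    h = hashlib.sha256()
    for chunk in chunks:
        data = chunk.encode("utf-8")
        # Length-prefix each chunk so ["ab", "c"] and ["a", "bc"] hash differently.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def needs_reembedding(chunks: list[str], stored_hash: Optional[str]) -> bool:
    """True when no hash is stored or the input differs from what was embedded."""
    return stored_hash is None or content_hash(chunks) != stored_hash


def reindex_document(doc: dict, embed: Callable[[list[str]], object]) -> bool:
    """Recompute the embedding only when the input changed.

    Returns True if the (expensive) embed call was made.
    `doc` is a plain dict standing in for a stored document (an assumption).
    """
    if not needs_reembedding(doc["chunks"], doc.get("chunks_hash")):
        return False
    doc["colbert"] = embed(doc["chunks"])
    doc["chunks_hash"] = content_hash(doc["chunks"])
    return True
```

With a check like this, a second pass over an unchanged document would skip the embed call entirely, and only documents whose chunks changed would pay the inference cost.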

Screenshots
Added dashboard.
[Image: Screenshot 2024-03-08 at 09 38 35]

Environment (please complete the following information):

  • Google Cloud, GKE, deployed with custom Helm charts.

Vespa version
8.307.19

Additional context
Slack thread.
An interesting discovery: when persearch was reduced from the number of available CPU cores to 1, the reindexing started progressing.
