Describe the bug
Link to attempt at minimal example
When a news document holds a reference<global_schema> field that points to a non-existent global document, the imported multi-dimensional sparse tensor field resolves to an untyped tensor() at ranking time instead of the declared type. A rank profile that uses this field in a match-feature expression then fails with a type-mismatch error, causing the entire result set to return 0 hits.
Conditions required to trigger the bug
All four of the following must be present simultaneously:
- Dangling parent reference: a child document's reference field points to a parent document ID that does not exist in the global schema.
- Multi-dimensional imported tensor: the imported field is a sparse mapped tensor with two or more dimensions, e.g. tensor<float>(dim_a{},dim_b{}). A single-dimension tensor silently returns a typed empty tensor and does not trigger the error.
- Soft timeout fires during ranking: ranking.softtimeout.enable: true with a factor tight enough that some content nodes are cut off before they finish. In production this is caused by ANN (dense retrieval) latency on loaded nodes.
- No docsum retry: dispatch.docsumRetryLimit: 0 (or equivalent) means timed-out nodes are not retried, so incomplete match-feature data reaches the container layer.
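Condition 3 can be set in the query itself. Below is a minimal sketch of a Vespa query API request body that combines nearestNeighbor retrieval with the soft-timeout values from the trace (timeout 0.45 s, factor 0.9). The field name embedding, the query tensor name q, the rank profile name my_profile, and targetHits=100 are hypothetical placeholders, not values taken from the actual application. The docsum retry setting from condition 4 is left out because the report only names it loosely ("or equivalent").

```python
import json


def build_query_body(query_vector, target_hits=100, timeout_s=0.45, soft_factor=0.9):
    """Build a JSON body for POST /search/ with the soft timeout enabled."""
    return {
        # Hypothetical field/tensor names; {targetHits: N} is a YQL annotation.
        "yql": (
            "select * from news where "
            f"{{targetHits: {target_hits}}}nearestNeighbor(embedding, q)"
        ),
        "ranking.profile": "my_profile",  # hypothetical rank profile name
        "input.query(q)": query_vector,
        "timeout": f"{timeout_s}s",
        "ranking.softtimeout.enable": True,
        "ranking.softtimeout.factor": soft_factor,
    }


print(json.dumps(build_query_body([0.1, 0.2, 0.3]), indent=2))
```

Lowering soft_factor (or the overall timeout) is what pushes slow nodes past the soft deadline.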
Error
{
"errors": [{
"code": 4,
"summary": "Invalid query parameter",
"message": "'attribute(my_global_tensor)' must be of type tensor<float>(dim_a{},dim_b{}), not tensor()"
}]
}
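When re-running the query script, it helps to detect this specific failure automatically. A minimal sketch, assuming the response body looks like the JSON above: the hypothetical helper is_untyped_tensor_error looks for an error with code 4 whose message ends in "not tensor()". It checks both a top-level errors list (as pasted here) and one nested under root, which is where Vespa's default result format typically places errors.

```python
import json


def is_untyped_tensor_error(response_text: str) -> bool:
    """Return True if the response contains the untyped-tensor type-mismatch error."""
    payload = json.loads(response_text)
    # Errors may sit at the top level (as pasted in this report) or under "root".
    errors = payload.get("errors") or payload.get("root", {}).get("errors", [])
    return any(
        err.get("code") == 4 and err.get("message", "").endswith("not tensor()")
        for err in errors
    )


sample = json.dumps({"errors": [{
    "code": 4,
    "summary": "Invalid query parameter",
    "message": "'attribute(my_global_tensor)' must be of type "
               "tensor<float>(dim_a{},dim_b{}), not tensor()",
}]})
print(is_untyped_tensor_error(sample))  # prints True
```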
To Reproduce
Note: I was unable to reproduce this locally. The local two-node Docker setup with CPU throttling does not replicate the production timing conditions precisely enough. The reproduction below gets as close as possible by using the same schema, document structure, and query parameters.
1. Start Vespa (two-node Docker Compose, pinned to the affected version):
docker compose -f app-7-parent-child/docker-compose.yml up -d
Wait ~60 s, then check that the config server is up with curl http://localhost:19071/status.html.
2. Build and deploy the application package:
mvn clean package -DskipTests -f app-7-parent-child/pom.xml
curl -X POST http://localhost:19071/application/v2/tenant/default/prepareandactivate \
--data-binary @app-7-parent-child/target/application.zip \
-H "Content-Type: application/zip"
3. Feed and query:
python reproduce_bug.py --feed # feed once
python reproduce_bug.py # re-run query
The script feeds one global document and 400 news documents (200 with a valid parent reference, 200 with a dangling reference), then runs a nearestNeighbor query with the soft timeout enabled, mirroring production conditions.
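The feed described above can be generated along these lines. This is a minimal sketch in Vespa's JSON document-operation format, not the contents of reproduce_bug.py: the namespace ns, the reference field name source_ref, and the document IDs are hypothetical. A reference field is fed as the full document ID of the parent, so a dangling reference is simply an ID that was never fed to news_source_ctr.

```python
import json

NAMESPACE = "ns"  # hypothetical document namespace
REF_FIELD = "source_ref"  # hypothetical reference field in news.sd
PARENT_ID = f"id:{NAMESPACE}:news_source_ctr::src-1"  # the one global doc that exists


def build_news_feed(n_valid=200, n_dangling=200):
    """Return put operations: n_valid pointing at PARENT_ID, n_dangling at missing parents."""
    ops = []
    for i in range(n_valid + n_dangling):
        if i < n_valid:
            ref = PARENT_ID
        else:
            # This parent ID is never fed, so the reference dangles.
            ref = f"id:{NAMESPACE}:news_source_ctr::missing-{i}"
        ops.append({"put": f"id:{NAMESPACE}:news::news-{i}", "fields": {REF_FIELD: ref}})
    return ops


def to_jsonl(ops):
    """Serialize the operations one per line."""
    return "\n".join(json.dumps(op) for op in ops)


feed = build_news_feed()
print(len(feed), sum(op["fields"][REF_FIELD] == PARENT_ID for op in feed))  # prints 400 200
```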
Expected behavior
Documents with a dangling parent reference should either return a zero-valued typed tensor for the imported field (so the match-feature evaluates to 0.0 without error), or be excluded from results with a warning. The type mismatch should not propagate as a query-level error and should not discard the entire result set.
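To make the first expected outcome concrete, here is a toy Python model (not Vespa's implementation) of resolving the imported field: a missing parent yields an empty tensor that keeps the declared type, so a sum-style match-feature evaluates to 0.0, while an untyped tensor() is rejected with a message shaped like the error above. The SparseTensor class, the resolve_imported and match_feature_sum helpers, and the use of a plain sum as the match-feature are assumptions for illustration; the actual rank expression in news.sd is not reproduced here.

```python
from dataclasses import dataclass, field

DECLARED_TYPE = "tensor<float>(dim_a{},dim_b{})"


@dataclass
class SparseTensor:
    """Toy mapped tensor: a type string plus cells keyed by (dim_a, dim_b) labels."""
    type_spec: str
    cells: dict = field(default_factory=dict)


def resolve_imported(parent_tensors, parent_id, declared_type=DECLARED_TYPE):
    """Look up the parent's tensor; a missing parent yields a typed empty tensor."""
    tensor = parent_tensors.get(parent_id)
    if tensor is None:
        return SparseTensor(declared_type)  # keeps the declared type, no cells
    return tensor


def match_feature_sum(tensor, expected_type=DECLARED_TYPE):
    """Sum all cells, rejecting tensors whose type differs from the declared one."""
    if tensor.type_spec != expected_type:
        raise TypeError(f"must be of type {expected_type}, not {tensor.type_spec}")
    return float(sum(tensor.cells.values()))


parents = {"src-1": SparseTensor(DECLARED_TYPE, {("a1", "b1"): 0.25, ("a2", "b1"): 0.5})}
print(match_feature_sum(resolve_imported(parents, "src-1")))    # prints 0.75
print(match_feature_sum(resolve_imported(parents, "missing")))  # prints 0.0
```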
Environment
- Vespa version: 8.653.22 (confirmed from the trace; see bug_trace_anonymized.json)
- Infrastructure: self-hosted production cluster (Linux)
- Reproduction attempt: macOS, Docker (vespaengine/vespa:8.653.22), two content nodes
Trace
A sanitized production trace is attached as bug_trace_anonymized.json. Key observations:
- Query config: timeout: 0.45s, softtimeout.factor: 0.9 (soft deadline ~405 ms)
- 12 content nodes timed out: distribution keys 3, 9, 15, 33, 57, 101, 103, 104, 106, 110, 125, 127
- A representative timed-out node (dist key 128 equivalent) spent 406 ms before returning 0 hits, just past the ~405 ms soft deadline
- After dispatch, hits were marked unfillable and the container raised the Invalid query parameter error
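The soft-deadline arithmetic in the first bullet (0.45 s × 0.9 ≈ 405 ms) can be checked against per-node latencies with a small helper. A minimal sketch: parse_timeout_s, soft_deadline_ms, and past_deadline are hypothetical names, the parser only understands the s and ms suffixes, and the latency map has to be filled in from the trace by hand.

```python
def parse_timeout_s(value: str) -> float:
    """Parse a timeout such as '0.45s' or '450ms' into seconds."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    if value.endswith("s"):
        return float(value[:-1])
    return float(value)  # bare number: assume seconds


def soft_deadline_ms(timeout: str, factor: float) -> float:
    """Soft deadline in milliseconds: the query timeout scaled by the soft-timeout factor."""
    return parse_timeout_s(timeout) * factor * 1000.0


def past_deadline(latencies_ms: dict, deadline_ms: float) -> list:
    """Distribution keys of nodes whose latency reached or exceeded the deadline."""
    return sorted(key for key, ms in latencies_ms.items() if ms >= deadline_ms)


deadline = soft_deadline_ms("0.45s", 0.9)
print(round(deadline, 3))  # prints 405.0
print(past_deadline({128: 406.0}, deadline))  # prints [128]
```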
The timeout is a symptom, not the root cause. The chain of events is: the dangling reference causes attribute(my_global_tensor) to resolve to an untyped tensor(); this stalls match-feature computation on the affected nodes; the stall makes the soft timeout fire on those nodes; and the timeout in turn surfaces the type error at the container layer.
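Since the dangling reference is the root cause, a cheap client-side check is to find affected documents before feeding or querying. A minimal sketch, assuming feed operations shaped like {"put": ..., "fields": {...}} and a hypothetical reference field source_ref: find_dangling returns the IDs of news documents whose reference points to a parent ID that is not in the known set. Documents without the field are skipped, since an empty reference is a different case from a dangling one.

```python
def find_dangling(news_ops, parent_ids, ref_field="source_ref"):
    """Return IDs of news docs whose reference points at a parent not in parent_ids."""
    known = set(parent_ids)
    dangling = []
    for op in news_ops:
        ref = op.get("fields", {}).get(ref_field)
        # Skip docs with no reference at all; only flag references to unknown parents.
        if ref is not None and ref not in known:
            dangling.append(op["put"])
    return dangling


ops = [
    {"put": "id:ns:news::news-0", "fields": {"source_ref": "id:ns:news_source_ctr::src-1"}},
    {"put": "id:ns:news::news-1", "fields": {"source_ref": "id:ns:news_source_ctr::missing-1"}},
    {"put": "id:ns:news::news-2", "fields": {}},
]
print(find_dangling(ops, {"id:ns:news_source_ctr::src-1"}))  # prints ['id:ns:news::news-1']
```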
Reproduction files
| File | Purpose |
|---|---|
| reproduce_bug.py | Feed + query script |
| app-7-parent-child/docker-compose.yml | Two-node Vespa setup (v8.653.22) |
| app-7-parent-child/src/main/application/schemas/news_source_ctr.sd | Global schema with 2D tensor |
| app-7-parent-child/src/main/application/schemas/news.sd | Child schema with import + rank profile |
| app-7-parent-child/feed/feed_news_source_ctr.json | Global parent document |
| bug_trace_anonymized.json | Sanitized production trace |