From bbe00da2d09ec5090fa979360d7a9622e4940bb7 Mon Sep 17 00:00:00 2001 From: Jon Bratseth Date: Fri, 13 Mar 2026 14:08:13 +0100 Subject: [PATCH 1/9] Document total-rerank-count --- en/basics/ranking.html | 4 ++-- en/content/attributes.html | 4 ++-- en/learn/tutorials/rag-blueprint.md | 13 +++++------ en/querying/nearest-neighbor-search.md | 2 +- en/ranking/phased-ranking.html | 12 +++++----- en/reference/api/query.html | 26 ++++++++++++++++----- en/reference/querying/yql.html | 4 ++-- en/reference/schemas/schemas.html | 32 ++++++++++++++++++-------- 8 files changed, 61 insertions(+), 36 deletions(-) diff --git a/en/basics/ranking.html b/en/basics/ranking.html index 1b44706226..41f5a56fb7 100644 --- a/en/basics/ranking.html +++ b/en/basics/ranking.html @@ -96,12 +96,12 @@

Phased ranking

second-phase { expression: xgboost(my_xgboost_reranker) - rerank-count: 1000 # per content node + total-rerank-count: 1000 # Over all nodes } global-phase { expression: sum(onnx(my_large_onnx_model)) - rerank-count: 20 # globally + rerank-count: 20 } } diff --git a/en/content/attributes.html b/en/content/attributes.html index baa69121e6..378f911273 100644 --- a/en/content/attributes.html +++ b/en/content/attributes.html @@ -586,7 +586,7 @@

Paged attributes

where the number of attribute accesses are limited by the re-ranking phase count.

- For example using a second phase rerank-count + For example using a second phase total-rerank-count of 100 will limit the maximum number of page-ins/disk access per query to 100. Running at 100 QPS would need up to 10K disk accesses per second. This is the worst case if none of the accessed attribute data were paged into memory already. @@ -608,7 +608,7 @@

Paged attributes

rank-profile foo { first-phase {} second-phase { - rerank-count: 100 + total-rerank-count: 100 expression: sum(attribute(tensordata)) } } diff --git a/en/learn/tutorials/rag-blueprint.md b/en/learn/tutorials/rag-blueprint.md index 60861f1787..7750c5e668 100644 --- a/en/learn/tutorials/rag-blueprint.md +++ b/en/learn/tutorials/rag-blueprint.md @@ -570,8 +570,7 @@ not the case for most real-world RAG applications, so this is cruical to have in ![phased ranking overview](/assets/img/phased-ranking-rag.png) -It is worth noting that parameters such as `targetHits` (for the match phase) and `rerank-count` -(for first and second phase) are applied **per content node**. Also note that the stateless container nodes can +Note that the stateless container nodes can also be [scaled independently](../../performance/sizing-search.html) to handle increased query load. ## Configuring match-phase (retrieval) @@ -1380,8 +1379,8 @@ We run the evaluation script on a set of unseen test queries, and get the follow ``` For the first phase ranking, we care most about recall, as we just want to make sure that the candidate documents are -ranked high enough to be included in the second-phase ranking. (the default number of documents that will be exposed to -second-phase is 10 000, but can be controlled by the `rerank-count` parameter). +ranked high enough to be included in the second-phase ranking. The number of documents to be reranked in the second phase +in total over all content nodes is controlled by the `total-rerank-count` parameter. We can see that our results are already very good. This is of course due to the fact that we have a small,synthetic dataset. In reality, you should align the metric expectations with your dataset and test queries. @@ -1392,7 +1391,7 @@ within your latency budget, as you want some headroom for second-phase ranking.
## Second-phase ranking For the second-phase ranking, we can afford to use a more expensive ranking expression, since we will only run it -on the top-k documents from the first-phase ranking (defined by the `rerank-count` parameter, which defaults to 10,000 documents). +on the top-k documents from the first-phase ranking (decided by the `total-rerank-count` parameter). This is where we can significantly improve ranking quality by using more sophisticated models and features that would be too expensive to compute for all matched documents. @@ -1589,7 +1588,7 @@ vespa query \ **Performance monitoring:** * Monitor latency impact of second-phase ranking -* Adjust `rerank-count` based on quality vs. performance trade-offs +* Adjust `total-rerank-count` based on quality vs. performance trade-offs * Consider using different models for different query types or use cases The second-phase ranking represents a crucial step in building high-quality RAG applications, @@ -1598,7 +1597,7 @@ providing the precision needed for effective LLM context while maintaining reaso ## (Optional) Global-phase ranking We also have the option of configuring [global-phase](../../reference/schemas/schemas.html#globalphase-rank) ranking, which can rerank the top k -(as set by `rerank-count` parameter) documents from the second-phase ranking. +(as set by the global-phase `rerank-count` parameter) documents from the second-phase ranking. Common options for global-phase are [cross-encoders](../../ranking/cross-encoders.html) or another GBDT model, trained for better separating top ranked documents on objectives such as [LambdaMart](https://xgboost.readthedocs.io/en/latest/tutorials/learning_to_rank.html). 
For RAG applications, diff --git a/en/querying/nearest-neighbor-search.md b/en/querying/nearest-neighbor-search.md index 5178835275..570ff6ce39 100644 --- a/en/querying/nearest-neighbor-search.md +++ b/en/querying/nearest-neighbor-search.md @@ -275,7 +275,7 @@ rank-profile image_similarity_with_reranking { expression: closeness(field, image_embeddings) } second-phase { - rerank-count: 1000 + total-rerank-count: 1000 expression: closeness(field, image_embeddings) * attribute(popularity) } } diff --git a/en/ranking/phased-ranking.html b/en/ranking/phased-ranking.html index 4779d385bb..7172006207 100644 --- a/en/ranking/phased-ranking.html +++ b/en/ranking/phased-ranking.html @@ -31,8 +31,8 @@
  • second-phase ranking; configured in rank-profile. Optionally re-rank the top-scoring hits from the first-phase ranking using a more complex expression. The - rerank-count sets a strict upper bound on the - number of documents that are re-ranked. + total-rerank-count sets a strict upper bound on the + number of documents that are re-ranked in total over the nodes.
  • Global ranking:Following the per content node local ranking phases, @@ -83,8 +83,8 @@

    Two-phase ranking on content nodes

    By default, second-phase ranking (if specified) is evaluated for the 100 best hits - from the first-phase ranking per content node, tunable with - rerank-count. + from the first-phase ranking per content node. The total number of hits re-ranked across all content nodes can be set with + total-rerank-count.

     schema myapp {
    @@ -99,7 +99,7 @@ 

    Two-phase ranking on content nodesUsing a global-phase expression

    } rerank-count: 50 } - match-features { + match-features { my_expensive_function } } diff --git a/en/reference/api/query.html b/en/reference/api/query.html index 424565901f..d5364109db 100644 --- a/en/reference/api/query.html +++ b/en/reference/api/query.html @@ -80,7 +80,8 @@

    Parameters

  • ranking.properties [rankproperty]
  • ranking.queryCache
  • ranking.rankScoreDropLimit
  • -
  • ranking.rerankCount
  • +
  • ranking.secondPhase.totalRerankCount
  • +
  • ranking.secondPhase.rerankCount
  • ranking.secondPhase.rankScoreDropLimit
  • ranking.significance.useModel
  • ranking.softtimeout.enable
  • @@ -851,15 +852,28 @@

    Ranking

    - ranking.rerankCount + ranking.secondPhase.totalRerankCount Number -

    - Specifies the number of hits that should be ranked in the second ranking phase. - Overrides the rerank-count set in the rank profile. - Setting to 0 disables the second phase reranking. +

    + Specifies the number of hits that should be ranked in the second ranking phase in total over the queried + content nodes. + Overrides the total-rerank-count set in the rank profile. + Setting to 0 disables second phase reranking. +

    + + + + ranking.secondPhase.rerankCount + + Number + + +

    + Specifies the number of hits that should be ranked in the second phase per node. + Prefer using totalRerankCount over this.

    diff --git a/en/reference/querying/yql.html b/en/reference/querying/yql.html index f21ba50b62..ef96cbbff3 100644 --- a/en/reference/querying/yql.html +++ b/en/reference/querying/yql.html @@ -2196,9 +2196,9 @@

    Annotations

    It sets the wanted number of hits exposed to the real first-phase ranking function per content node. - If additional second phase ranking with rerank-count is used, + If additional second phase ranking is used, do not set targetHits less than the configured rank-profile's - rerank-count. + total-rerank-count.

    diff --git a/en/reference/schemas/schemas.html b/en/reference/schemas/schemas.html index 1fe8647152..e1fb4a0cc5 100644 --- a/en/reference/schemas/schemas.html +++ b/en/reference/schemas/schemas.html @@ -124,6 +124,7 @@

    Elements

    second-phase expression rank-score-drop-limit + total-rerank-count rerank-count global-phase expression @@ -1861,7 +1862,7 @@

    diversity

    Result sets are guaranteed to get at least min-groups unique values from the diversity attribute from this phase, but no more than max-hits. For match-phase max-hits = match-phase max-hits. -For second-phase max-hits = rerank-count +For second-phase max-hits = total-rerank-count A document is considered a candidate if:
    • The query has not yet reached the max-hits @@ -1894,7 +1895,7 @@

      diversity

      Using this with match-phase often means one can reduce max-hits. In second-phase - you might reduce rerank-count and still good and diverse results. + you might reduce total-rerank-count and still get good and diverse results.

      @@ -2233,15 +2234,26 @@

      second-phase

      +total-rerank-count + +

      + Optional argument. Specifies the number of hits to be re-ranked in the second phase in total over the content + nodes that participate in evaluating a query (a group). + The default value is 100 per node. This can also be + set in the query. + Hits not reranked might be re-scored. +

      + + rerank-count - -

      - Optional argument. Specifies the number of hits to be re-ranked in the second phase. - The default value is 100. This can also be set in the query. - Note that this value is local to each node involved in a query. - Hits not reranked might be re-scored. -

      - + +

      + Optional argument. Specifies the number of hits to be re-ranked in the second phase on each content node. + This can also be set in the query. + Prefer using total-rerank-count over this. +

      + + From 081e1448c1ed154382d8d3e4865d7a0e0f0fc4e2 Mon Sep 17 00:00:00 2001 From: Jon Bratseth Date: Fri, 13 Mar 2026 14:29:24 +0100 Subject: [PATCH 2/9] Document total-keep-rank-count --- en/learn/faq.md | 2 +- en/ranking/ranking-intro.md | 4 ++-- en/reference/api/query.html | 19 +++++++++++++++++-- en/reference/schemas/schemas.html | 10 +++++++++- 4 files changed, 29 insertions(+), 6 deletions(-) diff --git a/en/learn/faq.md b/en/learn/faq.md index b48239d797..9ede0766ef 100644 --- a/en/learn/faq.md +++ b/en/learn/faq.md @@ -87,7 +87,7 @@ of a double. This can happen in two cases: - The [ranking](../basics/ranking.html) expression used a feature which became `NaN` (Not a Number). For example, `log(0)` would produce -Infinity. One can use [isNan](../reference/ranking/ranking-expressions.html#isnan-x) to guard against this. -- Surfacing low scoring hits using [grouping](../querying/grouping.html), that is, rendering low ranking hits with `each(output(summary()))` that are outside of what Vespa computed and caches on a heap. This is controlled by the [keep-rank-count](../reference/schemas/schemas.html#keep-rank-count). +- Surfacing low scoring hits using [grouping](../querying/grouping.html), that is, rendering low ranking hits with `each(output(summary()))` that are outside what Vespa computed and caches on a heap. This is controlled by the [total-keep-rank-count](../reference/schemas/schemas.html#total-keep-rank-count) parameter. ### How to pin query results? To hard-code documents to positions in the result set, diff --git a/en/ranking/ranking-intro.md b/en/ranking/ranking-intro.md index db6db93ad0..2d1da716eb 100644 --- a/en/ranking/ranking-intro.md +++ b/en/ranking/ranking-intro.md @@ -293,7 +293,7 @@ Let's try the same query again, with a two-phase rank-profile that also does an
       rank-profile inlinks_twophase inherits inlinks_age {
           first-phase {
      -        keep-rank-count       : 50
      +        total-keep-rank-count : 50
               rank-score-drop-limit : 10
               expression            : num_inlinks
           }
      @@ -316,7 +316,7 @@ Here, `num_inlinks` and `rank_score` are defined in a rank profile we used earli
       
       In the results, observe that no document has a _rankingExpression(num_inlinks)_ less than or equal to 10.0,
       meaning all such documents were purged in the first ranking phase due to the `rank-score-drop-limit`.
      -Normally, the `rank-score-drop-limit` is not used, as the `keep-rank-count` is most important.
      +Normally, the `rank-score-drop-limit` is not used, as the `total-keep-rank-count` is most important.
       Read more in the [reference](../reference/schemas/schemas.html#rank-score-drop-limit).
       
       For a dynamic limit, pass a ranking feature like `query(threshold)`
      diff --git a/en/reference/api/query.html b/en/reference/api/query.html
      index d5364109db..6de4ee22e9 100644
      --- a/en/reference/api/query.html
      +++ b/en/reference/api/query.html
      @@ -877,6 +877,19 @@ 

      Ranking

      + + ranking.totalKeepRankCount + + Number + + +

      + Specifies the number of hits for which the rank score should be kept after first phase ranking + in total over the nodes participating in the query. + Overrides the total-keep-rank-count set in the rank profile. +

      + + ranking.keepRankCount @@ -884,8 +897,10 @@

      Ranking

      - Specifies the number of hits that should keep rank value. - Overrides the keep-rank-count set in the rank profile. + Specifies the number of hits for which the rank score should be kept after first phase ranking + on each node. + Overrides the keep-rank-count set in the rank profile. + Prefer total-keep-rank-count over this.

      diff --git a/en/reference/schemas/schemas.html b/en/reference/schemas/schemas.html index e1fb4a0cc5..68b301597a 100644 --- a/en/reference/schemas/schemas.html +++ b/en/reference/schemas/schemas.html @@ -118,6 +118,7 @@

      Elements

      order max-hits first-phase + total-keep-rank-count keep-rank-count rank-score-drop-limit expression @@ -1929,9 +1930,16 @@

      first-phase

      see ranking expressions.

      +total-keep-rank-count + +

      How many documents to keep the first phase top rank values for, in total over the content nodes participating in the query. + The default value is 10000 per node.

      + + keep-rank-count -

      How many documents to keep the first phase top rank values for. The default value is 10000.

      +

      How many documents to keep the first phase top rank values for on each content node. + Prefer total-keep-rank-count over this.

      rank-score-drop-limit From 4c5fa13a1eb0e43ceab8110d801f6c6049ea0119 Mon Sep 17 00:00:00 2001 From: Jon Bratseth Date: Mon, 16 Mar 2026 13:11:50 +0100 Subject: [PATCH 3/9] Document total-max-hits --- en/performance/graceful-degradation.html | 14 +++++------ en/querying/result-diversity.md | 2 +- en/reference/api/query.html | 30 +++++++++++++++++++----- en/reference/schemas/schemas.html | 23 +++++++++++++----- 4 files changed, 49 insertions(+), 20 deletions(-) diff --git a/en/performance/graceful-degradation.html b/en/performance/graceful-degradation.html index 853a069daa..f0559814cf 100644 --- a/en/performance/graceful-degradation.html +++ b/en/performance/graceful-degradation.html @@ -177,21 +177,21 @@

      Match phase degradation

      Match-phase works by specifying an attribute that measures document quality in some way (popularity, click-through rate, pagerank, ad bid value, price, text quality). -In addition, a max-hits value is specified -that specifies how many hits are "more than enough" for the application. +In addition, a total-max-hits value is specified +that specifies how many hits in total over the content nodes are "more than enough" for the application. Then an estimate is made after collecting a reasonable amount of hits for the query, -and if the estimate is higher than the configured max-hits value, +and if the estimate is higher than the node's share of the total-max-hits value, an extra limitation is added to the query, ensuring that only the highest quality documents can become hits.

      In effect, this limits the documents actually queried to the highest quality documents, a subset of the full corpus, where the size of subset is calculated in such a way -that the query is estimated to give max-hits hits. +that the query is estimated to give the node's share of total-max-hits hits. Since some (low-quality) hits will already have been collected to do the estimation, -the actual number of hits returned will usually be higher than max-hits. +the actual number of hits returned will usually be higher than total-max-hits. But since the distribution of documents isn't perfectly smooth, -you risk sometimes getting less than the configured max-hits hits back. +you risk sometimes getting less than the configured total-max-hits hits back.

      Note that limiting hits in the match-phase also affects aggregation/grouping, and total-hit-count since it actually limits, so the query gets fewer hits. @@ -200,7 +200,7 @@

      Match phase degradation

      since they both operate in the same manner, and you would get interference between them that could cause unpredictable results. The graph shows possible hits versus actual hits in a corpus with 100 000 documents, -where max-hits is configured to 10 000. +where the node's share of total-max-hits is 10 000. The corpus is a synthetic (slightly randomized) data set, in practice the graph will be less smooth:

      diff --git a/en/querying/result-diversity.md b/en/querying/result-diversity.md index 4f43124205..209608f722 100644 --- a/en/querying/result-diversity.md +++ b/en/querying/result-diversity.md @@ -101,7 +101,7 @@ rank-profile diverse_example { match-phase { attribute: popularity - max-hits: 100 + total-max-hits: 1000 max-filter-coverage: 1.0 } diff --git a/en/reference/api/query.html b/en/reference/api/query.html index 6de4ee22e9..70c3e5b0f5 100644 --- a/en/reference/api/query.html +++ b/en/reference/api/query.html @@ -70,8 +70,6 @@

      Parameters

    • ranking.elementGap.fieldName
    • ranking.features [input, rankfeature]
    • ranking.freshness
    • -
    • ranking.globalPhase.rankScoreDropLimit
    • -
    • ranking.globalPhase.rerankCount
    • ranking.keepRankCount
    • ranking.listFeatures [rankfeatures]
    • ranking.matchPhase
    • @@ -80,9 +78,17 @@

      Parameters

    • ranking.properties [rankproperty]
    • ranking.queryCache
    • ranking.rankScoreDropLimit
    • +
    • ranking.matchPhase.attribute
    • +
    • ranking.matchPhase.totalMaxHits
    • +
    • ranking.matchPhase.maxHits
    • +
    • ranking.matchPhase.ascending
    • +
    • ranking.matchPhase.diversity.attribute
    • +
    • ranking.matchPhase.diversity.minGroups
    • ranking.secondPhase.totalRerankCount
    • ranking.secondPhase.rerankCount
    • ranking.secondPhase.rankScoreDropLimit
    • +
    • ranking.globalPhase.rankScoreDropLimit
    • +
    • ranking.globalPhase.rerankCount
    • ranking.significance.useModel
    • ranking.softtimeout.enable
    • ranking.sorting [sorting]
    • @@ -1256,6 +1262,19 @@

      ranking.matchPhase

      The attribute used to limit matches by if more than maxHits hits will be produced.

      + + ranking.matchPhase
      .totalMaxHits + + long + + +

      + The max number of hits that should be generated in total over the content nodes during the match phase. + Setting the value to `0` disables match phase early termination. + Rank profile equivalent: match-phase: total-max-hits +

      + + ranking.matchPhase
      .maxHits @@ -1263,10 +1282,9 @@

      ranking.matchPhase

      - Rank profile equivalent: match-phase: max-hits -

      -

      The max number of hits that should be generated on each content node during the match phase.

      -

      Setting the value to `0` disables the match phase early termination.

      + The max number of hits that should be generated on each content node during the match phase. + Prefer using totalMaxHits over this. + Rank profile equivalent: match-phase: max-hits diff --git a/en/reference/schemas/schemas.html b/en/reference/schemas/schemas.html index 68b301597a..8d5e044cc5 100644 --- a/en/reference/schemas/schemas.html +++ b/en/reference/schemas/schemas.html @@ -116,6 +116,7 @@

      Elements

      match-phase attribute order + total-max-hits max-hits first-phase total-keep-rank-count @@ -1798,7 +1799,7 @@

      match-phase

      match-phase { attribute: [numeric single value attribute] order: [ascending | descending] - max-hits: [integer] + total-max-hits: [integer] }
      @@ -1809,7 +1810,8 @@

      match-phase

      + +

      The quality attribute that decides which documents are a match if the match phase - estimates that there will be more than max-hits hits. + estimates that there will be more than the node's share of + total-max-hits hits. The attribute must be single-value numeric with fast-search enabled. It should correlate with the order which would be produced by a full query evaluation. No default. @@ -1824,11 +1826,19 @@

      match-phase

      as the default value descending is by far the most common.

      total-max-hits +

      + The maximum total number of hits that should be produced in the match phase across all nodes + in the group evaluating the query. + This number should be large, and the weaker the correlation between the + match-phase attribute and the first-phase function, the larger it should be.

      +
      max-hits

      The max hits each content node should attempt to produce in the match phase. - Usually, a number like 10000 works well here.

      + Prefer using total-max-hits over this.
      @@ -1862,8 +1872,9 @@

      diversity

      Specify the name of an attribute that will be used to provide diversity. Result sets are guaranteed to get at least min-groups unique values from the diversity attribute from this phase, -but no more than max-hits. For match-phase max-hits = match-phase max-hits. -For second-phase max-hits = total-rerank-count +but no more than max-hits. +For match-phase max-hits = the node's share of match-phase total-max-hits. +For second-phase max-hits = the node's share of total-rerank-count A document is considered a candidate if:
      • The query has not yet reached the max-hits @@ -1894,7 +1905,7 @@

        diversity

        Specifies the minimum number of groups returned from the phase. Using this with match-phase - often means one can reduce max-hits. + often means one can reduce total-max-hits. In second-phase you might reduce total-rerank-count and still get good and diverse results.

        From 3b3f076a7e736c181fa7413e8d2de29ac8511403 Mon Sep 17 00:00:00 2001 From: Jon Bratseth Date: Mon, 16 Mar 2026 14:07:17 +0100 Subject: [PATCH 4/9] Document totalTargetHits --- en/clients/vespa-cli.html | 2 +- .../practical-search-performance-guide.md | 2 +- en/querying/approximate-nn-hnsw.md | 21 ++--- en/querying/nearest-neighbor-search-guide.md | 74 +++++++++--------- en/querying/nearest-neighbor-search.md | 2 +- en/rag/binarizing-vectors.md | 12 +-- en/rag/embedding.html | 4 +- en/rag/working-with-chunks.html | 2 +- en/ranking/ranking-intro.md | 2 +- en/ranking/wand.html | 8 +- en/reference/api/query.html | 2 +- en/reference/querying/json-query-language.md | 8 +- en/reference/querying/yql.html | 78 +++++++++++++------ en/reference/schemas/schemas.html | 8 +- 14 files changed, 127 insertions(+), 98 deletions(-) diff --git a/en/clients/vespa-cli.html b/en/clients/vespa-cli.html index 196ec173fc..94b1289c69 100644 --- a/en/clients/vespa-cli.html +++ b/en/clients/vespa-cli.html @@ -233,7 +233,7 @@

        Queries

        Example query file:

        {% highlight json %}
         {
        -    "yql": "select product_id, title from products where {targetHits: 200}nearestNeighbor(dense_embedding, q_vector)",
        +    "yql": "select product_id, title from products where {totalTargetHits: 200}nearestNeighbor(dense_embedding, q_vector)",
             "input.query(q_vector)": [-0.050548091530799866, ... ,0.028366032987833023],
             "ranking": "vector_distance"
         }
        diff --git a/en/performance/practical-search-performance-guide.md b/en/performance/practical-search-performance-guide.md
        index 3488a67a36..c95ca052ff 100644
        --- a/en/performance/practical-search-performance-guide.md
        +++ b/en/performance/practical-search-performance-guide.md
        @@ -1122,7 +1122,7 @@ Repeating the query from above, replacing `dotProduct` with `wand`:
           
         
         $ vespa query \
        -    'yql=select track_id, title, artist, tags from track where {targetHits:10}wand(tags, @userProfile)' \
        +    'yql=select track_id, title, artist, tags from track where {totalTargetHits:10}wand(tags, @userProfile)' \
             'userProfile={"hard rock":1, "rock":1,"metal":1, "finnish metal":1}' \
             'hits=1' \
             'ranking=personalized'
        diff --git a/en/querying/approximate-nn-hnsw.md b/en/querying/approximate-nn-hnsw.md
        index 192240d8cd..edd162bd20 100644
        --- a/en/querying/approximate-nn-hnsw.md
        +++ b/en/querying/approximate-nn-hnsw.md
        @@ -134,7 +134,7 @@ or exact (brute-force) search by using the [approximate query annotation](../ref
         
         
         {
        -  "yql": "select * from doc where {targetHits: 100, approximate:false}nearestNeighbor(image_embeddings,query_image_embedding)",
        +  "yql": "select * from doc where {totalTargetHits: 10, approximate:false}nearestNeighbor(image_embeddings,query_image_embedding)",
           "hits": 10
           "input.query(query_image_embedding)": [0.21,0.12,....],
           "ranking.profile": "image_similarity" 
        @@ -150,9 +150,9 @@ Note that exact searches over a large vector volume require adjustment of the
         The default [query timeout](../reference/api/query.html#timeout) is 500ms,
         which will be too low for an exact search over many vectors.
         
        -In addition to [targetHits](../reference/querying/yql.html#targethits), 
        +In addition to [totalTargetHits](../reference/querying/yql.html#totaltargethits), 
         there is a [hnsw.exploreAdditionalHits](../reference/querying/yql.html#hnsw-exploreadditionalhits) parameter
        -which controls how many extra nodes in the graph (in addition to `targetHits`)
        +which controls how many extra nodes in the graph (in addition to `totalTargetHits`)
         that are explored during the graph search. This parameter is used to tune accuracy quality versus query performance. 
         
         ## Combining approximate nearest neighbor search with filters 
        @@ -174,22 +174,23 @@ Note that when using `pre-filtering` the following query operators are not inclu
         * [predicate](../reference/querying/yql.html#predicate)
         
         These are instead evaluated after the approximate nearest neighbors are retrieved, more like a `post-filter`.
        -This might cause the search to expose fewer hits to ranking than the wanted `targetHits`.
        +This might cause the search to expose fewer hits to ranking than the wanted `totalTargetHits`.
         
         Since {% include version.html version="8.78" %} the `pre-filter` can be evaluated using
         [multiple threads per query](../performance/practical-search-performance-guide.html#multithreaded-search-and-ranking).
         This can be used to reduce query latency for larger vector datasets where the cost of evaluating the `pre-filter` is significant.
         Note that searching the `HNSW` index is always single-threaded per query.
         Multithreaded evaluation when using `post-filtering` has always been supported,
        -but this is less relevant as the `HNSW` index search first reduces the document candidate set based on `targetHits`.
        +but this is less relevant as the `HNSW` index search first reduces the document candidate set based on `totalTargetHits`.
         
         ## Nearest Neighbor Search Considerations
         
        -* **targetHits**:
        -The [targetHits](../reference/querying/yql.html#targethits)
        -specifies how many hits one wants to expose to [ranking](../basics/ranking.html) *per content node*.
        -Approximate search exposes exactly `targetHits` hits to `first-phase` ranking on every content node
        -as long as `targetHits` hits are actually found and not filtered out afterwards.
        +* **totalTargetHits**:
        +The [totalTargetHits](../reference/querying/yql.html#totaltargethits) parameter
        +specifies how many hits one wants to expose to [ranking](../basics/ranking.html) in total over the content nodes
        +participating in the query (you can also set this per node using [targetHits](../reference/querying/yql.html#targethits)).
        +Approximate search exposes exactly `totalTargetHits` hits to `first-phase` ranking over the content nodes
        +as long as `totalTargetHits` hits are actually found and not filtered out.
         Nearest neighbor search is typically used as an efficient retriever in a [phased ranking](../ranking/phased-ranking.html)
         pipeline. See [performance sizing](../performance/sizing-search.html). 
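As a rough illustration of how a total such as `totalTargetHits` (or `total-rerank-count`) relates to the work done on each content node, the following Python sketch assumes the total is split evenly over the content nodes in the group serving the query and rounded up, so that the nodes together handle at least the requested number of hits. The `per_node_share` helper is hypothetical and only models that assumption; Vespa's actual per-node computation may differ.

```python
import math


def per_node_share(total: int, num_nodes: int) -> int:
    """Hypothetical even split of a total limit over the content nodes in a group.

    Rounds up so that num_nodes * share >= total. This models an assumption
    for illustration only; it is not Vespa's actual distribution logic.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    if total < 0:
        raise ValueError("total must be non-negative")
    return math.ceil(total / num_nodes)


if __name__ == "__main__":
    # totalTargetHits: 100 over a group of 4 content nodes: 25 hits per node
    print(per_node_share(100, 4))
    # totalTargetHits: 10 over 3 content nodes: 4 per node (12 in total after rounding up)
    print(per_node_share(10, 3))
```

Rounding up rather than down is the design choice here: under this assumption the sum of the per-node shares never falls below the requested total, at the cost of slightly more work per node.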
         
        diff --git a/en/querying/nearest-neighbor-search-guide.md b/en/querying/nearest-neighbor-search-guide.md
        index 3c315ed071..b1d69b5bb0 100644
        --- a/en/querying/nearest-neighbor-search-guide.md
        +++ b/en/querying/nearest-neighbor-search-guide.md
        @@ -745,7 +745,7 @@ performing a maximum inner product search over the `tags` weightedset field.
           
         
         $ vespa query \
        -    'yql=select track_id, title, artist from track where {targetHits:10}wand(tags, @userProfile)' \
        +    'yql=select track_id, title, artist from track where {totalTargetHits:10}wand(tags, @userProfile)' \
             'userProfile={"pop":1, "love songs":1,"romantic":10, "80s":20 }' \
             'hits=2' \
             'ranking=tags'
        @@ -822,7 +822,7 @@ and Vespa embed functionality:
           
         
         $ vespa query \
        -    'yql=select title, artist from track where {approximate:false,targetHits:10}nearestNeighbor(embedding,q)' \
        +    'yql=select title, artist from track where {approximate:false,totalTargetHits:10}nearestNeighbor(embedding,q)' \
             'hits=1' \
             'ranking=closeness' \
             'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
        @@ -831,13 +831,13 @@ $ vespa query \
         
         Query breakdown:
         
        -- Search for ten (`targetHits:10`) nearest neighbors of the `query(q)` query tensor over the `embedding`
        +- Search for ten (`totalTargetHits:10`) nearest neighbors of the `query(q)` query tensor over the `embedding`
         document tensor field. 
         - The annotation `approximate:false` tells Vespa to perform exact search.
         - The `hits` parameter controls how many results are returned in the response. Number of `hits`
        -requested does not impact `targetHits`. Notice that `targetHits` is per content node involved in the query. 
        +requested does not impact `totalTargetHits`. 
         - `ranking=closeness` tells Vespa which [rank-profile](../basics/ranking.html) to score documents. One must 
        -specify how to *rank* the `targetHits` documents retrieved and exposed to `first-phase` ranking expression
        +specify how to *rank* the `totalTargetHits` documents retrieved and exposed to `first-phase` ranking expression
         in the `rank-profile`.
         - `input.query(q)` is the query vector produced by the [embedder](../rag/embedding.html#embedding-a-query-text).
         
        @@ -898,7 +898,7 @@ Changing the rank-profile to `closeness-t4` makes Vespa use four threads per que
           
         
         $ vespa query \
        -    'yql=select title, artist from track where {approximate:false,targetHits:10}nearestNeighbor(embedding,q)' \
        +    'yql=select title, artist from track where {approximate:false,totalTargetHits:10}nearestNeighbor(embedding,q)' \
             'hits=1' \
             'ranking=closeness-t4' \
             'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
        @@ -932,7 +932,7 @@ field has `index`:
           
         
         $ vespa query \
        -    'yql=select title, artist from track where {targetHits:10,hnsw.exploreAdditionalHits:20}nearestNeighbor(embedding,q)' \
        +    'yql=select title, artist from track where {totalTargetHits:10,hnsw.exploreAdditionalHits:20}nearestNeighbor(embedding,q)' \
             'hits=1' \
             'ranking=closeness' \
             'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
        @@ -1010,7 +1010,7 @@ In this query example the `title` field must contain the term `heart`:
           
         
         $ vespa query \
        -    'yql=select title, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and title contains "heart"' \
        +    'yql=select title, artist from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and title contains "heart"' \
             'hits=2' \
             'ranking=closeness' \
             'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
        @@ -1109,7 +1109,7 @@ the matching against the `title` field can use the most efficient posting list r
           
         
         $ vespa query \
        -    'yql=select title, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and title contains ({ranked:false}"heart")' \
        +    'yql=select title, artist from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and title contains ({ranked:false}"heart")' \
             'hits=2' \
             'ranking=closeness' \
             'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
        @@ -1127,7 +1127,7 @@ with any other Vespa query operator.
           
         
         $ vespa query \
        -    'yql=select title, popularity, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and popularity > 20 and artist contains "Bonnie Tyler"' \
        +    'yql=select title, popularity, artist from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and popularity > 20 and artist contains "Bonnie Tyler"' \
             'hits=2' \
             'ranking=closeness' \
             'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
        @@ -1140,8 +1140,8 @@ This query example restricts the search to tracks by `Bonnie Tyler` with `popula
         When combining nearest neighbor search with strict filters that match less than 2 percent of the total number of documents,
         Vespa will instead of searching the HNSW graph, constrained by the filter, fall back to using exact nearest neighbor search.
         See [Controlling filter behavior](#controlling-filter-behavior) for how to adjust the threshold for which strategy that is used.
        -Since exact search may expose more than `targetHits` hits to the `first-phase` ranking expression,
        -users will observe that `totalCount` increases and is higher than `targetHits` when falling back to exact search.
        +Since exact search may expose more than `totalTargetHits` hits to the `first-phase` ranking expression,
        +users will observe that `totalCount` increases and is higher than `totalTargetHits` when falling back to exact search.
         This can be seen in the previous examples.
         When using exact search with filters, the search can also use multiple threads to evaluate the query, which
         helps reduce the latency impact.
        @@ -1170,7 +1170,7 @@ The following query with a restrictive filter on popularity is used for illustra
           
         
         $ vespa query \
        -    'yql=select title, popularity, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and popularity > 80' \
        +    'yql=select title, popularity, artist from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and popularity > 80' \
             'hits=2' \
             'ranking=closeness-t4' \
             'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
        @@ -1239,7 +1239,7 @@ because it's `distance(field, embedding)` is close to 0.5.
           
         
         $ vespa query \
        -    'yql=select title, popularity, artist from track where {distanceThreshold:0.2,targetHits:10}nearestNeighbor(embedding,q) and popularity > 80' \
        +    'yql=select title, popularity, artist from track where {distanceThreshold:0.2,totalTargetHits:10}nearestNeighbor(embedding,q) and popularity > 80' \
             'hits=2' \
             'ranking=closeness' \
             'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
        @@ -1312,7 +1312,7 @@ both based on semantic (vector distance) and traditional sparse (exact) matching
           
         
         $ vespa query \
        -    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery()' \
        +    'yql=select title, artist from track where {totalTargetHits:100}nearestNeighbor(embedding,q) or userQuery()' \
             'query=total eclipse of the heart' \
             'type=weakAnd' \
             'hits=2' \
        @@ -1428,7 +1428,7 @@ In the below query, we lower the weight of the popularity factor by adjusting `q
           
         
         $ vespa query \
        -    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery()' \
        +    'yql=select title, artist from track where {totalTargetHits:100}nearestNeighbor(embedding,q) or userQuery()' \
             'query=total eclipse of the heart' \
             'type=weakAnd' \
             'hits=2' \
        @@ -1513,7 +1513,7 @@ Which can be used with the `wand` query operator to retrieve personalized hits f
           
         
         $ vespa query \
        -    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery() or ({targetHits:10}wand(tags, @userProfile))' \
        +    'yql=select title, artist from track where {totalTargetHits:100}nearestNeighbor(embedding,q) or userQuery() or ({totalTargetHits:10}wand(tags, @userProfile))' \
             'query=total eclipse of the heart' \
             'type=weakAnd' \
             'hits=2' \
        @@ -1597,7 +1597,7 @@ the query terms in the `weakAnd`.
           
         
         $ vespa query \
        -    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) and userQuery()' \
        +    'yql=select title, artist from track where {totalTargetHits:100}nearestNeighbor(embedding,q) and userQuery()' \
             'query=total eclipse of the heart' \
             'type=weakAnd' \
             'hits=2' \
        @@ -1613,7 +1613,7 @@ It is also possible to combine hybrid search with filters, this filters both the
           
         
         $ vespa query \
        -    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) and userQuery() and popularity < 75' \
        +    'yql=select title, artist from track where {totalTargetHits:100}nearestNeighbor(embedding,q) and userQuery() and popularity < 75' \
             'query=total eclipse of the heart' \
             'type=weakAnd' \
             'hits=2' \
        @@ -1631,7 +1631,7 @@ rank features for those hits retrieved by the first operand.
           
         
         $ vespa query \
        -    'yql=select title, artist from track where rank({targetHits:100}nearestNeighbor(embedding,q), userQuery())' \
        +    'yql=select title, artist from track where rank({totalTargetHits:100}nearestNeighbor(embedding,q), userQuery())' \
             'query=total eclipse of the heart' \
             'type=weakAnd' \
             'hits=2' \
        @@ -1714,7 +1714,7 @@ retrieved by the sparse query representation.
           
         
         $ vespa query \
        -    'yql=select title, artist from track where rank(userQuery(),{targetHits:100}nearestNeighbor(embedding,q))' \
        +    'yql=select title, artist from track where rank(userQuery(),{totalTargetHits:100}nearestNeighbor(embedding,q))' \
             'query=total eclipse of the heart' \
             'type=weakAnd' \
             'hits=2' \
        @@ -1735,7 +1735,7 @@ One can also use the `rank` operator to first retrieve by some filter logic, and
           
         
         $ vespa query \
        -    'yql=select title, popularity, artist from track where rank(popularity>99,{targetHits:10}nearestNeighbor(embedding,q))' \
        +    'yql=select title, popularity, artist from track where rank(popularity>99,{totalTargetHits:10}nearestNeighbor(embedding,q))' \
             'hits=2' \
             'ranking=closeness' \
             'input.query(q)=embed(e5, "Total Eclipse Of The Heart")' 
        @@ -1758,7 +1758,7 @@ query tensor inputs:
           
         
         $ vespa query \
        -    'yql=select title from track where ({targetHits:10}nearestNeighbor(embedding,q)) or ({targetHits:10}nearestNeighbor(embedding,q1))' \
        +    'yql=select title from track where ({totalTargetHits:10}nearestNeighbor(embedding,q)) or ({totalTargetHits:10}nearestNeighbor(embedding,q1))' \
             'hits=2' \
             'ranking=closeness' \
             'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'  \
        @@ -1835,7 +1835,7 @@ rank-profile closeness-label inherits closeness {
           
         
         $ vespa query \
        -    'yql=select title from track where ({ label:"q", targetHits:10}nearestNeighbor(embedding,q)) or ({label:"q1",targetHits:10}nearestNeighbor(embedding,q1))' \
        +    'yql=select title from track where ({ label:"q", totalTargetHits:10}nearestNeighbor(embedding,q)) or ({label:"q1",totalTargetHits:10}nearestNeighbor(embedding,q1))' \
             'hits=2' \
             'ranking=closeness-label' \
             'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'  \
        @@ -1897,14 +1897,14 @@ The above query annotates the two `nearestNeighbor` query operators using
         }{% endhighlight %}
         Note that the previous examples used `or` to combine the two operators. Using `and` instead, requires
        -that there are documents that is in both the top-k results. Increasing `totalTargetHits` to 500,
        +that there are documents that are in both the top-k results. Increasing `totalTargetHits` to 500,
         finds a few tracks that overlap.
         $ vespa query \
        -    'yql=select title from track where ({label:"q", targetHits:500}nearestNeighbor(embedding,q)) and ({label:"q1",targetHits:500}nearestNeighbor(embedding,q1))' \
        +    'yql=select title from track where ({label:"q", totalTargetHits:500}nearestNeighbor(embedding,q)) and ({label:"q1",totalTargetHits:500}nearestNeighbor(embedding,q1))' \
             'hits=2' \
             'ranking=closeness-label' \
             'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'  \
        @@ -2019,7 +2019,7 @@ do not perform post-filtering, use *pre-filtering* strategy:
           
         
         $ vespa query \
        -  'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
        +  'yql=select title, artist, tags from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
           'hits=2' \
           'ranking=closeness' \
           'ranking.matching.postFilterThreshold=1.0' \
        @@ -2028,14 +2028,14 @@ $ vespa query \
         
        -The query exposes `targetHits` to ranking as seen from the `totalCount`. Now, repeating the query, but
        +The query exposes `totalTargetHits` to ranking as seen from the `totalCount`. Now, repeating the query, but
         forcing *post-filtering* instead by setting *ranking.matching.postFilterThreshold=0.0*:
         $ vespa query \
        -  'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
        +  'yql=select title, artist, tags from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
           'hits=2' \
           'ranking=closeness' \
           'ranking.matching.postFilterThreshold=0.0' \
        @@ -2045,21 +2045,21 @@ $ vespa query \
         
         In this case, Vespa will estimate how many documents the filter matches and auto-adjust `targethits` internally to a
        -higher number, attempting to expose the `targetHits` to first phase ranking:
        +higher number, attempting to expose the `totalTargetHits` to first phase ranking:
         The query exposes 16 documents to ranking as can be seen from `totalCount`. There are `8420` documents in the collection that are tagged with the `rock` tag, so roughly 8%.
        -Auto adjusting `targetHits` upwards for post-filtering is not always what you want, because it is slower than just retrieving
        +Auto adjusting `totalTargetHits` upwards for post-filtering is not always what you want, because it is slower than just retrieving
         from the HNSW index without constraints. We can change the
        -`targetHits` adjustment factor with the [ranking.matching.targetHitsMaxAdjustmentFactor](../reference/api/query.html#ranking.matching) parameter.
        -In this case, we set it to 1, which disables adjusting the `targetHits` upwards.
        +`totalTargetHits` adjustment factor with the [ranking.matching.targetHitsMaxAdjustmentFactor](../reference/api/query.html#ranking.matching) parameter.
        +In this case, we set it to 1, which disables adjusting the `totalTargetHits` upwards.
         $ vespa query \
        -  'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
        +  'yql=select title, artist, tags from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
           'hits=2' \
           'ranking=closeness' \
           'ranking.matching.postFilterThreshold=0.0' \
        @@ -2068,7 +2068,7 @@ $ vespa query \
           'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
         
        -Since we are post-filtering without upward adjusting the targetHits, we end up with fewer hits.
        +Since we are post-filtering without adjusting `totalTargetHits` upwards, we end up with fewer hits.
         Changing the query to limit to a tag which is less frequent, for example, `90s`, which matches 1,695 documents or roughly 1.7%, will cause Vespa to fall back to exact search as the estimated filter hit count
        @@ -2078,7 +2078,7 @@ is less than the `approximateThreshold`.
         $ vespa query \
        -  'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "90s"' \
        +  'yql=select title, artist, tags from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and tags contains "90s"' \
           'hits=2' \
           'ranking=closeness' \
           'ranking.matching.postFilterThreshold=0.0' \
        @@ -2087,7 +2087,7 @@ $ vespa query \
         
        -The fallback to exact search will expose more than `targetHits` documents to ranking.
        +The fallback to exact search will expose more than `totalTargetHits` documents to ranking.
         Read more about combining filters with nearest neighbor search in the [Query Time Constrained Approximate Nearest Neighbor Search](https://blog.vespa.ai/constrained-approximate-nearest-neighbor-search/) blog post.
        diff --git a/en/querying/nearest-neighbor-search.md b/en/querying/nearest-neighbor-search.md
        index 570ff6ce39..941a7fea30 100644
        --- a/en/querying/nearest-neighbor-search.md
        +++ b/en/querying/nearest-neighbor-search.md
        @@ -376,7 +376,7 @@ using the [Query API](query-api.html#http):
         ```json
         {
        -  "yql": "select * from product where {targetHits: 100}nearestNeighbor(image_embeddings, image_query_embedding) and in_stock = true",
        +  "yql": "select * from product where {totalTargetHits: 100}nearestNeighbor(image_embeddings, image_query_embedding) and in_stock = true",
           "input.query(image_query_embedding)": [
             0.22507139604882176,
             0.11696498718517367,
        diff --git a/en/rag/binarizing-vectors.md b/en/rag/binarizing-vectors.md
        index 9dce96e286..a215e8d2d1 100644
        --- a/en/rag/binarizing-vectors.md
        +++ b/en/rag/binarizing-vectors.md
        @@ -301,7 +301,7 @@ Assuming a query using the doc_embedding field:
         ```
         $ vespa query \
        -    'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding, q)' \
        +    'yql=select * from doc where {totalTargetHits:5}nearestNeighbor(doc_embedding, q)' \
             'input.query(q)=[1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \
             'ranking=app_ranking'
         ```
        @@ -310,7 +310,7 @@ The same query, with a binarized query vector, to the binarized field:
         ```
         $ vespa query \
        -    'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
        +    'yql=select * from doc where {totalTargetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
             'input.query(q_bin)=[-119]' \
             'ranking=app_ranking_bin'
         ```
        @@ -370,7 +370,7 @@ rank-profile app_ranking {
         Query:
         ```
         $ vespa query \
        -    'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding, q)' \
        +    'yql=select * from doc where {totalTargetHits:5}nearestNeighbor(doc_embedding, q)' \
             'input.query(q)=[2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]' \
             'ranking=app_ranking'
         ```
        @@ -397,7 +397,7 @@ Query:
         ```
         $ vespa query \
        -    'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
        +    'yql=select * from doc where {totalTargetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
             'input.query(q_bin)=[-119]' \
             'ranking=app_ranking_bin'
         ```
        @@ -440,7 +440,7 @@ Notes:
         Note the differences when using full values in the query tensor, see the relevance score for the results:
         ```
         $ vespa query \
        -    'yql=select * from music where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
        +    'yql=select * from music where {totalTargetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
             'input.query(q)=[1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \
             'input.query(q_bin)=[-119]' \
             'ranking=app_ranking_bin_full'
        @@ -452,7 +452,7 @@ $ vespa query \
         ```
         ```
         $ vespa query \
        -    'yql=select * from music where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
        +    'yql=select * from music where {totalTargetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
             'input.query(q)=[2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \
             'input.query(q_bin)=[-119]' \
             'ranking=app_ranking_bin_full'
        diff --git a/en/rag/embedding.html b/en/rag/embedding.html
        index 884b9839f6..06cbaca9b0 100644
        --- a/en/rag/embedding.html
        +++ b/en/rag/embedding.html
        @@ -76,7 +76,7 @@

        Embedding a query text

        The text argument can be supplied by a referenced parameter instead, using the @parameter syntax:

        {% highlight json %}
         {
        -    "yql": "select * from doc where {targetHits:10}nearestNeighbor(embedding_field, query_embedding)",
        +    "yql": "select * from doc where {totalTargetHits:10}nearestNeighbor(embedding_field, query_embedding)",
             "text": "my text to embed",
             "input.query(query_embedding)": "embed(@text)",
         }
        @@ -761,7 +761,7 @@ 

        Adding a fixed string to a query the text value which then is embedded.

        {% highlight json %}
         {
        -    "yql": "select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))",
        +    "yql": "select * from doc where userQuery() or ({totalTargetHits: 100}nearestNeighbor(embedding, e))",
             "input.query(e)": "embed(mxbai, @text)",
             "user_query": "space contains many suns"
         }
        diff --git a/en/rag/working-with-chunks.html b/en/rag/working-with-chunks.html
        index 07ad37c463..5956e2f832 100644
        --- a/en/rag/working-with-chunks.html
        +++ b/en/rag/working-with-chunks.html
        @@ -166,7 +166,7 @@ 

        Searching chunks

        A simple hybrid query can look like this:

        -yql=select * from doc where userInput(@query) or ({targetHits:10}nearestNeighbor(myEmbeddings, e))
        +yql=select * from doc where userInput(@query) or ({totalTargetHits:10}nearestNeighbor(myEmbeddings, e))
         input.query(e)=embed(@query)
         query=Do Cholesterol Statin Drugs Cause Breast Cancer?
         
        diff --git a/en/ranking/ranking-intro.md b/en/ranking/ranking-intro.md
        index 2d1da716eb..da9b394dee 100644
        --- a/en/ranking/ranking-intro.md
        +++ b/en/ranking/ranking-intro.md
        @@ -365,7 +365,7 @@ As the point of [weakAnd](../reference/querying/yql.html#weakand) is to early di
         _totalCount_ is an approximation:
         yql=select * from doc where
        -{scoreThreshold: 0, targetHits: 10}weakAnd(
        +{scoreThreshold: 0, totalTargetHits: 10}weakAnd(
         default contains "vespa",
         default contains "documents",
         default contains "about",
        diff --git a/en/ranking/wand.html b/en/ranking/wand.html
        index ca54ad27cf..b0f04289c3 100644
        --- a/en/ranking/wand.html
        +++ b/en/ranking/wand.html
        @@ -41,7 +41,7 @@
         The WAND algorithm tries to address this problem by starting the search for candidate documents using OR, limiting the number of documents that are ranked, saving both latency and resource usage (cost) while still returning the same or almost the same top-k results as the brute force OR.
        - For the example, using WAND with K or targetHits to 1000, only 196,900 documents are fully ranked.
        + For the example, using WAND with K, or totalTargetHits, set to 1000, only 196,900 documents are fully ranked.
         That is a huge improvement over the exhaustive OR search which retrieves and ranks 7,926,256 documents and at the same time retrieving the same results as the exhaustive OR search.

        @@ -109,7 +109,7 @@

        weakAnd

        specify the target for minimum number of hits the operator should produce per content node involved in the query.

        - The effect of tuning targetHits may not be intuitive.
        + The effect of tuning totalTargetHits may not be intuitive.
         To ensure that you get the best hits possible with a weakAnd, set the target number somewhat higher than the number of hits returned to the user; setting it 10 times higher should be more than enough.
        @@ -156,7 +156,7 @@

        weakAnd

         select * from passages where (
        -    {targetHits: 200}
        +    {totalTargetHits: 200}
                 weakAnd(
                     default contains "is", default contains "cdg", default contains "airport",
                     default contains "in", default contains "main", default contains "paris"
        @@ -277,7 +277,7 @@ 

        wand

         {
             "yql":"select * from passages where rank(
        -        ({targetHits: 25}
        +        ({totalTargetHits: 25}
                     wand(deep_ct_tokens, @tokens)),
                     userQuery())",
             "tokens": "{2003: 1, 3729: 1, 2290: 1, 3199: 1, 1999: 1, 2364: 1, 3000: 1}",
        diff --git a/en/reference/api/query.html b/en/reference/api/query.html
        index 70c3e5b0f5..f9ecda7bb9 100644
        --- a/en/reference/api/query.html
        +++ b/en/reference/api/query.html
        @@ -1149,7 +1149,7 @@ 

        ranking.matching

        Value used to control the auto-adjustment of - targetHits used when evaluating an approximate + totalTargetHits used when evaluating an approximate nearestNeighbor operator with post-filtering.

        diff --git a/en/reference/querying/json-query-language.md b/en/reference/querying/json-query-language.md index b19fc748f2..65fdfc0f81 100644 --- a/en/reference/querying/json-query-language.md +++ b/en/reference/querying/json-query-language.md @@ -494,7 +494,7 @@ Format of this in JSON: Another example: -YQL: `where [ {"scoreThreshold": 13, "targetHits": 7} ]wand(description, {"a":1, "b":2})`. +YQL: `where [ {"scoreThreshold": 13, "totalTargetHits": 7} ]wand(description, {"a":1, "b":2})`. Format of this in JSON: @@ -502,7 +502,7 @@ Format of this in JSON: "where" : { "wand" : { "children" : [ "description", {"a" : 1, "b":2} ], - "attributes" : {"scoreThreshold": 13, "targetHits": 7} + "attributes" : {"scoreThreshold": 13, "totalTargetHits": 7} } } ``` @@ -530,7 +530,7 @@ Format of this in JSON: ``` ###### weakAnd -YQL: `where {scoreThreshold: 41, "targetHits": 7}weakAnd(a contains "A", b contains "B")`. +YQL: `where {scoreThreshold: 41, "totalTargetHits": 7}weakAnd(a contains "A", b contains "B")`. Format of this in JSON: @@ -538,7 +538,7 @@ Format of this in JSON: "where" : { "weakAnd" : { "children" : [ { "contains" : ["a", "A"] }, { "contains" : ["b", "B"] } ], - "attributes" : {"scoreThreshold": 41, "targetHits": 7} + "attributes" : {"scoreThreshold": 41, "totalTargetHits": 7} } } ``` diff --git a/en/reference/querying/yql.html b/en/reference/querying/yql.html index ef96cbbff3..0bcf105fa1 100644 --- a/en/reference/querying/yql.html +++ b/en/reference/querying/yql.html @@ -599,6 +599,7 @@

        where

        only the following annotations are applied:

        • defaultIndex
        • +
        • totalTargetHits
        • (for weakAnd)
        • targetHits
        • (for weakAnd)
        • distance
        • (for near/oNear)
        • ranked
        • @@ -963,14 +964,20 @@

          where

          scoreThreshold Minimum rank score for hits to include. + + totalTargetHits + Wanted number of hits exposed to the first-phase ranking function in total over the content nodes + evaluating the query. + targetHits - Wanted number of hits exposed to the real first-phase ranking function per content node. + Wanted number of hits exposed to the first-phase ranking function per content node. + Prefer using totalTargetHits over this.
          -where ({scoreThreshold: 0.13, targetHits: 7}wand(description, {"a":1, "b":2}))
          +where ({scoreThreshold: 0.13, totalTargetHits: 7}wand(description, {"a":1, "b":2}))
           

          Refer to using wand for introduction to the WAND @@ -1039,14 +1046,19 @@

          where

          + + totalTargetHits + Wanted number of hits exposed to the first-phase ranking function in total over the content nodes evaluating the query. + targetHits - Wanted number of hits exposed to the real first-phase ranking function per content node. + Wanted number of hits exposed to the first-phase ranking function per content node. + Prefer using totalTargetHits over this.
          -where ({targetHits: 7}weakAnd(a contains "A", b contains "B"))
          +where ({totalTargetHits: 7}weakAnd(a contains "A", b contains "B"))
           

          Unlike wand, weakAnd can be used @@ -1280,12 +1292,12 @@

          where

          the approximate nearest neighbors are returned. Example:

          -where ({targetHits: 10}nearestNeighbor(doc_vector, query_vector))&input.query(query_vector)=[3,5,7]&ranking=semantic
          +where ({totalTargetHits: 10}nearestNeighbor(doc_vector, query_vector))&input.query(query_vector)=[3,5,7]&ranking=semantic
           

          In this example we search for the top 10 nearest neighbors in a 3-dimensional vector space. - targetHits specifies the top-k nearest neighbors to expose to a user defined semantic - rank profile. The targetHits annotation is required. + totalTargetHits specifies the top-k nearest neighbors to expose to a user defined semantic + rank profile. The totalTargetHits annotation is required. The first parameter of nearestNeighbor is the name of the tensor field attribute containing the document vectors (doc_vector).
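For reference, the example above (the nearest neighbors of the query vector [3,5,7]) can be mirrored by a brute-force sketch of what an exact (`approximate:false`) top-k search computes. This is an illustration under assumptions, not Vespa's implementation. It assumes euclidean distance, and the `exact_top_k` function and the document vectors are hypothetical. It also ignores how the hit target is distributed over content nodes.

```python
import math
from typing import Dict, List, Tuple


def exact_top_k(doc_vectors: Dict[str, List[float]],
                query: List[float],
                k: int) -> List[Tuple[str, float]]:
    """Brute-force (exact) k-nearest-neighbor search using euclidean distance."""
    # Compute the distance from the query to every document vector.
    scored = [(doc_id, math.dist(vector, query)) for doc_id, vector in doc_vectors.items()]
    # Closest first, then keep only the k best.
    scored.sort(key=lambda item: item[1])
    return scored[:k]


# Hypothetical 3-dimensional document vectors; the query matches the example's [3,5,7].
docs = {
    "a": [3.0, 5.0, 8.0],
    "b": [0.0, 0.0, 0.0],
    "c": [3.0, 5.0, 5.0],
    "d": [10.0, 10.0, 10.0],
}
print(exact_top_k(docs, [3.0, 5.0, 7.0], k=2))  # [('a', 1.0), ('c', 2.0)]
```

Exact search scores every candidate, so its cost grows with the number of documents. An approximate search over an HNSW index avoids this by exploring only part of the graph.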

          @@ -1331,13 +1343,20 @@

          where

          + + totalTargetHits + + Specifies the number of hits nearestNeighbor + should expose to ranking in total over the content + nodes evaluating the query. Note that more or fewer hits may actually be produced. + Setting target hits is required. + + targetHits - This annotation is required, and specifies the number of hits nearestNeighbor - should expose to ranking. - Note that more or less hits might actually be produced. targetHits is per node - involved in the query. + Specifies the target hits per node. + Prefer using totalTargetHits over this. @@ -1356,7 +1375,7 @@

          where

          hnsw.exploreAdditionalHits - Tune how many extra nodes in the HNSW graph (in addition to targetHits) + Tune how many extra nodes in the HNSW graph (in addition to totalTargetHits) that should be explored before selecting the best hits. Default is 0. Increasing this parameter increases the accuracy of the approximate search, at the cost of more distance computations. @@ -1818,7 +1837,7 @@

          Annotations

          The distanceThreshold annotation may be used to filter away hits with a higher distance than the given threshold from the results. Note that one will never get more hits with distanceThreshold than you would get without it - - to get more hits, increase targetHits, too. + to get more hits, increase totalTargetHits, too. The units for the threshold depends on the distance metric used.

          @@ -1991,7 +2010,7 @@

          Annotations

          Used in nearestNeighbor. When using an HNSW index, the optional hnsw.exploreAdditionalHits annotation can be used to - tune how many extra nodes in the graph (in addition to targetHits) + tune how many extra nodes in the graph (in addition to totalTargetHits) should be explored before selecting the best hits. Using a greater number here gives better quality, but worse performance.

          @@ -2182,23 +2201,32 @@

          Annotations

          Do suffix matching for this term, e.g. search for "*word".

          - targetHits + totalTargetHits 100 int -

          - Used by wand and weakAnd, where the default is 100. +

          + Used by wand and weakAnd, where the default is 100, + and with nearestNeighbor, + where it has no default. + This sets the wanted number of hits exposed to the first-phase ranking function in total + over the content nodes evaluating the query (a group). + If additional second phase ranking is used, + do not set totalTargetHits less than the configured rank-profile's + total-rerank-count.

          - It is also used with nearestNeighbor, - where it has no default - it must always be set, - see examples in nearest neighbor search. + See examples in nearest neighbor search.

          -

          - It sets the wanted number of hits exposed to the real first-phase ranking function per content node. - If additional second phase ranking is used, - do not set targetHits less than the configured rank-profile's - total-rerank-count. + + + + targetHits + 100 + int + +

          + Sets the target hits per content node. Prefer using totalTargetHits over this.

          diff --git a/en/reference/schemas/schemas.html b/en/reference/schemas/schemas.html index 8d5e044cc5..ff7abc29ce 100644 --- a/en/reference/schemas/schemas.html +++ b/en/reference/schemas/schemas.html @@ -1609,9 +1609,9 @@

          rank-profile

          Controlling the filtering behavior with approximate nearest neighbor search for more details.

          - With post-filtering the targetHits value
          - used when searching the HNSW index is auto-adjusted in an effort to expose targetHits hits
          - to first-phase ranking after post-filtering has been applied. The following formula is used:
          + With post-filtering the totalTargetHits value
          + used when searching the HNSW index is auto-adjusted in an effort to expose the node's share of totalTargetHits
          + hits to first-phase ranking after post-filtering has been applied. The following formula is used:

               adjustedTargetHits = min(targetHits / estimatedFilterHitRatio, targetHits * targetHitsMaxAdjustmentFactor).
          @@ -1705,7 +1705,7 @@ 

          rank-profile

          Value (in the range [1.0, inf]) used to control the auto-adjustment of - targetHits used when evaluating an approximate + totalTargetHits used when evaluating an approximate nearestNeighbor operator with post-filtering. The default value is 20.0.
          
          From 447019c9a01925899876624ac8d856a077cd13c9 Mon Sep 17 00:00:00 2001
          From: Jon Bratseth
          Date: Mon, 16 Mar 2026 18:00:34 +0100
          Subject: [PATCH 5/9] Update en/learn/faq.md
          
          Co-authored-by: Kristian Aune
          ---
           en/learn/faq.md | 2 +-
           1 file changed, 1 insertion(+), 1 deletion(-)
          
          diff --git a/en/learn/faq.md b/en/learn/faq.md
          index 9ede0766ef..762fa5c011 100644
          --- a/en/learn/faq.md
          +++ b/en/learn/faq.md
          @@ -87,7 +87,7 @@ of a double. This can happen in two cases:
           - The [ranking](../basics/ranking.html) expression used a feature which became `NaN` (Not a Number). For example, `log(0)` would produce -Infinity. One can use [isNan](../reference/ranking/ranking-expressions.html#isnan-x) to guard against this.
          -- Surfacing low scoring hits using [grouping](../querying/grouping.html), that is, rendering low ranking hits with `each(output(summary()))` that are outside what Vespa computed and caches on a heap. This is controlled by the [total-keep-rank-count](../reference/schemas/schemas.html#total-keep-rank-count) perameter.
          +- Surfacing low scoring hits using [grouping](../querying/grouping.html), that is, rendering low ranking hits with `each(output(summary()))` that are outside what Vespa computed and caches on a heap. This is controlled by the [total-keep-rank-count](../reference/schemas/schemas.html#total-keep-rank-count) parameter.
           ### How to pin query results?
           To hard-code documents to positions in the result set,
          
          From 82789856ca1c84af77cab4449e3ae237e4a8472a Mon Sep 17 00:00:00 2001
          From: Jon Bratseth
          Date: Mon, 16 Mar 2026 18:03:16 +0100
          Subject: [PATCH 6/9] Update en/reference/querying/yql.html
          
          Co-authored-by: Kristian Aune
          ---
           en/reference/querying/yql.html | 2 +-
           1 file changed, 1 insertion(+), 1 deletion(-)
          
          diff --git a/en/reference/querying/yql.html b/en/reference/querying/yql.html
          index 0bcf105fa1..5b085edb67 100644
          --- a/en/reference/querying/yql.html
          +++ b/en/reference/querying/yql.html
          @@ -2226,7 +2226,7 @@

          Annotations

          int

          - Sets target hots per node. Prefer using totalTargetHits over this.
          + Sets target hits per node. Prefer using totalTargetHits over this.

          From 95f294294a57e432d1a422e24d36e1752e42e5d8 Mon Sep 17 00:00:00 2001
          From: Jon Bratseth
          Date: Mon, 16 Mar 2026 18:03:34 +0100
          Subject: [PATCH 7/9] Update en/reference/schemas/schemas.html
          
          Co-authored-by: Kristian Aune
          ---
           en/reference/schemas/schemas.html | 2 +-
           1 file changed, 1 insertion(+), 1 deletion(-)
          
          diff --git a/en/reference/schemas/schemas.html b/en/reference/schemas/schemas.html
          index ff7abc29ce..517cef0b7b 100644
          --- a/en/reference/schemas/schemas.html
          +++ b/en/reference/schemas/schemas.html
          @@ -1610,7 +1610,7 @@

          rank-profile

           With post-filtering the totalTargetHits value
          - used when searching the HNSW index is auto-adjusted in an effort to expose the node's shgare of totalTargetHits
          + used when searching the HNSW index is auto-adjusted in an effort to expose the node's share of totalTargetHits
           hits to first-phase ranking after post-filtering has been applied. The following formula is used:
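The post-filtering auto-adjustment quoted in these schemas.html hunks can be illustrated with a small Python sketch. It assumes that the per-node targetHits fed into the formula is the node's share of totalTargetHits. The patch does not say how that share is computed, so `node_share` is a hypothetical even split rounded up and may differ from what Vespa actually does. `total_target_hits_ok` only encodes the guideline from the yql.html hunk (do not set totalTargetHits below total-rerank-count). None of these function names are Vespa APIs; this is not Vespa's implementation.

```python
import math


def node_share(total: int, num_nodes: int) -> int:
    """HYPOTHETICAL helper: assumes a total is split evenly over the content
    nodes in a group, rounded up. Vespa's actual per-node split may differ."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return math.ceil(total / num_nodes)


def adjusted_target_hits(target_hits: float,
                         estimated_filter_hit_ratio: float,
                         max_adjustment_factor: float = 20.0) -> float:
    """Follows the documented formula:
    min(targetHits / estimatedFilterHitRatio,
        targetHits * targetHitsMaxAdjustmentFactor).
    20.0 is the documented default for targetHitsMaxAdjustmentFactor."""
    # Sketch's own input checks: a zero ratio would divide by zero, and the
    # documented range for the factor is [1.0, inf].
    if not 0.0 < estimated_filter_hit_ratio <= 1.0:
        raise ValueError("estimated_filter_hit_ratio must be in (0, 1]")
    if max_adjustment_factor < 1.0:
        raise ValueError("max_adjustment_factor must be in [1.0, inf]")
    return min(target_hits / estimated_filter_hit_ratio,
               target_hits * max_adjustment_factor)


def total_target_hits_ok(total_target_hits: int, total_rerank_count: int) -> bool:
    """Guideline check: with second-phase ranking, totalTargetHits should not
    be less than the rank profile's total-rerank-count."""
    return total_target_hits >= total_rerank_count


if __name__ == "__main__":
    per_node = node_share(400, 4)                 # hypothetical even split: 100
    print(adjusted_target_hits(per_node, 0.25))   # 100 / 0.25 = 400.0
    print(adjusted_target_hits(per_node, 0.01))   # capped at 100 * 20.0 = 2000.0
    print(total_target_hits_ok(400, 1000))        # False
```

The cap matters for very selective filters. With an estimated filter hit ratio of 0.01, the first term would ask the HNSW search for 10,000 hits per node, while targetHitsMaxAdjustmentFactor bounds it at 20 times the per-node target.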

          
          From 05caca7ba0595339ccd6043bb1bc32e66046c7ad Mon Sep 17 00:00:00 2001
          From: Jon Bratseth 
          Date: Mon, 16 Mar 2026 18:02:53 +0100
          Subject: [PATCH 8/9] Correct sentence
          
          ---
           en/learn/tutorials/rag-blueprint.md | 4 ++--
           1 file changed, 2 insertions(+), 2 deletions(-)
          
          diff --git a/en/learn/tutorials/rag-blueprint.md b/en/learn/tutorials/rag-blueprint.md
          index 7750c5e668..058df3a752 100644
          --- a/en/learn/tutorials/rag-blueprint.md
          +++ b/en/learn/tutorials/rag-blueprint.md
          @@ -570,8 +570,8 @@ not the case for most real-world RAG applications, so this is cruical to have in
           
           ![phased ranking overview](/assets/img/phased-ranking-rag.png)
           
          -That the stateless container nodes can 
          -also be [scaled independently](../../performance/sizing-search.html) to handle increased query load.
          +The stateless container nodes can 
          +be [scaled independently](../../performance/sizing-search.html) to handle increased query load.
           
           ## Configuring match-phase (retrieval)
           
          
          From 59eccc08ff7aa9c7ec51e9a60289223eef23643f Mon Sep 17 00:00:00 2001
          From: Jon Bratseth 
          Date: Mon, 16 Mar 2026 18:04:26 +0100
          Subject: [PATCH 9/9] Update en/reference/schemas/schemas.html
          
          Co-authored-by: Kristian Aune 
          ---
           en/reference/schemas/schemas.html | 2 +-
           1 file changed, 1 insertion(+), 1 deletion(-)
          
          diff --git a/en/reference/schemas/schemas.html b/en/reference/schemas/schemas.html
          index 517cef0b7b..e32c47915d 100644
          --- a/en/reference/schemas/schemas.html
          +++ b/en/reference/schemas/schemas.html
          @@ -1874,7 +1874,7 @@ 

          diversity

          unique values from the diversity attribute from this phase, but no more than max-hits. For match-phase max-hits = the node's share of match-phase total-max-hits. -For second-phase max-hits = the node's share of total-rerank-count +For second-phase max-hits = the node's share of total-rerank-count. A document is considered a candidate if:
          • The query has not yet reached the max-hits