Merged
Changes from 4 commits
4 changes: 2 additions & 2 deletions en/basics/ranking.html
@@ -96,12 +96,12 @@ <h2 id="phased-ranking">Phased ranking</h2>

second-phase {
expression: xgboost(my_xgboost_reranker)
-rerank-count: 1000 # per content node
+total-rerank-count: 1000 # over all nodes
}

global-phase {
expression: sum(onnx(my_large_onnx_model))
-rerank-count: 20 # globally
+rerank-count: 20
}

}
2 changes: 1 addition & 1 deletion en/clients/vespa-cli.html
@@ -233,7 +233,7 @@ <h3 id="queries">Queries</h3>
<p>Example query file:</p>
<pre>{% highlight json %}
{
-"yql": "select product_id, title from products where {targetHits: 200}nearestNeighbor(dense_embedding, q_vector)",
+"yql": "select product_id, title from products where {totalTargetHits: 200}nearestNeighbor(dense_embedding, q_vector)",
"input.query(q_vector)": [-0.050548091530799866, ... ,0.028366032987833023],
"ranking": "vector_distance"
}
4 changes: 2 additions & 2 deletions en/content/attributes.html
@@ -586,7 +586,7 @@ <h2 id="paged-attributes">Paged attributes</h2>
where the number of attribute accesses is limited by the re-ranking phase count.
</p>
<p>
-For example using a second phase <a href="../reference/schemas/schemas.html#secondphase-rerank-count">rerank-count</a>
+For example using a second phase <a href="../reference/schemas/schemas.html#secondphase-total-rerank-count">total-rerank-count</a>
of 100 will limit the maximum number of page-ins/disk accesses per query to 100.
Running at 100 QPS would need up to 10K disk accesses per second.
This is the worst case if none of the accessed attribute data were paged into memory already.
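As a back-of-envelope check, the worst-case disk-access rate is simply QPS times the re-rank count. A hypothetical sizing helper (not Vespa code) illustrating the arithmetic:

```python
# Hypothetical sizing sketch, not part of Vespa: worst-case page-ins
# per second for paged attributes, assuming none of the accessed
# attribute data is already resident in memory.
def worst_case_page_ins_per_sec(qps: int, total_rerank_count: int) -> int:
    # Each query may touch up to total_rerank_count attribute values on disk.
    return qps * total_rerank_count

# 100 QPS with a re-rank count of 100 gives up to 10K accesses/second.
print(worst_case_page_ins_per_sec(100, 100))
```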
@@ -608,7 +608,7 @@ <h2 id="paged-attributes">Paged attributes</h2>
rank-profile foo {
first-phase {}
second-phase {
-rerank-count: 100
+total-rerank-count: 100
expression: sum(attribute(tensordata))
}
}
2 changes: 1 addition & 1 deletion en/learn/faq.md
@@ -87,7 +87,7 @@ of a double. This can happen in two cases:

- The [ranking](../basics/ranking.html) expression used a feature which became `NaN` (Not a Number) or infinite. For example, `log(0)` would produce
-Infinity. One can use [isNan](../reference/ranking/ranking-expressions.html#isnan-x) to guard against this.
-- Surfacing low scoring hits using [grouping](../querying/grouping.html), that is, rendering low ranking hits with `each(output(summary()))` that are outside of what Vespa computed and caches on a heap. This is controlled by the [keep-rank-count](../reference/schemas/schemas.html#keep-rank-count).
+- Surfacing low scoring hits using [grouping](../querying/grouping.html), that is, rendering low ranking hits with `each(output(summary()))` that are outside what Vespa computed and caches on a heap. This is controlled by the [total-keep-rank-count](../reference/schemas/schemas.html#total-keep-rank-count) parameter.
### How to pin query results?
To hard-code documents to positions in the result set,
13 changes: 6 additions & 7 deletions en/learn/tutorials/rag-blueprint.md
@@ -570,8 +570,7 @@ not the case for most real-world RAG applications, so this is crucial to have in

![phased ranking overview](/assets/img/phased-ranking-rag.png)

-It is worth noting that parameters such as `targetHits` (for the match phase) and `rerank-count`
-(for first and second phase) are applied **per content node**. Also note that the stateless container nodes can
+Note that the stateless container nodes can
also be [scaled independently](../../performance/sizing-search.html) to handle increased query load.

## Configuring match-phase (retrieval)
@@ -1380,8 +1379,8 @@ We run the evaluation script on a set of unseen test queries, and get the follow
```

For the first phase ranking, we care most about recall, as we just want to make sure that the candidate documents are
-ranked high enough to be included in the second-phase ranking. (the default number of documents that will be exposed to
-second-phase is 10 000, but can be controlled by the `rerank-count` parameter).
+ranked high enough to be included in the second-phase ranking. The total number of documents reranked in the second phase,
+across all content nodes, is controlled by the `total-rerank-count` parameter.
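As a sketch, a rank profile using this parameter could look like the following (the profile name, field names, and expressions are illustrative, not taken from the tutorial schema):

```
rank-profile reranking inherits default {
    first-phase {
        expression: nativeRank(title, chunk)
    }
    second-phase {
        total-rerank-count: 100
        expression: firstPhase * attribute(quality)
    }
}
```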

We can see that our results are already very good. This is of course because we have a small, synthetic dataset.
In reality, you should align the metric expectations with your dataset and test queries.
@@ -1392,7 +1391,7 @@ within your latency budget, as you want some headroom for second-phase ranking.
## Second-phase ranking

For the second-phase ranking, we can afford to use a more expensive ranking expression, since we will only run it
-on the top-k documents from the first-phase ranking (defined by the `rerank-count` parameter, which defaults to 10,000 documents).
+on the top-k documents from the first-phase ranking (decided by the `total-rerank-count` parameter).

This is where we can significantly improve ranking quality by using more sophisticated models and features that would
be too expensive to compute for all matched documents.
@@ -1589,7 +1588,7 @@ vespa query \
**Performance monitoring:**

* Monitor latency impact of second-phase ranking
-* Adjust `rerank-count` based on quality vs. performance trade-offs
+* Adjust `total-rerank-count` based on quality vs. performance trade-offs
* Consider using different models for different query types or use cases

The second-phase ranking represents a crucial step in building high-quality RAG applications,
@@ -1598,7 +1597,7 @@ providing the precision needed for effective LLM context while maintaining reaso
## (Optional) Global-phase ranking

We also have the option of configuring [global-phase](../../reference/schemas/schemas.html#globalphase-rank) ranking, which can rerank the top k
-(as set by `rerank-count` parameter) documents from the second-phase ranking.
+(as set by the `total-rerank-count` parameter) documents from the second-phase ranking.

Common options for global-phase are [cross-encoders](../../ranking/cross-encoders.html) or another GBDT model, trained to
better separate top-ranked documents using learning-to-rank objectives such as [LambdaMART](https://xgboost.readthedocs.io/en/latest/tutorials/learning_to_rank.html). For RAG applications,
14 changes: 7 additions & 7 deletions en/performance/graceful-degradation.html
@@ -177,21 +177,21 @@ <h2 id="match-phase-degradation">Match phase degradation</h2>
<p>
Match-phase works by specifying an <code>attribute</code> that measures document
quality in some way (popularity, click-through rate, pagerank, ad bid value, price, text quality).
-In addition, a <code>max-hits</code> value is specified
-that specifies how many hits are "more than enough" for the application.
+In addition, a <code>total-max-hits</code> value is specified
+that sets how many hits, in total over the content nodes, are "more than enough" for the application.
Then an estimate is made after collecting a reasonable amount of hits for the query,
-and if the estimate is higher than the configured <code>max-hits</code> value,
+and if the estimate is higher than the node's share of the <code>total-max-hits</code> value,
an extra limitation is added to the query,
ensuring that only the highest quality documents can become hits.
</p><p>
In effect, this limits the documents actually queried to the highest quality documents,
a subset of the full corpus,
where the size of the subset is calculated in such a way
-that the query is estimated to give <code>max-hits</code> hits.
+that the query is estimated to give the node's share of <code>total-max-hits</code> hits.
Since some (low-quality) hits will already have been collected to do the estimation,
-the actual number of hits returned will usually be higher than max-hits.
+the actual number of hits returned will usually be higher than total-max-hits.
But since the distribution of documents isn't perfectly smooth,
-you risk sometimes getting less than the configured <code>max-hits</code> hits back.
+you risk sometimes getting fewer than the configured <code>total-max-hits</code> hits back.
</p><p>
Note that limiting hits in the match-phase also affects <a href="../querying/grouping.html">aggregation/grouping</a>,
and total-hit-count, since matching is actually limited and the query gets fewer hits.
@@ -200,7 +200,7 @@ <h2 id="match-phase-degradation">Match phase degradation</h2>
since they both operate in the same manner,
and you would get interference between them that could cause unpredictable results.
The graph shows possible hits versus actual hits in a corpus with 100 000 documents,
-where <code>max-hits</code> is configured to 10 000.
+where the node's share of <code>total-max-hits</code> is 10 000.
The corpus is a synthetic (slightly randomized) data set;
in practice the graph will be less smooth:
</p>
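The per-node decision described above can be sketched in code (a simplified, hypothetical model; in Vespa the estimate is made incrementally while collecting hits, not as a single function call):

```python
# Hypothetical sketch of the match-phase degradation decision,
# not actual Vespa code.
def should_limit_match_phase(estimated_hits: int,
                             total_max_hits: int,
                             num_content_nodes: int) -> bool:
    # Each content node compares its hit estimate against its
    # share of the configured total-max-hits value.
    node_share = total_max_hits / num_content_nodes
    return estimated_hits > node_share

# A node estimating 5000 hits, with total-max-hits 10000 over 4 nodes
# (share 2500), adds the extra quality-based limitation.
print(should_limit_match_phase(5000, 10000, 4))
```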
2 changes: 1 addition & 1 deletion en/performance/practical-search-performance-guide.md
@@ -1122,7 +1122,7 @@ Repeating the query from above, replacing `dotProduct` with `wand`:
<button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
<pre data-test="exec" data-test-assert-contains="Vastarannan valssi">
$ vespa query \
-'yql=select track_id, title, artist, tags from track where {targetHits:10}wand(tags, @userProfile)' \
+'yql=select track_id, title, artist, tags from track where {totalTargetHits:10}wand(tags, @userProfile)' \
'userProfile={"hard rock":1, "rock":1,"metal":1, "finnish metal":1}' \
'hits=1' \
'ranking=personalized'
21 changes: 11 additions & 10 deletions en/querying/approximate-nn-hnsw.md
@@ -134,7 +134,7 @@ or exact (brute-force) search by using the [approximate query annotation](../ref

<pre>
{
-"yql": "select * from doc where {targetHits: 100, approximate:false}nearestNeighbor(image_embeddings,query_image_embedding)",
+"yql": "select * from doc where {totalTargetHits: 10, approximate:false}nearestNeighbor(image_embeddings,query_image_embedding)",
"hits": 10,
"input.query(query_image_embedding)": [0.21,0.12,....],
"ranking.profile": "image_similarity"
@@ -150,9 +150,9 @@ Note that exact searches over a large vector volume require adjustment of the
The default [query timeout](../reference/api/query.html#timeout) is 500ms,
which will be too low for an exact search over many vectors.

-In addition to [targetHits](../reference/querying/yql.html#targethits),
+In addition to [totalTargetHits](../reference/querying/yql.html#totaltargethits),
there is a [hnsw.exploreAdditionalHits](../reference/querying/yql.html#hnsw-exploreadditionalhits) parameter
-which controls how many extra nodes in the graph (in addition to `targetHits`)
+which controls how many extra nodes in the graph (in addition to `totalTargetHits`)
that are explored during the graph search. This parameter is used to tune accuracy versus query performance.
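For illustration, a query combining the two parameters might look like the following sketch (reusing the field and profile names from the example above; the values are illustrative and should be tuned against measured recall):

<pre>
{
    "yql": "select * from doc where {totalTargetHits: 100, hnsw.exploreAdditionalHits: 200}nearestNeighbor(image_embeddings,query_image_embedding)",
    "hits": 10,
    "input.query(query_image_embedding)": [0.21,0.12,....],
    "ranking.profile": "image_similarity"
}
</pre>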

## Combining approximate nearest neighbor search with filters
@@ -174,22 +174,23 @@ Note that when using `pre-filtering` the following query operators are not inclu
* [predicate](../reference/querying/yql.html#predicate)

These are instead evaluated after the approximate nearest neighbors are retrieved, more like a `post-filter`.
-This might cause the search to expose fewer hits to ranking than the wanted `targetHits`.
+This might cause the search to expose fewer hits to ranking than the wanted `totalTargetHits`.

Since {% include version.html version="8.78" %} the `pre-filter` can be evaluated using
[multiple threads per query](../performance/practical-search-performance-guide.html#multithreaded-search-and-ranking).
This can be used to reduce query latency for larger vector datasets where the cost of evaluating the `pre-filter` is significant.
Note that searching the `HNSW` index is always single-threaded per query.
Multithreaded evaluation when using `post-filtering` has always been supported,
-but this is less relevant as the `HNSW` index search first reduces the document candidate set based on `targetHits`.
+but this is less relevant as the `HNSW` index search first reduces the document candidate set based on `totalTargetHits`.

## Nearest Neighbor Search Considerations

-* **targetHits**:
-The [targetHits](../reference/querying/yql.html#targethits)
-specifies how many hits one wants to expose to [ranking](../basics/ranking.html) *per content node*.
-Approximate search exposes exactly `targetHits` hits to `first-phase` ranking on every content node
-as long as `targetHits` hits are actually found and not filtered out afterwards.
+* **totalTargetHits**:
+The [totalTargetHits](../reference/querying/yql.html#totaltargethits) parameter
+specifies how many hits one wants to expose to [ranking](../basics/ranking.html) in total over the content nodes
+participating in the query (you can also set this per node using [targetHits](../reference/querying/yql.html#targethits)).
+Approximate search exposes exactly `totalTargetHits` hits to `first-phase` ranking over the content nodes
+as long as `totalTargetHits` hits are actually found and not filtered out.
Nearest neighbor search is typically used as an efficient retriever in a [phased ranking](../ranking/phased-ranking.html)
pipeline. See [performance sizing](../performance/sizing-search.html).

Expand Down