From bbe00da2d09ec5090fa979360d7a9622e4940bb7 Mon Sep 17 00:00:00 2001
From: Jon Bratseth <bratseth@gmail.com>
Date: Fri, 13 Mar 2026 14:08:13 +0100
Subject: [PATCH 1/9] Document total-rerank-count

---
 en/basics/ranking.html                 |  4 ++--
 en/content/attributes.html             |  4 ++--
 en/learn/tutorials/rag-blueprint.md    | 13 +++++------
 en/querying/nearest-neighbor-search.md |  2 +-
 en/ranking/phased-ranking.html         | 12 +++++-----
 en/reference/api/query.html            | 26 ++++++++++++++++-----
 en/reference/querying/yql.html         |  4 ++--
 en/reference/schemas/schemas.html      | 32 ++++++++++++++++++--------
 8 files changed, 61 insertions(+), 36 deletions(-)
diff --git a/en/basics/ranking.html b/en/basics/ranking.html
index 1b44706226..41f5a56fb7 100644
--- a/en/basics/ranking.html
+++ b/en/basics/ranking.html
@@ -96,12 +96,12 @@ <h2 id="phased-ranking">Phased ranking</h2>
 
         second-phase {
             expression: xgboost(my_xgboost_reranker)
-            rerank-count: 1000   # per content node
+            total-rerank-count: 1000 # Over all nodes
         }
 
         global-phase {
           expression: sum(onnx(my_large_onnx_model))
-          rerank-count: 20  # globally
+          rerank-count: 20
         }
 
     }
diff --git a/en/content/attributes.html b/en/content/attributes.html
index baa69121e6..378f911273 100644
--- a/en/content/attributes.html
+++ b/en/content/attributes.html
@@ -586,7 +586,7 @@ <h2 id="paged-attributes">Paged attributes</h2>
   where the number of attribute accesses are limited by the re-ranking phase count.
 </p>
 <p>
-  For example using a second phase <a href="../reference/schemas/schemas.html#secondphase-rerank-count">rerank-count</a>
+  For example using a second phase <a href="../reference/schemas/schemas.html#secondphase-total-rerank-count">total-rerank-count</a>
   of 100 will limit the maximum number of page-ins/disk access per query to 100.
   Running at 100 QPS would need up to 10K disk accesses per second.
   This is the worst case if none of the accessed attribute data were paged into memory already.
@@ -608,7 +608,7 @@ <h2 id="paged-attributes">Paged attributes</h2>
     rank-profile foo {
         first-phase {}
         second-phase {
-            rerank-count: 100
+            total-rerank-count: 100
             expression: sum(attribute(tensordata))
         }
     }
diff --git a/en/learn/tutorials/rag-blueprint.md b/en/learn/tutorials/rag-blueprint.md
index 60861f1787..7750c5e668 100644
--- a/en/learn/tutorials/rag-blueprint.md
+++ b/en/learn/tutorials/rag-blueprint.md
@@ -570,8 +570,7 @@ not the case for most real-world RAG applications, so this is cruical to have in
 
 ![phased ranking overview](/assets/img/phased-ranking-rag.png)
 
-It is worth noting that parameters such as `targetHits` (for the match phase) and `rerank-count` 
-(for first and second phase) are applied **per content node**. Also note that the stateless container nodes can 
+Note that the stateless container nodes can
 also be [scaled independently](../../performance/sizing-search.html) to handle increased query load.
 
 ## Configuring match-phase (retrieval)
@@ -1380,8 +1379,8 @@ We run the evaluation script on a set of unseen test queries, and get the follow
 ```
 
 For the first phase ranking, we care most about recall, as we just want to make sure that the candidate documents are 
-ranked high enough to be included in the second-phase ranking. (the default number of documents that will be exposed to 
-second-phase is 10 000, but can be controlled by the `rerank-count` parameter).
+ranked high enough to be included in the second-phase ranking. The number of documents to be reranked in second-phase
+in total over all content nodes is controlled by the `total-rerank-count` parameter.
 
 We can see that our results are already very good. This is of course due to the fact that we have a small,synthetic dataset. 
 In reality, you should align the metric expectations with your dataset and test queries.
@@ -1392,7 +1391,7 @@ within your latency budget, as you want some headroom for second-phase ranking.
 ## Second-phase ranking
 
 For the second-phase ranking, we can afford to use a more expensive ranking expression, since we will only run it 
-on the top-k documents from the first-phase ranking (defined by the `rerank-count` parameter, which defaults to 10,000 documents).
+on the top-k documents from the first-phase ranking (decided by the `total-rerank-count` parameter).
 
 This is where we can significantly improve ranking quality by using more sophisticated models and features that would 
 be too expensive to compute for all matched documents.
@@ -1589,7 +1588,7 @@ vespa query \
 **Performance monitoring:**
 
 * Monitor latency impact of second-phase ranking
-* Adjust `rerank-count` based on quality vs. performance trade-offs
+* Adjust `total-rerank-count` based on quality vs. performance trade-offs
 * Consider using different models for different query types or use cases
 
 The second-phase ranking represents a crucial step in building high-quality RAG applications, 
@@ -1598,7 +1597,7 @@ providing the precision needed for effective LLM context while maintaining reaso
 ## (Optional) Global-phase ranking
 
 We also have the option of configuring [global-phase](../../reference/schemas/schemas.html#globalphase-rank) ranking, which can rerank the top k 
-(as set by `rerank-count` parameter) documents from the second-phase ranking.
+(as set by the global-phase `rerank-count` parameter) documents from the second-phase ranking.
 
 Common options for global-phase are [cross-encoders](../../ranking/cross-encoders.html) or another GBDT model, trained for 
 better separating top ranked documents on objectives such as [LambdaMart](https://xgboost.readthedocs.io/en/latest/tutorials/learning_to_rank.html). For RAG applications, 
diff --git a/en/querying/nearest-neighbor-search.md b/en/querying/nearest-neighbor-search.md
index 5178835275..570ff6ce39 100644
--- a/en/querying/nearest-neighbor-search.md
+++ b/en/querying/nearest-neighbor-search.md
@@ -275,7 +275,7 @@ rank-profile image_similarity_with_reranking {
         expression: closeness(field, image_embeddings)
     } 
     second-phase {
-        rerank-count: 1000
+        total-rerank-count: 1000
         expression: closeness(field, image_embeddings) * attribute(popularity)
     }
 }
diff --git a/en/ranking/phased-ranking.html b/en/ranking/phased-ranking.html
index 4779d385bb..7172006207 100644
--- a/en/ranking/phased-ranking.html
+++ b/en/ranking/phased-ranking.html
@@ -31,8 +31,8 @@
       <li><a href="#two-phase-ranking-on-content-nodes">second-phase ranking</a>;
       configured in <a href="../reference/schemas/schemas.html#rank-profile">rank-profile</a>.
       Optionally re-rank the top-scoring hits from the first-phase ranking using a more complex expression. The
-      <a href="../reference/schemas/schemas.html#secondphase-rerank-count">rerank-count</a> sets a strict upper bound on the
-      number of documents that are re-ranked.  
+      <a href="../reference/schemas/schemas.html#secondphase-total-rerank-count">total-rerank-count</a> sets a strict upper bound on the
+      number of documents that are re-ranked in total over the nodes.
       </li>
     </ul>
   <li><strong>Global ranking:</strong>Following the per content node local ranking phases, 
@@ -83,8 +83,8 @@ <h2 id="two-phase-ranking-on-content-nodes">Two-phase ranking on content nodes</
 </p>
 <p>
   By default, second-phase ranking (if specified) is evaluated for the 100 best hits
-  from the first-phase ranking per content node, tunable with 
-  <a href="../reference/schemas/schemas.html#secondphase-rerank-count">rerank-count</a>.
+  from the first-phase ranking per content node. The number that is reranked over all nodes can be set by
+  <a href="../reference/schemas/schemas.html#secondphase-total-rerank-count">total-rerank-count</a>.
 </p>
 <pre>
 schema myapp {
@@ -99,7 +99,7 @@ <h2 id="two-phase-ranking-on-content-nodes">Two-phase ranking on content nodes</
             expression {
                 xgboost("my-model.json")
             }
-            rerank-count: 50
+            total-rerank-count: 50
         }
     }
 }
@@ -203,7 +203,7 @@ <h2 id="using-a-global-phase-expression">Using a global-phase expression</h2>
             }
             rerank-count: 50
         }
-        match-features {
+        match-features {
             my_expensive_function
         }
     }
diff --git a/en/reference/api/query.html b/en/reference/api/query.html
index 424565901f..d5364109db 100644
--- a/en/reference/api/query.html
+++ b/en/reference/api/query.html
@@ -80,7 +80,8 @@ <h2 id="parameters">Parameters</h2>
     <li><a href="#ranking.properties">ranking.properties</a> [<em>rankproperty</em>]</li>
     <li><a href="#ranking.querycache">ranking.queryCache</a></li>
     <li><a href="#ranking.rankscoredroplimit">ranking.rankScoreDropLimit</a></li>
-    <li><a href="#ranking.rerankcount">ranking.rerankCount</a></li>
+    <li><a href="#ranking.secondphase.totalrerankcount">ranking.secondPhase.totalRerankCount</a></li>
+    <li><a href="#ranking.secondphase.rerankcount">ranking.secondPhase.rerankCount</a></li>
     <li><a href="#ranking.secondphase.rankscoredroplimit">ranking.secondPhase.rankScoreDropLimit</a></li>
     <li><a href="#ranking.significance.useModel">ranking.significance.useModel</a></li>
     <li><a href="#ranking.softtimeout.enable">ranking.softtimeout.enable</a></li>
@@ -851,15 +852,28 @@ <h2 id="ranking">Ranking</h2>
     </td>
   </tr>
   <tr>
-    <th>ranking.rerankCount</th>
+    <th>ranking.secondPhase.totalRerankCount</th>
     <td></td>
     <td>Number</td>
     <td></td>
     <td>
-      <p id="ranking.rerankcount">
-        Specifies the number of hits that should be ranked in the second ranking phase.
-        Overrides the <a href="../schemas/schemas.html#secondphase-rerank-count">rerank-count</a> set in the rank profile.
-        Setting to 0 disables the second phase reranking.
+      <p id="ranking.secondphase.totalrerankcount">
+      Specifies the number of hits that should be ranked in the second ranking phase in total over the queried
+      content nodes.
+      Overrides the <a href="../schemas/schemas.html#secondphase-total-rerank-count">total-rerank-count</a> set in the rank profile.
+      Setting to 0 disables second phase reranking.
+      </p>
+    </td>
+  </tr>
+  <tr>
+    <th>ranking.secondPhase.rerankCount</th>
+    <td></td>
+    <td>Number</td>
+    <td></td>
+    <td>
+      <p id="ranking.secondphase.rerankcount">
+        Specifies the number of hits that should be ranked in the second phase <i>per node</i>.
+        Prefer using <a href="#ranking.secondphase.totalrerankcount">totalRerankCount</a> over this.
       </p>
     </td>
   </tr>
diff --git a/en/reference/querying/yql.html b/en/reference/querying/yql.html
index f21ba50b62..ef96cbbff3 100644
--- a/en/reference/querying/yql.html
+++ b/en/reference/querying/yql.html
@@ -2196,9 +2196,9 @@ <h2 id="annotations">Annotations</h2>
       </p>
       <p>
         It sets the wanted number of hits exposed to the real first-phase ranking function per content node.
-        If additional second phase ranking with rerank-count is used,
+        If additional second phase ranking is used,
         do not set <code>targetHits</code> less than the configured rank-profile's
-        <a href="../schemas/schemas.html#secondphase-rerank-count">rerank-count</a>.
+        <a href="../schemas/schemas.html#secondphase-total-rerank-count">total-rerank-count</a>.
       </p>
     </td>
   </tr>
diff --git a/en/reference/schemas/schemas.html b/en/reference/schemas/schemas.html
index 1fe8647152..e1fb4a0cc5 100644
--- a/en/reference/schemas/schemas.html
+++ b/en/reference/schemas/schemas.html
@@ -124,6 +124,7 @@ <h2 id="elements">Elements</h2>
         <a href="#secondphase-rank">second-phase</a>
             <a href="#expression">expression</a>
             <a href="#secondphase-rank-score-drop-limit">rank-score-drop-limit</a>
+            <a href="#secondphase-total-rerank-count">total-rerank-count</a>
             <a href="#secondphase-rerank-count">rerank-count</a>
         <a href="#globalphase-rank">global-phase</a>
             <a href="#expression">expression</a>
@@ -1861,7 +1862,7 @@ <h2 id="diversity">diversity</h2>
 Result sets are guaranteed to get at least <a href="#diversity-min-groups">min-groups</a>
 unique values from the <a href="#diversity-min-groups">diversity attribute</a> from this phase,
 but no more than max-hits. For <a href="#match-phase">match-phase</a> max-hits = <a href="#match-phase-max-hits">match-phase max-hits</a>.
-For <a href="#secondphase-rank">second-phase</a> max-hits = <a href="#secondphase-rerank-count">rerank-count</a>
+For <a href="#secondphase-rank">second-phase</a> max-hits = <a href="#secondphase-total-rerank-count">total-rerank-count</a>
 A document is considered a candidate if:
 <ul>
   <li>The query has not yet reached the <em>max-hits</em>
@@ -1894,7 +1895,7 @@ <h2 id="diversity">diversity</h2>
         Using this with <a href="#match-phase">match-phase</a>
         often means one can reduce <a href="#match-phase-max-hits">max-hits</a>.
         In <a href="#secondphase-rank">second-phase</a>
-        you might reduce <a href="#secondphase-rerank-count">rerank-count</a> and still good and diverse results.
+        you might reduce <a href="#secondphase-total-rerank-count">total-rerank-count</a> and still get good and diverse results.
         </p>
       </td></tr>
   </tbody>
@@ -2233,15 +2234,26 @@ <h2 id="secondphase-rank">second-phase</h2>
     </p>
   </td>
 </tr>
+<tr><td style="white-space: nowrap">total-rerank-count</td>
+  <td>
+    <p id="secondphase-total-rerank-count">
+      Optional argument. Specifies the number of hits to be re-ranked in the second phase in total over the content
+      nodes that participate in evaluating a query (a <i>group</i>).
+      The default value is 100 per node. This can also be
+      <a href="../api/query.html#ranking.secondphase.totalrerankcount">set in the query</a>.
+      Hits not reranked might be <a href="#secondphase-rescoring">re-scored</a>.
+    </p>
+  </td>
+</tr>
 <tr><td style="white-space: nowrap">rerank-count</td>
-<td>
-  <p id="secondphase-rerank-count">
-    Optional argument. Specifies the number of hits to be re-ranked in the second phase.
-    The default value is 100. This can also be <a href="../api/query.html#ranking.rerankcount">set in the query</a>.
-    Note that this value is local to each node involved in a query.
-    Hits not reranked might be <a href="#secondphase-rescoring">re-scored</a>.
-  </p>
-</td>
+  <td>
+    <p id="secondphase-rerank-count">
+    Optional argument. Specifies the number of hits to be re-ranked in the second phase on each content node.
+    This can also be <a href="../api/query.html#ranking.secondphase.rerankcount">set in the query</a>.
+    Prefer using <a href="#secondphase-total-rerank-count">total-rerank-count</a> over this.
+    </p>
+  </td>
+</tr>
 </tbody>
 </table>
 

From 081e1448c1ed154382d8d3e4865d7a0e0f0fc4e2 Mon Sep 17 00:00:00 2001
From: Jon Bratseth <bratseth@gmail.com>
Date: Fri, 13 Mar 2026 14:29:24 +0100
Subject: [PATCH 2/9] Document total-keep-rank-count

---
 en/learn/faq.md                   |  2 +-
 en/ranking/ranking-intro.md       |  4 ++--
 en/reference/api/query.html       | 19 +++++++++++++++++--
 en/reference/schemas/schemas.html | 10 +++++++++-
 4 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/en/learn/faq.md b/en/learn/faq.md
index b48239d797..9ede0766ef 100644
--- a/en/learn/faq.md
+++ b/en/learn/faq.md
@@ -87,7 +87,7 @@ of a double. This can happen in two cases:
 
 - The [ranking](../basics/ranking.html) expression used a feature which became `NaN` (Not a Number). For example, `log(0)` would produce
 -Infinity. One can use [isNan](../reference/ranking/ranking-expressions.html#isnan-x) to guard against this.
-- Surfacing low scoring hits using [grouping](../querying/grouping.html), that is, rendering low ranking hits with `each(output(summary()))` that are outside of what Vespa computed and caches on a heap. This is controlled by the [keep-rank-count](../reference/schemas/schemas.html#keep-rank-count).
+- Surfacing low scoring hits using [grouping](../querying/grouping.html), that is, rendering low ranking hits with `each(output(summary()))` that are outside what Vespa computed and caches on a heap. This is controlled by the [total-keep-rank-count](../reference/schemas/schemas.html#total-keep-rank-count) parameter.
 
 ### How to pin query results?
 To hard-code documents to positions in the result set,
diff --git a/en/ranking/ranking-intro.md b/en/ranking/ranking-intro.md
index db6db93ad0..2d1da716eb 100644
--- a/en/ranking/ranking-intro.md
+++ b/en/ranking/ranking-intro.md
@@ -293,7 +293,7 @@ Let's try the same query again, with a two-phase rank-profile that also does an
 <pre>
 rank-profile inlinks_twophase inherits inlinks_age {
     first-phase {
-        keep-rank-count       : 50
+        total-keep-rank-count : 50
         rank-score-drop-limit : 10
         expression            : num_inlinks
     }
@@ -316,7 +316,7 @@ Here, `num_inlinks` and `rank_score` are defined in a rank profile we used earli
 
 In the results, observe that no document has a _rankingExpression(num_inlinks)_ less than or equal to 10.0,
 meaning all such documents were purged in the first ranking phase due to the `rank-score-drop-limit`.
-Normally, the `rank-score-drop-limit` is not used, as the `keep-rank-count` is most important.
+Normally, the `rank-score-drop-limit` is not used, as the `total-keep-rank-count` is most important.
 Read more in the [reference](../reference/schemas/schemas.html#rank-score-drop-limit).
 
 For a dynamic limit, pass a ranking feature like `query(threshold)`
diff --git a/en/reference/api/query.html b/en/reference/api/query.html
index d5364109db..6de4ee22e9 100644
--- a/en/reference/api/query.html
+++ b/en/reference/api/query.html
@@ -877,6 +877,19 @@ <h2 id="ranking">Ranking</h2>
       </p>
     </td>
   </tr>
+  <tr>
+    <th>ranking.totalKeepRankCount</th>
+    <td></td>
+    <td>Number</td>
+    <td></td>
+    <td>
+      <p id="ranking.totalkeeprankcount">
+        Specifies the number of hits for which the rank score should be kept after first phase ranking
+        in total over the nodes participating in the query.
+        Overrides the <a href="../schemas/schemas.html#total-keep-rank-count">total-keep-rank-count</a> set in the rank profile.
+      </p>
+    </td>
+  </tr>
   <tr>
     <th>ranking.keepRankCount</th>
     <td></td>
@@ -884,8 +897,10 @@ <h2 id="ranking">Ranking</h2>
     <td></td>
     <td>
       <p id="ranking.keeprankcount">
-        Specifies the number of hits that should keep rank value.
-        Overrides the <a href="../schemas/schemas.html#keep-rank-count">keep-rank-count</a> set in the rank profile.
+      Specifies the number of hits for which the rank score should be kept after first phase ranking
+      on each node.
+      Overrides the <a href="../schemas/schemas.html#keep-rank-count">keep-rank-count</a> set in the rank profile.
+      Prefer <a href="#ranking.totalkeeprankcount">totalKeepRankCount</a> over this.
       </p>
     </td>
   </tr>
diff --git a/en/reference/schemas/schemas.html b/en/reference/schemas/schemas.html
index e1fb4a0cc5..68b301597a 100644
--- a/en/reference/schemas/schemas.html
+++ b/en/reference/schemas/schemas.html
@@ -118,6 +118,7 @@ <h2 id="elements">Elements</h2>
             <a href="#match-phase-order">order</a>
             <a href="#match-phase-max-hits">max-hits</a>
         <a href="#firstphase-rank">first-phase</a>
+            <a href="#total-keep-rank-count">total-keep-rank-count</a>
             <a href="#keep-rank-count">keep-rank-count</a>
             <a href="#rank-score-drop-limit">rank-score-drop-limit</a>
             <a href="#expression">expression</a>
@@ -1929,9 +1930,16 @@ <h2 id="firstphase-rank">first-phase</h2>
     see <a href="../ranking/ranking-expressions.html">ranking expressions</a>.</p>
   </td>
 </tr>
+<tr><td>total-keep-rank-count</td>
+  <td>
+    <p id="total-keep-rank-count">How many documents to keep the first phase top rank values for,
+    in total over the content nodes participating in the query. The default value is 10000 per node.</p>
+  </td>
+</tr>
 <tr><td>keep-rank-count</td>
   <td>
-    <p id="keep-rank-count">How many documents to keep the first phase top rank values for. The default value is 10000.</p>
+    <p id="keep-rank-count">How many documents to keep the first phase top rank values for on each content node.
+    Prefer <a href="#total-keep-rank-count">total-keep-rank-count</a> over this.</p>
   </td>
 </tr>
 <tr><td style="white-space: nowrap">rank-score-drop-limit</td>

From 4c5fa13a1eb0e43ceab8110d801f6c6049ea0119 Mon Sep 17 00:00:00 2001
From: Jon Bratseth <bratseth@gmail.com>
Date: Mon, 16 Mar 2026 13:11:50 +0100
Subject: [PATCH 3/9] Document total-max-hits

---
 en/performance/graceful-degradation.html | 14 +++++------
 en/querying/result-diversity.md          |  2 +-
 en/reference/api/query.html              | 30 +++++++++++++++++++-----
 en/reference/schemas/schemas.html        | 23 +++++++++++++-----
 4 files changed, 49 insertions(+), 20 deletions(-)

diff --git a/en/performance/graceful-degradation.html b/en/performance/graceful-degradation.html
index 853a069daa..f0559814cf 100644
--- a/en/performance/graceful-degradation.html
+++ b/en/performance/graceful-degradation.html
@@ -177,21 +177,21 @@ <h2 id="match-phase-degradation">Match phase degradation</h2>
 <p>
 Match-phase works by specifying an <code>attribute</code> that measures document
 quality in some way (popularity, click-through rate, pagerank, ad bid value, price, text quality).
-In addition, a <code>max-hits</code> value is specified
-that specifies how many hits are "more than enough" for the application.
+In addition, a <code>total-max-hits</code> value specifies
+how many hits in total over the content nodes are "more than enough" for the application.
 Then an estimate is made after collecting a reasonable amount of hits for the query,
-and if the estimate is higher than the configured <code>max-hits</code> value,
+and if the estimate is higher than the node's share of the <code>total-max-hits</code> value,
 an extra limitation is added to the query,
 ensuring that only the highest quality documents can become hits.
 </p><p>
 In effect, this limits the documents actually queried to the highest quality documents,
 a subset of the full corpus,
 where the size of subset is calculated in such a way
-that the query is estimated to give <code>max-hits</code> hits.
+that the query is estimated to give the node's share of <code>total-max-hits</code> hits.
 Since some (low-quality) hits will already have been collected to do the estimation,
-the actual number of hits returned will usually be higher than max-hits.
+the actual number of hits returned will usually be higher than <code>total-max-hits</code>.
 But since the distribution of documents isn't perfectly smooth,
-you risk sometimes getting less than the configured <code>max-hits</code> hits back.
+you risk sometimes getting fewer than the configured <code>total-max-hits</code> hits back.
 </p><p>
 Note that limiting hits in the match-phase also affects <a href="../querying/grouping.html">aggregation/grouping</a>,
 and total-hit-count since it actually limits, so the query gets fewer hits.
@@ -200,7 +200,7 @@ <h2 id="match-phase-degradation">Match phase degradation</h2>
 since they both operate in the same manner,
 and you would get interference between them that could cause unpredictable results.
 The graph shows possible hits versus actual hits in a corpus with 100 000 documents,
-where <code>max-hits</code> is configured to 10 000.
+where the node's share of <code>total-max-hits</code> is 10 000.
 The corpus is a synthetic (slightly randomized) data set,
 in practice the graph will be less smooth:
 </p>
diff --git a/en/querying/result-diversity.md b/en/querying/result-diversity.md
index 4f43124205..209608f722 100644
--- a/en/querying/result-diversity.md
+++ b/en/querying/result-diversity.md
@@ -101,7 +101,7 @@ rank-profile diverse_example {
 
     match-phase {
         attribute: popularity
-        max-hits: 100
+        total-max-hits: 1000
         max-filter-coverage: 1.0
     }
 
diff --git a/en/reference/api/query.html b/en/reference/api/query.html
index 6de4ee22e9..70c3e5b0f5 100644
--- a/en/reference/api/query.html
+++ b/en/reference/api/query.html
@@ -70,8 +70,6 @@ <h2 id="parameters">Parameters</h2>
     <li><a href="#ranking.elementGap">ranking.elementGap.<em>fieldName</em></a></li>
     <li><a href="#ranking.features">ranking.features</a> [<em>input</em>, <em>rankfeature</em>]</li>
     <li><a href="#ranking.freshness">ranking.freshness</a></li>
-    <li><a href="#ranking.globalphase.rankscoredroplimit">ranking.globalPhase.rankScoreDropLimit</a></li>
-    <li><a href="#ranking.globalphase.rerankcount">ranking.globalPhase.rerankCount</a></li>
     <li><a href="#ranking.keeprankcount">ranking.keepRankCount</a></li>
     <li><a href="#ranking.listfeatures">ranking.listFeatures</a> [<em>rankfeatures</em>]</li>
     <li><a href="#ranking.matchPhase">ranking.matchPhase</a></li>
@@ -80,9 +78,17 @@ <h2 id="parameters">Parameters</h2>
     <li><a href="#ranking.properties">ranking.properties</a> [<em>rankproperty</em>]</li>
     <li><a href="#ranking.querycache">ranking.queryCache</a></li>
     <li><a href="#ranking.rankscoredroplimit">ranking.rankScoreDropLimit</a></li>
+    <li><a href="#ranking.matchphase.attribute">ranking.matchPhase.attribute</a></li>
+    <li><a href="#ranking.matchphase.totalmaxhits">ranking.matchPhase.totalMaxHits</a></li>
+    <li><a href="#ranking.matchphase.maxhits">ranking.matchPhase.maxHits</a></li>
+    <li><a href="#ranking.matchphase.ascending">ranking.matchPhase.ascending</a></li>
+    <li><a href="#ranking.matchphase.diversity.attribute">ranking.matchPhase.diversity.attribute</a></li>
+    <li><a href="#ranking.matchphase.diversity.mingroups">ranking.matchPhase.diversity.minGroups</a></li>
     <li><a href="#ranking.secondphase.totalrerankcount">ranking.secondPhase.totalRerankCount</a></li>
     <li><a href="#ranking.secondphase.rerankcount">ranking.secondPhase.rerankCount</a></li>
     <li><a href="#ranking.secondphase.rankscoredroplimit">ranking.secondPhase.rankScoreDropLimit</a></li>
+    <li><a href="#ranking.globalphase.rankscoredroplimit">ranking.globalPhase.rankScoreDropLimit</a></li>
+    <li><a href="#ranking.globalphase.rerankcount">ranking.globalPhase.rerankCount</a></li>
     <li><a href="#ranking.significance.useModel">ranking.significance.useModel</a></li>
     <li><a href="#ranking.softtimeout.enable">ranking.softtimeout.enable</a></li>
     <li><a href="#ranking.sorting">ranking.sorting</a> [<em>sorting</em>]</li>
@@ -1256,6 +1262,19 @@ <h2 id="ranking.matchPhase">ranking.matchPhase</h2>
       <p>The attribute used to limit matches by if more than maxHits hits will be produced.</p>
     </td>
   </tr>
+  <tr>
+    <th>ranking.matchPhase<br/>.totalMaxHits</th>
+    <td></td>
+    <td>long</td>
+    <td></td>
+    <td>
+      <p id="ranking.matchphase.totalmaxhits">
+        The max number of hits that should be generated in total over the content nodes during the match phase.
+        Setting the value to <code>0</code> disables match phase early termination.
+        Rank profile equivalent: <a href="../schemas/schemas.html#match-phase-total-max-hits">match-phase: total-max-hits</a>
+      </p>
+    </td>
+  </tr>
   <tr>
     <th>ranking.matchPhase<br/>.maxHits</th>
     <td></td>
@@ -1263,10 +1282,9 @@ <h2 id="ranking.matchPhase">ranking.matchPhase</h2>
     <td></td>
     <td>
       <p id="ranking.matchphase.maxhits">
-        Rank profile equivalent: <a href="../schemas/schemas.html#match-phase-max-hits">match-phase: max-hits</a>
-      </p>
-      <p>The max number of hits that should be generated on each content node during the match phase.</p>
-      <p>Setting the value to `0` disables the match phase early termination.</p>
+      The max number of hits that should be generated on each content node during the match phase.
+      Prefer using <a href="#ranking.matchphase.totalmaxhits">totalMaxHits</a> over this.
+      Rank profile equivalent: <a href="../schemas/schemas.html#match-phase-max-hits">match-phase: max-hits</a></p>
     </td>
   </tr>
   <tr>
diff --git a/en/reference/schemas/schemas.html b/en/reference/schemas/schemas.html
index 68b301597a..8d5e044cc5 100644
--- a/en/reference/schemas/schemas.html
+++ b/en/reference/schemas/schemas.html
@@ -116,6 +116,7 @@ <h2 id="elements">Elements</h2>
         <a href="#match-phase">match-phase</a>
             <a href="#match-phase-attribute">attribute</a>
             <a href="#match-phase-order">order</a>
+            <a href="#match-phase-total-max-hits">total-max-hits</a>
             <a href="#match-phase-max-hits">max-hits</a>
         <a href="#firstphase-rank">first-phase</a>
             <a href="#total-keep-rank-count">total-keep-rank-count</a>
@@ -1798,7 +1799,7 @@ <h2 id="match-phase">match-phase</h2>
 match-phase {
     attribute: [numeric single value attribute]
     order: [ascending | descending]
-    max-hits: [integer]
+    total-max-hits: [integer]
 }
 </pre>
 <table class="table">
@@ -1809,7 +1810,8 @@ <h2 id="match-phase">match-phase</h2>
   <td>
     <p id="match-phase-attribute">
       The quality attribute that decides which documents are a match if the match phase
-      estimates that there will be more than <a href="#match-phase-max-hits">max-hits</a> hits.
+      estimates that there will be more than the node's share of
+      <a href="#match-phase-total-max-hits">total-max-hits</a> hits.
       The attribute must be single-value numeric with <a href="#attribute">fast-search</a> enabled.
       It should correlate with the order which would be produced by a full query evaluation.
       No default.
@@ -1824,11 +1826,19 @@ <h2 id="match-phase">match-phase</h2>
       as the default value <code>descending</code> is by far the most common.
     </p>
   </td></tr>
+  <tr><td style="white-space: nowrap">total-max-hits</td>
+    <td>
+      <p id="match-phase-total-max-hits">
+      The total max hits that should be produced in the match phase across all nodes
+      in the group evaluating the query.
+      This number should be large, and larger the worse the correlation between the
+      match-phase attribute and the first-phase function.</p>
+    </td></tr>
   <tr><td style="white-space: nowrap">max-hits</td>
     <td>
       <p id="match-phase-max-hits">
         The max hits each content node should attempt to produce in the match phase.
-        Usually, a number like 10000 works well here.</p>
+        Prefer using <a href="#match-phase-total-max-hits">total-max-hits</a> over this.</p>
     </td></tr>
 </tbody>
 </table>
@@ -1862,8 +1872,9 @@ <h2 id="diversity">diversity</h2>
 Specify the name of an attribute that will be used to provide diversity.
 Result sets are guaranteed to get at least <a href="#diversity-min-groups">min-groups</a>
 unique values from the <a href="#diversity-min-groups">diversity attribute</a> from this phase,
-but no more than max-hits. For <a href="#match-phase">match-phase</a> max-hits = <a href="#match-phase-max-hits">match-phase max-hits</a>.
-For <a href="#secondphase-rank">second-phase</a> max-hits = <a href="#secondphase-total-rerank-count">total-rerank-count</a>
+but no more than max-hits.
+For <a href="#match-phase">match-phase</a> max-hits = the node's share of <a href="#match-phase-total-max-hits">match-phase total-max-hits</a>.
+For <a href="#secondphase-rank">second-phase</a> max-hits = the node's share of <a href="#secondphase-total-rerank-count">total-rerank-count</a>.
 A document is considered a candidate if:
 <ul>
   <li>The query has not yet reached the <em>max-hits</em>
@@ -1894,7 +1905,7 @@ <h2 id="diversity">diversity</h2>
         <p id="diversity-min-groups">
         Specifies the minimum number of groups returned from the phase.
         Using this with <a href="#match-phase">match-phase</a>
-        often means one can reduce <a href="#match-phase-max-hits">max-hits</a>.
+        often means one can reduce <a href="#match-phase-total-max-hits">total-max-hits</a>.
         In <a href="#secondphase-rank">second-phase</a>
         you might reduce <a href="#secondphase-total-rerank-count">total-rerank-count</a> and still get good and diverse results.
         </p>

From 3b3f076a7e736c181fa7413e8d2de29ac8511403 Mon Sep 17 00:00:00 2001
From: Jon Bratseth <bratseth@gmail.com>
Date: Mon, 16 Mar 2026 14:07:17 +0100
Subject: [PATCH 4/9] Document totalTargetHits

---
 en/clients/vespa-cli.html                     |  2 +-
 .../practical-search-performance-guide.md     |  2 +-
 en/querying/approximate-nn-hnsw.md            | 21 ++---
 en/querying/nearest-neighbor-search-guide.md  | 74 +++++++++---------
 en/querying/nearest-neighbor-search.md        |  2 +-
 en/rag/binarizing-vectors.md                  | 12 +--
 en/rag/embedding.html                         |  4 +-
 en/rag/working-with-chunks.html               |  2 +-
 en/ranking/ranking-intro.md                   |  2 +-
 en/ranking/wand.html                          |  8 +-
 en/reference/api/query.html                   |  2 +-
 en/reference/querying/json-query-language.md  |  8 +-
 en/reference/querying/yql.html                | 78 +++++++++++++------
 en/reference/schemas/schemas.html             |  8 +-
 14 files changed, 127 insertions(+), 98 deletions(-)

diff --git a/en/clients/vespa-cli.html b/en/clients/vespa-cli.html
index 196ec173fc..94b1289c69 100644
--- a/en/clients/vespa-cli.html
+++ b/en/clients/vespa-cli.html
@@ -233,7 +233,7 @@ <h3 id="queries">Queries</h3>
 <p>Example query file:</p>
 <pre>{% highlight json %}
 {
-    "yql": "select product_id, title from products where {targetHits: 200}nearestNeighbor(dense_embedding, q_vector)",
+    "yql": "select product_id, title from products where {totalTargetHits: 200}nearestNeighbor(dense_embedding, q_vector)",
     "input.query(q_vector)": [-0.050548091530799866, ... ,0.028366032987833023],
     "ranking": "vector_distance"
 }
diff --git a/en/performance/practical-search-performance-guide.md b/en/performance/practical-search-performance-guide.md
index 3488a67a36..c95ca052ff 100644
--- a/en/performance/practical-search-performance-guide.md
+++ b/en/performance/practical-search-performance-guide.md
@@ -1122,7 +1122,7 @@ Repeating the query from above, replacing `dotProduct` with `wand`:
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains="Vastarannan valssi">
 $ vespa query \
-    'yql=select track_id, title, artist, tags from track where {targetHits:10}wand(tags, @userProfile)' \
+    'yql=select track_id, title, artist, tags from track where {totalTargetHits:10}wand(tags, @userProfile)' \
     'userProfile={"hard rock":1, "rock":1,"metal":1, "finnish metal":1}' \
     'hits=1' \
     'ranking=personalized'
diff --git a/en/querying/approximate-nn-hnsw.md b/en/querying/approximate-nn-hnsw.md
index 192240d8cd..edd162bd20 100644
--- a/en/querying/approximate-nn-hnsw.md
+++ b/en/querying/approximate-nn-hnsw.md
@@ -134,7 +134,7 @@ or exact (brute-force) search by using the [approximate query annotation](../ref
 
 <pre>
 {
-  "yql": "select * from doc where {targetHits: 100, approximate:false}nearestNeighbor(image_embeddings,query_image_embedding)",
+  "yql": "select * from doc where {totalTargetHits: 10, approximate:false}nearestNeighbor(image_embeddings,query_image_embedding)",
   "hits": 10
   "input.query(query_image_embedding)": [0.21,0.12,....],
   "ranking.profile": "image_similarity" 
@@ -150,9 +150,9 @@ Note that exact searches over a large vector volume require adjustment of the
 The default [query timeout](../reference/api/query.html#timeout) is 500ms,
 which will be too low for an exact search over many vectors.
 
-In addition to [targetHits](../reference/querying/yql.html#targethits), 
+In addition to [totalTargetHits](../reference/querying/yql.html#totaltargethits), 
 there is a [hnsw.exploreAdditionalHits](../reference/querying/yql.html#hnsw-exploreadditionalhits) parameter
-which controls how many extra nodes in the graph (in addition to `targetHits`)
+which controls how many extra nodes in the graph (in addition to `totalTargetHits`)
 that are explored during the graph search. This parameter is used to tune accuracy quality versus query performance. 
 
 ## Combining approximate nearest neighbor search with filters 
@@ -174,22 +174,23 @@ Note that when using `pre-filtering` the following query operators are not inclu
 * [predicate](../reference/querying/yql.html#predicate)
 
 These are instead evaluated after the approximate nearest neighbors are retrieved, more like a `post-filter`.
-This might cause the search to expose fewer hits to ranking than the wanted `targetHits`.
+This might cause the search to expose fewer hits to ranking than the wanted `totalTargetHits`.
 
 Since {% include version.html version="8.78" %} the `pre-filter` can be evaluated using
 [multiple threads per query](../performance/practical-search-performance-guide.html#multithreaded-search-and-ranking).
 This can be used to reduce query latency for larger vector datasets where the cost of evaluating the `pre-filter` is significant.
 Note that searching the `HNSW` index is always single-threaded per query.
 Multithreaded evaluation when using `post-filtering` has always been supported,
-but this is less relevant as the `HNSW` index search first reduces the document candidate set based on `targetHits`.
+but this is less relevant as the `HNSW` index search first reduces the document candidate set based on `totalTargetHits`.
 
 ## Nearest Neighbor Search Considerations
 
-* **targetHits**:
-The [targetHits](../reference/querying/yql.html#targethits)
-specifies how many hits one wants to expose to [ranking](../basics/ranking.html) *per content node*.
-Approximate search exposes exactly `targetHits` hits to `first-phase` ranking on every content node
-as long as `targetHits` hits are actually found and not filtered out afterwards.
+* **totalTargetHits**:
+The [totalTargetHits](../reference/querying/yql.html#totaltargethits) parameter
+specifies how many hits one wants to expose to [ranking](../basics/ranking.html) in total over the content nodes
+participating in the query (you can also set this per node using [targetHits](../reference/querying/yql.html#targethits)).
+Approximate search exposes exactly `totalTargetHits` hits to `first-phase` ranking over the content nodes
+as long as `totalTargetHits` hits are actually found and not filtered out.
 Nearest neighbor search is typically used as an efficient retriever in a [phased ranking](../ranking/phased-ranking.html)
 pipeline. See [performance sizing](../performance/sizing-search.html). 
 
diff --git a/en/querying/nearest-neighbor-search-guide.md b/en/querying/nearest-neighbor-search-guide.md
index 3c315ed071..b1d69b5bb0 100644
--- a/en/querying/nearest-neighbor-search-guide.md
+++ b/en/querying/nearest-neighbor-search-guide.md
@@ -745,7 +745,7 @@ performing a maximum inner product search over the `tags` weightedset field.
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains="The Rose">
 $ vespa query \
-    'yql=select track_id, title, artist from track where {targetHits:10}wand(tags, @userProfile)' \
+    'yql=select track_id, title, artist from track where {totalTargetHits:10}wand(tags, @userProfile)' \
     'userProfile={"pop":1, "love songs":1,"romantic":10, "80s":20 }' \
     'hits=2' \
     'ranking=tags'
@@ -822,7 +822,7 @@ and Vespa embed functionality:
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains="Bonnie Tyler">
 $ vespa query \
-    'yql=select title, artist from track where {approximate:false,targetHits:10}nearestNeighbor(embedding,q)' \
+    'yql=select title, artist from track where {approximate:false,totalTargetHits:10}nearestNeighbor(embedding,q)' \
     'hits=1' \
     'ranking=closeness' \
     'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
@@ -831,13 +831,13 @@ $ vespa query \
 
 Query breakdown:
 
-- Search for ten (`targetHits:10`) nearest neighbors of the `query(q)` query tensor over the `embedding`
+- Search for ten (`totalTargetHits:10`) nearest neighbors of the `query(q)` query tensor over the `embedding`
 document tensor field. 
 - The annotation `approximate:false` tells Vespa to perform exact search.
 - The `hits` parameter controls how many results are returned in the response. Number of `hits`
-requested does not impact `targetHits`. Notice that `targetHits` is per content node involved in the query. 
+requested does not impact `totalTargetHits`. 
 - `ranking=closeness` tells Vespa which [rank-profile](../basics/ranking.html) to score documents. One must 
-specify how to *rank* the `targetHits` documents retrieved and exposed to `first-phase` ranking expression
+specify how to *rank* the `totalTargetHits` documents retrieved and exposed to `first-phase` ranking expression
 in the `rank-profile`.
 - `input.query(q)` is the query vector produced by the [embedder](../rag/embedding.html#embedding-a-query-text).
 
@@ -898,7 +898,7 @@ Changing the rank-profile to `closeness-t4` makes Vespa use four threads per que
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains="Bonnie Tyler">
 $ vespa query \
-    'yql=select title, artist from track where {approximate:false,targetHits:10}nearestNeighbor(embedding,q)' \
+    'yql=select title, artist from track where {approximate:false,totalTargetHits:10}nearestNeighbor(embedding,q)' \
     'hits=1' \
     'ranking=closeness-t4' \
     'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
@@ -932,7 +932,7 @@ field has `index`:
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec"  data-test-assert-contains="Bonnie Tyler">
 $ vespa query \
-    'yql=select title, artist from track where {targetHits:10,hnsw.exploreAdditionalHits:20}nearestNeighbor(embedding,q)' \
+    'yql=select title, artist from track where {totalTargetHits:10,hnsw.exploreAdditionalHits:20}nearestNeighbor(embedding,q)' \
     'hits=1' \
     'ranking=closeness' \
     'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
@@ -1010,7 +1010,7 @@ In this query example the `title` field must contain the term `heart`:
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains="of the Heart">
 $ vespa query \
-    'yql=select title, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and title contains "heart"' \
+    'yql=select title, artist from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and title contains "heart"' \
     'hits=2' \
     'ranking=closeness' \
     'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
@@ -1109,7 +1109,7 @@ the matching against the `title` field can use the most efficient posting list r
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains="of the Heart">
 $ vespa query \
-    'yql=select title, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and title contains ({ranked:false}"heart")' \
+    'yql=select title, artist from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and title contains ({ranked:false}"heart")' \
     'hits=2' \
     'ranking=closeness' \
     'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
@@ -1127,7 +1127,7 @@ with any other Vespa query operator.
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='"popularity": 100'>
 $ vespa query \
-    'yql=select title, popularity, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and popularity > 20 and artist contains "Bonnie Tyler"' \
+    'yql=select title, popularity, artist from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and popularity > 20 and artist contains "Bonnie Tyler"' \
     'hits=2' \
     'ranking=closeness' \
     'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
@@ -1140,8 +1140,8 @@ This query example restricts the search to tracks by `Bonnie Tyler` with `popula
 When combining nearest neighbor search with strict filters that match less than 2 percent of the total number of documents,
 Vespa will instead of searching the HNSW graph, constrained by the filter, fall back to using exact nearest neighbor search.
 See [Controlling filter behavior](#controlling-filter-behavior) for how to adjust the threshold for which strategy that is used.
-Since exact search may expose more than `targetHits` hits to the `first-phase` ranking expression,
-users will observe that `totalCount` increases and is higher than `targetHits` when falling back to exact search.
+Since exact search may expose more than `totalTargetHits` hits to the `first-phase` ranking expression,
+users will observe that `totalCount` increases and is higher than `totalTargetHits` when falling back to exact search.
 This can be seen in the previous examples.
 When using exact search with filters, the search can also use multiple threads to evaluate the query, which
 helps reduce the latency impact.
@@ -1170,7 +1170,7 @@ The following query with a restrictive filter on popularity is used for illustra
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='Total Eclipse Of The Heart'>
 $ vespa query \
-    'yql=select title, popularity, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and popularity > 80' \
+    'yql=select title, popularity, artist from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and popularity > 80' \
     'hits=2' \
     'ranking=closeness-t4' \
     'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
@@ -1239,7 +1239,7 @@ because it's `distance(field, embedding)` is close to 0.5.
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='"totalCount": 1'>
 $ vespa query \
-    'yql=select title, popularity, artist from track where {distanceThreshold:0.2,targetHits:10}nearestNeighbor(embedding,q) and popularity > 80' \
+    'yql=select title, popularity, artist from track where {distanceThreshold:0.2,totalTargetHits:10}nearestNeighbor(embedding,q) and popularity > 80' \
     'hits=2' \
     'ranking=closeness' \
     'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
@@ -1312,7 +1312,7 @@ both based on semantic (vector distance) and traditional sparse (exact) matching
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='Total Eclipse Of The Heart'>
 $ vespa query \
-    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery()' \
+    'yql=select title, artist from track where {totalTargetHits:100}nearestNeighbor(embedding,q) or userQuery()' \
     'query=total eclipse of the heart' \
     'type=weakAnd' \
     'hits=2' \
@@ -1428,7 +1428,7 @@ In the below query, we lower the weight of the popularity factor by adjusting `q
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='Total Eclipse Of The Heart'>
 $ vespa query \
-    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery()' \
+    'yql=select title, artist from track where {totalTargetHits:100}nearestNeighbor(embedding,q) or userQuery()' \
     'query=total eclipse of the heart' \
     'type=weakAnd' \
     'hits=2' \
@@ -1513,7 +1513,7 @@ Which can be used with the `wand` query operator to retrieve personalized hits f
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='Straight From The Heart'>
 $ vespa query \
-    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery() or ({targetHits:10}wand(tags, @userProfile))' \
+    'yql=select title, artist from track where {totalTargetHits:100}nearestNeighbor(embedding,q) or userQuery() or ({totalTargetHits:10}wand(tags, @userProfile))' \
     'query=total eclipse of the heart' \
     'type=weakAnd' \
     'hits=2' \
@@ -1597,7 +1597,7 @@ the query terms in the `weakAnd`.
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='Total Eclipse Of The Heart'>
 $ vespa query \
-    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) and userQuery()' \
+    'yql=select title, artist from track where {totalTargetHits:100}nearestNeighbor(embedding,q) and userQuery()' \
     'query=total eclipse of the heart' \
     'type=weakAnd' \
     'hits=2' \
@@ -1613,7 +1613,7 @@ It is also possible to combine hybrid search with filters, this filters both the
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='Total Eclipse'>
 $ vespa query \
-    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) and userQuery() and popularity < 75' \
+    'yql=select title, artist from track where {totalTargetHits:100}nearestNeighbor(embedding,q) and userQuery() and popularity < 75' \
     'query=total eclipse of the heart' \
     'type=weakAnd' \
     'hits=2' \
@@ -1631,7 +1631,7 @@ rank features for those hits retrieved by the first operand.
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='Total Eclipse Of The Heart'>
 $ vespa query \
-    'yql=select title, artist from track where rank({targetHits:100}nearestNeighbor(embedding,q), userQuery())' \
+    'yql=select title, artist from track where rank({totalTargetHits:100}nearestNeighbor(embedding,q), userQuery())' \
     'query=total eclipse of the heart' \
     'type=weakAnd' \
     'hits=2' \
@@ -1714,7 +1714,7 @@ retrieved by the sparse query representation.
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='Total Eclipse Of The Heart'>
 $ vespa query \
-    'yql=select title, artist from track where rank(userQuery(),{targetHits:100}nearestNeighbor(embedding,q))' \
+    'yql=select title, artist from track where rank(userQuery(),{totalTargetHits:100}nearestNeighbor(embedding,q))' \
     'query=total eclipse of the heart' \
     'type=weakAnd' \
     'hits=2' \
@@ -1735,7 +1735,7 @@ One can also use the `rank` operator to first retrieve by some filter logic, and
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='Total Eclipse Of The Heart'>
 $ vespa query \
-    'yql=select title, popularity, artist from track where rank(popularity>99,{targetHits:10}nearestNeighbor(embedding,q))' \
+    'yql=select title, popularity, artist from track where rank(popularity>99,{totalTargetHits:10}nearestNeighbor(embedding,q))' \
     'hits=2' \
     'ranking=closeness' \
     'input.query(q)=embed(e5, "Total Eclipse Of The Heart")' 
@@ -1758,7 +1758,7 @@ query tensor inputs:
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='Total Eclipse Of The Heart'>
 $ vespa query \
-    'yql=select title from track where ({targetHits:10}nearestNeighbor(embedding,q)) or ({targetHits:10}nearestNeighbor(embedding,q1))' \
+    'yql=select title from track where ({totalTargetHits:10}nearestNeighbor(embedding,q)) or ({totalTargetHits:10}nearestNeighbor(embedding,q1))' \
     'hits=2' \
     'ranking=closeness' \
     'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'  \
@@ -1835,7 +1835,7 @@ rank-profile closeness-label inherits closeness {
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='Total Eclipse Of The Heart'>
 $ vespa query \
-    'yql=select title from track where ({ label:"q", targetHits:10}nearestNeighbor(embedding,q)) or ({label:"q1",targetHits:10}nearestNeighbor(embedding,q1))' \
+    'yql=select title from track where ({ label:"q", totalTargetHits:10}nearestNeighbor(embedding,q)) or ({label:"q1",totalTargetHits:10}nearestNeighbor(embedding,q1))' \
     'hits=2' \
     'ranking=closeness-label' \
     'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'  \
@@ -1897,14 +1897,14 @@ The above query annotates the two `nearestNeighbor` query operators using
 }{% endhighlight %}</pre>
 
 Note that the previous examples used `or` to combine the two operators. Using `and` instead, requires 
-that there are documents that is in both the top-k results. Increasing `targetHits` to 500,  
+that there are documents that is in both the top-k results. Increasing `totalTargetHits` to 500,  
 finds a few tracks that overlap. 
 
 <div class="pre-parent">
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='matchfeatures'>
 $ vespa query \
-    'yql=select title from track where ({label:"q", targetHits:500}nearestNeighbor(embedding,q)) and ({label:"q1",targetHits:500}nearestNeighbor(embedding,q1))' \
+    'yql=select title from track where ({label:"q", totalTargetHits:500}nearestNeighbor(embedding,q)) and ({label:"q1",totalTargetHits:500}nearestNeighbor(embedding,q1))' \
     'hits=2' \
     'ranking=closeness-label' \
     'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'  \
@@ -2019,7 +2019,7 @@ do not perform post-filtering, use *pre-filtering* strategy:
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='"totalCount": 10'>
 $ vespa query \
-  'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
+  'yql=select title, artist, tags from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
   'hits=2' \
   'ranking=closeness' \
   'ranking.matching.postFilterThreshold=1.0' \
@@ -2028,14 +2028,14 @@ $ vespa query \
 </pre>
 </div>
 
-The query exposes `targetHits` to ranking as seen from the `totalCount`. Now, repeating the query, but
+The query exposes `totalTargetHits` to ranking as seen from the `totalCount`. Now, repeating the query, but
 forcing *post-filtering* instead by setting *ranking.matching.postFilterThreshold=0.0*:
 
 <div class="pre-parent">
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='"totalCount": 1'>
 $ vespa query \
-  'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
+  'yql=select title, artist, tags from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
   'hits=2' \
   'ranking=closeness' \
   'ranking.matching.postFilterThreshold=0.0' \
@@ -2045,21 +2045,21 @@ $ vespa query \
 </div>
 
 In this case, Vespa will estimate how many documents the filter matches and auto-adjust `targethits` internally to a
-higher number, attempting to expose the `targetHits` to first phase ranking:
+higher number, attempting to expose the `totalTargetHits` to first phase ranking:
 
 The query exposes 16 documents to ranking as can be seen from `totalCount`. There are `8420` documents in the collection
 that are tagged with the `rock` tag, so roughly 8%. 
 
-Auto adjusting `targetHits` upwards for post-filtering is not always what you want, because it is slower than just retrieving
+Auto adjusting `totalTargetHits` upwards for post-filtering is not always what you want, because it is slower than just retrieving
 from the HNSW index without constraints. We can change the 
-`targetHits` adjustment factor with the [ranking.matching.targetHitsMaxAdjustmentFactor](../reference/api/query.html#ranking.matching) parameter.
-In this case, we set it to 1, which disables adjusting the `targetHits` upwards. 
+`totalTargetHits` adjustment factor with the [ranking.matching.targetHitsMaxAdjustmentFactor](../reference/api/query.html#ranking.matching) parameter.
+In this case, we set it to 1, which disables adjusting the `totalTargetHits` upwards. 
 
 <div class="pre-parent">
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='totalCount":'>
 $ vespa query \
-  'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
+  'yql=select title, artist, tags from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
   'hits=2' \
   'ranking=closeness' \
   'ranking.matching.postFilterThreshold=0.0' \
@@ -2068,7 +2068,7 @@ $ vespa query \
   'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
 </pre>
 </div>
-Since we are post-filtering without upward adjusting the targetHits, we end up with fewer hits. 
+Since we are post-filtering without adjusting `totalTargetHits` upward, we end up with fewer hits. 
 
 Changing the query to limit to a tag which is less frequent, for example, `90s`, which
 matches 1,695 documents or roughly 1.7%, will cause Vespa to fall back to exact search as the estimated filter hit count
@@ -2078,7 +2078,7 @@ is less than the `approximateThreshold`.
   <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains='Bonnie Tyler'>
 $ vespa query \
-  'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "90s"' \
+  'yql=select title, artist, tags from track where {totalTargetHits:10}nearestNeighbor(embedding,q) and tags contains "90s"' \
   'hits=2' \
   'ranking=closeness' \
   'ranking.matching.postFilterThreshold=0.0' \
@@ -2087,7 +2087,7 @@ $ vespa query \
 </pre>
 </div>
 
-The fallback to exact search will expose more than `targetHits` documents to ranking. 
+The fallback to exact search will expose more than `totalTargetHits` documents to ranking. 
 Read more about combining filters with nearest neighbor search in 
 the [Query Time Constrained Approximate Nearest Neighbor Search](https://blog.vespa.ai/constrained-approximate-nearest-neighbor-search/) 
 blog post. 
diff --git a/en/querying/nearest-neighbor-search.md b/en/querying/nearest-neighbor-search.md
index 570ff6ce39..941a7fea30 100644
--- a/en/querying/nearest-neighbor-search.md
+++ b/en/querying/nearest-neighbor-search.md
@@ -376,7 +376,7 @@ using the [Query API](query-api.html#http):
 
 ```json
 {
-    "yql": "select * from product where {targetHits: 100}nearestNeighbor(image_embeddings, image_query_embedding) and in_stock = true",
+    "yql": "select * from product where {totalTargetHits: 100}nearestNeighbor(image_embeddings, image_query_embedding) and in_stock = true",
     "input.query(image_query_embedding)": [
         0.22507139604882176,
         0.11696498718517367,
diff --git a/en/rag/binarizing-vectors.md b/en/rag/binarizing-vectors.md
index 9dce96e286..a215e8d2d1 100644
--- a/en/rag/binarizing-vectors.md
+++ b/en/rag/binarizing-vectors.md
@@ -301,7 +301,7 @@ Assuming a query using the doc_embedding field:
 
 ```
 $ vespa query \
-    'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding, q)' \
+    'yql=select * from doc where {totalTargetHits:5}nearestNeighbor(doc_embedding, q)' \
     'input.query(q)=[1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \
     'ranking=app_ranking'
 ```
@@ -310,7 +310,7 @@ The same query, with a binarized query vector, to the binarized field:
 
 ```
 $ vespa query \
-    'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
+    'yql=select * from doc where {totalTargetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
     'input.query(q_bin)=[-119]' \
     'ranking=app_ranking_bin'
 ```
@@ -370,7 +370,7 @@ rank-profile app_ranking {
 Query:
 ```
 $ vespa query \
-    'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding, q)' \
+    'yql=select * from doc where {totalTargetHits:5}nearestNeighbor(doc_embedding, q)' \
     'input.query(q)=[2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]' \
     'ranking=app_ranking'
 ```
@@ -397,7 +397,7 @@ Query:
 
 ```
 $ vespa query \
-    'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
+    'yql=select * from doc where {totalTargetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
     'input.query(q_bin)=[-119]' \
     'ranking=app_ranking_bin'
 ```
@@ -440,7 +440,7 @@ Notes:
 Note the differences when using full values in the query tensor, see the relevance score for the results:
 ```
 $ vespa query \
-    'yql=select * from music where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
+    'yql=select * from music where {totalTargetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
     'input.query(q)=[1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \
     'input.query(q_bin)=[-119]' \
     'ranking=app_ranking_bin_full'
@@ -452,7 +452,7 @@ $ vespa query \
 
 ```
 $ vespa query \
-    'yql=select * from music where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
+    'yql=select * from music where {totalTargetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \
     'input.query(q)=[2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \
     'input.query(q_bin)=[-119]' \
     'ranking=app_ranking_bin_full'
diff --git a/en/rag/embedding.html b/en/rag/embedding.html
index 884b9839f6..06cbaca9b0 100644
--- a/en/rag/embedding.html
+++ b/en/rag/embedding.html
@@ -76,7 +76,7 @@ <h2 id="embedding-a-query-text">Embedding a query text</h2>
 <p>The text argument can be supplied by a referenced parameter instead, using the <code>@parameter</code> syntax:</p>
 <pre>{% highlight json %}
 {
-    "yql": "select * from doc where {targetHits:10}nearestNeighbor(embedding_field, query_embedding)",
+    "yql": "select * from doc where {totalTargetHits:10}nearestNeighbor(embedding_field, query_embedding)",
     "text": "my text to embed",
     "input.query(query_embedding)": "embed(@text)",
 }
@@ -761,7 +761,7 @@ <h3 id="adding-a-fixed-string-to-a-query-text">Adding a fixed string to a query
 the <code>text</code> value which then is embedded.</p>
 <pre>{% highlight json %}
 {
-    "yql": "select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))",
+    "yql": "select * from doc where userQuery() or ({totalTargetHits: 100}nearestNeighbor(embedding, e))",
     "input.query(e)": "embed(mxbai, @text)",
     "user_query": "space contains many suns"
 }
diff --git a/en/rag/working-with-chunks.html b/en/rag/working-with-chunks.html
index 07ad37c463..5956e2f832 100644
--- a/en/rag/working-with-chunks.html
+++ b/en/rag/working-with-chunks.html
@@ -166,7 +166,7 @@ <h2 id="searching-chunks">Searching chunks</h2>
 <p>A simple hybrid query can look like this:</p>
 
 <pre>
-yql=select * from doc where userInput(@query) or ({targetHits:10}nearestNeighbor(myEmbeddings, e))
+yql=select * from doc where userInput(@query) or ({totalTargetHits:10}nearestNeighbor(myEmbeddings, e))
 input.query(e)=embed(@query)
 query=Do Cholesterol Statin Drugs Cause Breast Cancer?
 </pre>
diff --git a/en/ranking/ranking-intro.md b/en/ranking/ranking-intro.md
index 2d1da716eb..da9b394dee 100644
--- a/en/ranking/ranking-intro.md
+++ b/en/ranking/ranking-intro.md
@@ -365,7 +365,7 @@ As the point of [weakAnd](../reference/querying/yql.html#weakand) is to early di
 _totalCount_ is an approximation:
 
 <a class="querystring-x">yql=select * from doc where
-{scoreThreshold: 0, targetHits: 10}weakAnd(
+{scoreThreshold: 0, totalTargetHits: 10}weakAnd(
 default contains "vespa",
 default contains "documents",
 default contains "about",
diff --git a/en/ranking/wand.html b/en/ranking/wand.html
index ca54ad27cf..b0f04289c3 100644
--- a/en/ranking/wand.html
+++ b/en/ranking/wand.html
@@ -41,7 +41,7 @@
   The WAND algorithm tries to address this problem by starting the search for candidate documents using OR,
   limiting the number of documents that are ranked, saving both latency and resource usage (cost)
   while still returning the same or almost the same top-k results as the brute force OR.
-  For the example, using WAND with <em>K</em> or <em>targetHits</em> to 1000, only 196,900 documents are fully ranked.
+  For the example, using WAND with <em>K</em> or <em>totalTargetHits</em> set to 1000, only 196,900 documents are fully ranked.
   That is a huge improvement over the exhaustive OR search which retrieves and ranks <em>7,926,256</em> documents
   and at the same time retrieving the same results as the exhaustive OR search.
 </p>
@@ -109,7 +109,7 @@ <h2 id="weakand">weakAnd</h2>
   specify the target for minimum number of hits the operator should produce per content node involved in the query.
 </p>
 <p>
-  The effect of tuning <code>targetHits</code> may not be intuitive.
+  The effect of tuning <code>totalTargetHits</code> may not be intuitive.
   To ensure that you get the best hits possible with a weakAnd,
   set the target number somewhat higher than the number of hits returned to the user;
   setting it 10 times higher should be more than enough.
@@ -156,7 +156,7 @@ <h2 id="weakand">weakAnd</h2>
 </p>
 <pre>
 select * from passages where (
-    {targetHits: 200}
+    {totalTargetHits: 200}
         weakAnd(
             default contains "is", default contains "cdg", default contains "airport",
             default contains "in", default contains "main", default contains "paris"
@@ -277,7 +277,7 @@ <h2 id="wand">wand</h2>
 <pre>
 {
     "yql":"select * from passages where rank(
-        ({targetHits: 25}
+        ({totalTargetHits: 25}
             wand(deep_ct_tokens, @tokens)),
             userQuery())",
     "tokens": "{2003: 1, 3729: 1, 2290: 1, 3199: 1, 1999: 1, 2364: 1, 3000: 1}",
diff --git a/en/reference/api/query.html b/en/reference/api/query.html
index 70c3e5b0f5..f9ecda7bb9 100644
--- a/en/reference/api/query.html
+++ b/en/reference/api/query.html
@@ -1149,7 +1149,7 @@ <h2 id="ranking.matching">ranking.matching</h2>
       </p>
       <p>
         Value used to control the auto-adjustment of
-        <a href="../querying/yql.html#targethits">targetHits</a> used when evaluating an approximate
+        <a href="../querying/yql.html#totaltargethits">totalTargetHits</a> used when evaluating an approximate
         <a href="../querying/yql.html#nearestneighbor">nearestNeighbor</a>
         operator with post-filtering.
       </p>
diff --git a/en/reference/querying/json-query-language.md b/en/reference/querying/json-query-language.md
index b19fc748f2..65fdfc0f81 100644
--- a/en/reference/querying/json-query-language.md
+++ b/en/reference/querying/json-query-language.md
@@ -494,7 +494,7 @@ Format of this in JSON:
 
 Another example:
 
-YQL: `where [ {"scoreThreshold": 13, "targetHits": 7} ]wand(description, {"a":1, "b":2})`.
+YQL: `where [ {"scoreThreshold": 13, "totalTargetHits": 7} ]wand(description, {"a":1, "b":2})`.
 
 Format of this in JSON:
 
@@ -502,7 +502,7 @@ Format of this in JSON:
 "where" : {
   "wand" : {
     "children" : [ "description", {"a" : 1, "b":2} ],
-    "attributes" : {"scoreThreshold": 13, "targetHits": 7}
+    "attributes" : {"scoreThreshold": 13, "totalTargetHits": 7}
   }
 }
 ```
@@ -530,7 +530,7 @@ Format of this in JSON:
 ```
 
 ###### weakAnd
-YQL: `where {scoreThreshold: 41, "targetHits": 7}weakAnd(a contains "A", b contains "B")`.
+YQL: `where {scoreThreshold: 41, "totalTargetHits": 7}weakAnd(a contains "A", b contains "B")`.
 
 Format of this in JSON:
 
@@ -538,7 +538,7 @@ Format of this in JSON:
 "where" : {
   "weakAnd" : {
     "children" : [ { "contains" : ["a", "A"] }, { "contains" : ["b", "B"] } ],
-    "attributes" : {"scoreThreshold": 41, "targetHits": 7}
+    "attributes" : {"scoreThreshold": 41, "totalTargetHits": 7}
 	}
 }
 ```
diff --git a/en/reference/querying/yql.html b/en/reference/querying/yql.html
index ef96cbbff3..0bcf105fa1 100644
--- a/en/reference/querying/yql.html
+++ b/en/reference/querying/yql.html
@@ -599,6 +599,7 @@ <h2 id="where">where</h2>
             only the following annotations are applied:</p>
             <ul>
               <li><a href="#defaultindex">defaultIndex</a></li>
+              <li><a href="#totaltargethits">totalTargetHits</a></li> (for weakAnd)
               <li><a href="#targethits">targetHits</a></li> (for weakAnd)
               <li><a href="#distance">distance</a></li> (for near/oNear)
               <li><a href="#ranked">ranked</a></li>
@@ -963,14 +964,20 @@ <h2 id="where">where</h2>
         <td><a href="#scorethreshold">scoreThreshold</a></td>
         <td>Minimum rank score for hits to include.</td>
       </tr>
+      <tr>
+        <td><a href="#totaltargethits">totalTargetHits</a></td>
+        <td>Wanted number of hits exposed to the first-phase ranking function in total over the content nodes
+          evaluating the query.</td>
+      </tr>
       <tr>
         <td><a href="#targethits">targetHits</a></td>
-        <td>Wanted number of hits exposed to the real first-phase ranking function per content node.</td>
+        <td>Wanted number of hits exposed to the first-phase ranking function per content node.
+        Prefer using <a href="#totaltargethits">totalTargetHits</a> over this.</td>
       </tr>
       </tbody>
     </table>
 <pre>
-where ({scoreThreshold: 0.13, targetHits: 7}wand(description, {"a":1, "b":2}))
+where ({scoreThreshold: 0.13, totalTargetHits: 7}wand(description, {"a":1, "b":2}))
 </pre>
     <p>
       Refer to <a href="../../ranking/wand.html">using wand</a> for introduction to the WAND
@@ -1039,14 +1046,19 @@ <h2 id="where">where</h2>
         </tr>
         </thead>
         <tbody>
+        <tr>
+          <td><a href="#totaltargethits">totalTargetHits</a></td>
+          <td>Wanted number of hits exposed to the first-phase ranking function in total over the content nodes evaluating the query.</td>
+        </tr>
         <tr>
           <td><a href="#targethits">targetHits</a></td>
-          <td>Wanted number of hits exposed to the real first-phase ranking function per content node.</td>
+          <td>Wanted number of hits exposed to the first-phase ranking function per content node.
+          Prefer using <a href="#totaltargethits">totalTargetHits</a> over this.</td>
         </tr>
         </tbody>
       </table>
 <pre>
-where ({targetHits: 7}weakAnd(a contains "A", b contains "B"))
+where ({totalTargetHits: 7}weakAnd(a contains "A", b contains "B"))
 </pre>
       <p>
         Unlike <a href="#wand">wand</a>, <code>weakAnd</code> can be used
@@ -1280,12 +1292,12 @@ <h2 id="where">where</h2>
         the <span style="text-decoration: underline">approximate</span> nearest neighbors are returned. Example:
       </p>
 <pre>
-where ({targetHits: 10}nearestNeighbor(doc_vector, query_vector))&amp;input.query(query_vector)=[3,5,7]&ranking=semantic
+where ({totalTargetHits: 10}nearestNeighbor(doc_vector, query_vector))&amp;input.query(query_vector)=[3,5,7]&ranking=semantic
 </pre>
       <p>
         In this example we search for the top 10 nearest neighbors in a 3-dimensional vector space.
-        <em>targetHits</em> specifies the top-k nearest neighbors to expose to a user defined <code>semantic</code>
-        <a href="../../basics/ranking.html">rank profile</a>. The <a href="#targethits">targetHits</a> annotation is required.
+        <em>totalTargetHits</em> specifies the top-k nearest neighbors to expose to a user defined <code>semantic</code>
+        <a href="../../basics/ranking.html">rank profile</a>. The <a href="#totaltargethits">totalTargetHits</a> annotation is required.
         The first parameter of <em>nearestNeighbor</em> is the name of the tensor field attribute
         containing the document vectors (<em>doc_vector</em>).
       </p>
@@ -1331,13 +1343,20 @@ <h2 id="where">where</h2>
         </tr>
         </thead>
         <tbody>
+        <tr>
+          <td><a href="#totaltargethits">totalTargetHits</a></td>
+          <td>
+            Specifies the number of hits nearestNeighbor
+            should expose to <a href="../../basics/ranking.html">ranking</a> in total over the content
+            nodes evaluating the query. Note that more or fewer hits may actually be produced.
+            Setting target hits is required.
+          </td>
+        </tr>
         <tr>
           <td><a href="#targethits">targetHits</a></td>
           <td>
-            This annotation is required, and specifies the number of hits nearestNeighbor
-            should expose to <a href="../../basics/ranking.html">ranking</a>.
-            Note that more or less hits might actually be produced. <em>targetHits</em> is per node
-            involved in the query.
+            Specifies the target hits <i>per node</i>.
+            Prefer using <a href="#totaltargethits">totalTargetHits</a> over this.
           </td>
         </tr>
         <tr>
@@ -1356,7 +1375,7 @@ <h2 id="where">where</h2>
         <tr>
           <td><a href="#hnsw-exploreadditionalhits">hnsw.exploreAdditionalHits</a></td>
           <td>
-            Tune how many extra nodes in the HNSW graph (in addition to <code>targetHits</code>)
+            Tune how many extra nodes in the HNSW graph (in addition to <code>totalTargetHits</code>)
             that should be explored before selecting the best hits. Default is <code>0</code>. Increasing
             this parameter increases the accuracy of the approximate search, at the cost of more distance computations.
           </td>
@@ -1818,7 +1837,7 @@ <h2 id="annotations">Annotations</h2>
       The <code>distanceThreshold</code> annotation may be used to filter away hits
       with a higher distance than the given threshold from the results.
       Note that one will never get more hits with <code>distanceThreshold</code> than you would get without it -
-      to get more hits, increase <a href="#targethits">targetHits</a>, too.
+      to get more hits, increase <a href="#totaltargethits">totalTargetHits</a>, too.
       The units for the threshold depends on the
       <a href="../schemas/schemas.html#distance-metric">distance metric</a> used.
       </p>
@@ -1991,7 +2010,7 @@ <h2 id="annotations">Annotations</h2>
       Used in <a href="#nearestneighbor">nearestNeighbor</a>.
       When using an <a href="../schemas/schemas.html#index-hnsw">HNSW index</a>,
       the optional <code>hnsw.exploreAdditionalHits</code> annotation can be used to
-      tune how many extra nodes in the graph (in addition to <code>targetHits</code>)
+      tune how many extra nodes in the graph (in addition to <code>totalTargetHits</code>)
       should be explored before selecting the best hits.
       Using a greater number here gives better quality, but worse performance.
       </p>
@@ -2182,23 +2201,32 @@ <h2 id="annotations">Annotations</h2>
     <td><p id="suffix">Do <em>suffix matching</em> for this term, e.g. search for "*word".</p></td>
   </tr>
   <tr>
-    <td>targetHits</td>
+    <td>totalTargetHits</td>
     <td>100</td>
     <td>int</td>
     <td>
-      <p id="targethits">
-        Used by <a href="#wand">wand</a> and <a href="#weakand">weakAnd</a>, where the default is 100.
+      <p id="totaltargethits">
+        Used by <a href="#wand">wand</a> and <a href="#weakand">weakAnd</a>, where the default is 100,
+        and with <a href="#nearestneighbor">nearestNeighbor</a>,
+        where it has no default.
+        This sets the wanted number of hits exposed to the first-phase ranking function in total
+        over the content nodes evaluating the query (a <i>group</i>).
+        If additional second phase ranking is used,
+        do not set <code>totalTargetHits</code> less than the configured rank-profile's
+        <a href="../schemas/schemas.html#secondphase-total-rerank-count">total-rerank-count</a>.
       </p>
       <p>
-        It is also used with <a href="#nearestneighbor">nearestNeighbor</a>,
-        where it has no default - it must always be set,
-        see examples in <a href="../../querying/nearest-neighbor-search">nearest neighbor search</a>.
+        See examples in <a href="../../querying/nearest-neighbor-search">nearest neighbor search</a>.
       </p>
-      <p>
-        It sets the wanted number of hits exposed to the real first-phase ranking function per content node.
-        If additional second phase ranking is used,
-        do not set <code>targetHits</code> less than the configured rank-profile's
-        <a href="../schemas/schemas.html#secondphase-total-rerank-count">total-rerank-count</a>.
+    </td>
+  </tr>
+  <tr>
+    <td>targetHits</td>
+    <td>100</td>
+    <td>int</td>
+    <td>
+      <p id="targethits">
+      Sets target hots per node. Prefer using <a href="#totaltargethits">totalTargetHits</a> over this.
       </p>
     </td>
   </tr>
diff --git a/en/reference/schemas/schemas.html b/en/reference/schemas/schemas.html
index 8d5e044cc5..ff7abc29ce 100644
--- a/en/reference/schemas/schemas.html
+++ b/en/reference/schemas/schemas.html
@@ -1609,9 +1609,9 @@ <h2 id="rank-profile">rank-profile</h2>
     Controlling the filtering behavior with approximate nearest neighbor search</a> for more details.
   </p>
   <p>
-    With post-filtering the <a href="../querying/yql.html#targethits">targetHits</a> value
-    used when searching the HNSW index is auto-adjusted in an effort to expose <em>targetHits</em> hits
-    to first-phase ranking after post-filtering has been applied. The following formula is used:
+    With post-filtering the <a href="../querying/yql.html#totaltargethits">totalTargetHits</a> value
+    used when searching the HNSW index is auto-adjusted in an effort to expose the node's shgare of <em>totalTargetHits</em>
+    hits to first-phase ranking after post-filtering has been applied. The following formula is used:
   </p>
 <pre>
     adjustedTargetHits = min(targetHits / estimatedFilterHitRatio, targetHits * targetHitsMaxAdjustmentFactor).
@@ -1705,7 +1705,7 @@ <h2 id="rank-profile">rank-profile</h2>
 <td>
   <p id="target-hits-max-adjustment-factor">
     Value (in the range [1.0, inf]) used to control the auto-adjustment of
-    <a href="../querying/yql.html#targethits">targetHits</a> used when evaluating an approximate
+    <a href="../querying/yql.html#totaltargethits">totalTargetHits</a> used when evaluating an approximate
     <a href="../querying/yql.html#nearestneighbor">nearestNeighbor</a>
     operator with post-filtering.
     The default value is 20.0.

From 447019c9a01925899876624ac8d856a077cd13c9 Mon Sep 17 00:00:00 2001
From: Jon Bratseth <bratseth@vespa.ai>
Date: Mon, 16 Mar 2026 18:00:34 +0100
Subject: [PATCH 5/9] Update en/learn/faq.md

Co-authored-by: Kristian Aune <kkraune@users.noreply.github.com>
---
 en/learn/faq.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/en/learn/faq.md b/en/learn/faq.md
index 9ede0766ef..762fa5c011 100644
--- a/en/learn/faq.md
+++ b/en/learn/faq.md
@@ -87,7 +87,7 @@ of a double. This can happen in two cases:
 
 - The [ranking](../basics/ranking.html) expression used a feature which became `NaN` (Not a Number). For example, `log(0)` would produce
 -Infinity. One can use [isNan](../reference/ranking/ranking-expressions.html#isnan-x) to guard against this.
-- Surfacing low scoring hits using [grouping](../querying/grouping.html), that is, rendering low ranking hits with `each(output(summary()))` that are outside what Vespa computed and caches on a heap. This is controlled by the [total-keep-rank-count](../reference/schemas/schemas.html#total-keep-rank-count) perameter.
+- Surfacing low scoring hits using [grouping](../querying/grouping.html), that is, rendering low ranking hits with `each(output(summary()))` that are outside what Vespa computed and caches on a heap. This is controlled by the [total-keep-rank-count](../reference/schemas/schemas.html#total-keep-rank-count) parameter.
 
 ### How to pin query results?
 To hard-code documents to positions in the result set,

From 82789856ca1c84af77cab4449e3ae237e4a8472a Mon Sep 17 00:00:00 2001
From: Jon Bratseth <bratseth@vespa.ai>
Date: Mon, 16 Mar 2026 18:03:16 +0100
Subject: [PATCH 6/9] Update en/reference/querying/yql.html

Co-authored-by: Kristian Aune <kkraune@users.noreply.github.com>
---
 en/reference/querying/yql.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/en/reference/querying/yql.html b/en/reference/querying/yql.html
index 0bcf105fa1..5b085edb67 100644
--- a/en/reference/querying/yql.html
+++ b/en/reference/querying/yql.html
@@ -2226,7 +2226,7 @@ <h2 id="annotations">Annotations</h2>
     <td>int</td>
     <td>
       <p id="targethits">
-      Sets target hots per node. Prefer using <a href="#totaltargethits">totalTargetHits</a> over this.
+      Sets target hits per node. Prefer using <a href="#totaltargethits">totalTargetHits</a> over this.
       </p>
     </td>
   </tr>

From 95f294294a57e432d1a422e24d36e1752e42e5d8 Mon Sep 17 00:00:00 2001
From: Jon Bratseth <bratseth@vespa.ai>
Date: Mon, 16 Mar 2026 18:03:34 +0100
Subject: [PATCH 7/9] Update en/reference/schemas/schemas.html

Co-authored-by: Kristian Aune <kkraune@users.noreply.github.com>
---
 en/reference/schemas/schemas.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/en/reference/schemas/schemas.html b/en/reference/schemas/schemas.html
index ff7abc29ce..517cef0b7b 100644
--- a/en/reference/schemas/schemas.html
+++ b/en/reference/schemas/schemas.html
@@ -1610,7 +1610,7 @@ <h2 id="rank-profile">rank-profile</h2>
   </p>
   <p>
     With post-filtering the <a href="../querying/yql.html#totaltargethits">totalTargetHits</a> value
-    used when searching the HNSW index is auto-adjusted in an effort to expose the node's shgare of <em>totalTargetHits</em>
+    used when searching the HNSW index is auto-adjusted in an effort to expose the node's share of <em>totalTargetHits</em>
     hits to first-phase ranking after post-filtering has been applied. The following formula is used:
   </p>
 <pre>

From 05caca7ba0595339ccd6043bb1bc32e66046c7ad Mon Sep 17 00:00:00 2001
From: Jon Bratseth <bratseth@gmail.com>
Date: Mon, 16 Mar 2026 18:02:53 +0100
Subject: [PATCH 8/9] Correct sentence

---
 en/learn/tutorials/rag-blueprint.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/en/learn/tutorials/rag-blueprint.md b/en/learn/tutorials/rag-blueprint.md
index 7750c5e668..058df3a752 100644
--- a/en/learn/tutorials/rag-blueprint.md
+++ b/en/learn/tutorials/rag-blueprint.md
@@ -570,8 +570,8 @@ not the case for most real-world RAG applications, so this is cruical to have in
 
 ![phased ranking overview](/assets/img/phased-ranking-rag.png)
 
-That the stateless container nodes can 
-also be [scaled independently](../../performance/sizing-search.html) to handle increased query load.
+The stateless container nodes can 
+be [scaled independently](../../performance/sizing-search.html) to handle increased query load.
 
 ## Configuring match-phase (retrieval)
 

From 59eccc08ff7aa9c7ec51e9a60289223eef23643f Mon Sep 17 00:00:00 2001
From: Jon Bratseth <bratseth@vespa.ai>
Date: Mon, 16 Mar 2026 18:04:26 +0100
Subject: [PATCH 9/9] Update en/reference/schemas/schemas.html

Co-authored-by: Kristian Aune <kkraune@users.noreply.github.com>
---
 en/reference/schemas/schemas.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/en/reference/schemas/schemas.html b/en/reference/schemas/schemas.html
index 517cef0b7b..e32c47915d 100644
--- a/en/reference/schemas/schemas.html
+++ b/en/reference/schemas/schemas.html
@@ -1874,7 +1874,7 @@ <h2 id="diversity">diversity</h2>
 unique values from the <a href="#diversity-min-groups">diversity attribute</a> from this phase,
 but no more than max-hits.
 For <a href="#match-phase">match-phase</a> max-hits = the node's share of <a href="#match-phase-max-hits">match-phase total-max-hits</a>.
-For <a href="#secondphase-rank">second-phase</a> max-hits = the node's share of <a href="#secondphase-total-rerank-count">total-rerank-count</a>
+For <a href="#secondphase-rank">second-phase</a> max-hits = the node's share of <a href="#secondphase-total-rerank-count">total-rerank-count</a>.
 A document is considered a candidate if:
 <ul>
   <li>The query has not yet reached the <em>max-hits</em>