
Commit 78b985d

Merge pull request #4581 from vespa-engine/bratseth/total-parameters
Bratseth/total parameters
2 parents 66427ce + 59eccc0 commit 78b985d

21 files changed

Lines changed: 264 additions & 158 deletions

en/basics/ranking.html

Lines changed: 2 additions & 2 deletions
@@ -96,12 +96,12 @@ <h2 id="phased-ranking">Phased ranking</h2>

 second-phase {
 expression: xgboost(my_xgboost_reranker)
-rerank-count: 1000 # per content node
+total-rerank-count: 1000 # Over all nodes
 }

 global-phase {
 expression: sum(onnx(my_large_onnx_model))
-rerank-count: 20 # globally
+rerank-count: 20
 }

 }

en/clients/vespa-cli.html

Lines changed: 1 addition & 1 deletion
@@ -233,7 +233,7 @@ <h3 id="queries">Queries</h3>
 <p>Example query file:</p>
 <pre>{% highlight json %}
 {
-"yql": "select product_id, title from products where {targetHits: 200}nearestNeighbor(dense_embedding, q_vector)",
+"yql": "select product_id, title from products where {totalTargetHits: 200}nearestNeighbor(dense_embedding, q_vector)",
 "input.query(q_vector)": [-0.050548091530799866, ... ,0.028366032987833023],
 "ranking": "vector_distance"
 }

en/content/attributes.html

Lines changed: 2 additions & 2 deletions
@@ -586,7 +586,7 @@ <h2 id="paged-attributes">Paged attributes</h2>
 where the number of attribute accesses are limited by the re-ranking phase count.
 </p>
 <p>
-For example using a second phase <a href="../reference/schemas/schemas.html#secondphase-rerank-count">rerank-count</a>
+For example using a second phase <a href="../reference/schemas/schemas.html#secondphase-total-rerank-count">total-rerank-count</a>
 of 100 will limit the maximum number of page-ins/disk access per query to 100.
 Running at 100 QPS would need up to 10K disk accesses per second.
 This is the worst case if none of the accessed attribute data were paged into memory already.
@@ -608,7 +608,7 @@ <h2 id="paged-attributes">Paged attributes</h2>
 rank-profile foo {
 first-phase {}
 second-phase {
-rerank-count: 100
+total-rerank-count: 100
 expression: sum(attribute(tensordata))
 }
 }
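
Reviewer note: the worst-case figure quoted in the hunk above (a rerank count of 100 at 100 QPS giving up to 10K disk accesses per second) is a plain product. The sketch below only reproduces that arithmetic; the function name is hypothetical and not part of Vespa, and it assumes every reranked hit causes exactly one page-in because none of the attribute data is already in memory.

```python
def worst_case_disk_accesses_per_second(rerank_count: int, qps: int) -> int:
    """Upper bound on page-ins per second for a paged attribute.

    Assumption (hypothetical helper, not a Vespa API): each reranked hit
    triggers one disk access, i.e. nothing is already paged into memory.
    """
    return rerank_count * qps


# Numbers taken from the documentation text in the hunk above.
print(worst_case_disk_accesses_per_second(rerank_count=100, qps=100))
```

In practice the real rate should be lower, since some of the accessed attribute data will usually already be paged in.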

en/learn/faq.md

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ of a double. This can happen in two cases:

 - The [ranking](../basics/ranking.html) expression used a feature which became `NaN` (Not a Number). For example, `log(0)` would produce
 -Infinity. One can use [isNan](../reference/ranking/ranking-expressions.html#isnan-x) to guard against this.
-- Surfacing low scoring hits using [grouping](../querying/grouping.html), that is, rendering low ranking hits with `each(output(summary()))` that are outside of what Vespa computed and caches on a heap. This is controlled by the [keep-rank-count](../reference/schemas/schemas.html#keep-rank-count).
+- Surfacing low scoring hits using [grouping](../querying/grouping.html), that is, rendering low ranking hits with `each(output(summary()))` that are outside what Vespa computed and caches on a heap. This is controlled by the [total-keep-rank-count](../reference/schemas/schemas.html#total-keep-rank-count) parameter.

 ### How to pin query results?
 To hard-code documents to positions in the result set,

en/learn/tutorials/rag-blueprint.md

Lines changed: 7 additions & 8 deletions
@@ -570,9 +570,8 @@ not the case for most real-world RAG applications, so this is cruical to have in

 ![phased ranking overview](/assets/img/phased-ranking-rag.png)

-It is worth noting that parameters such as `targetHits` (for the match phase) and `rerank-count`
-(for first and second phase) are applied **per content node**. Also note that the stateless container nodes can
-also be [scaled independently](../../performance/sizing-search.html) to handle increased query load.
+The stateless container nodes can
+be [scaled independently](../../performance/sizing-search.html) to handle increased query load.

 ## Configuring match-phase (retrieval)

@@ -1380,8 +1379,8 @@ We run the evaluation script on a set of unseen test queries, and get the follow
 ```

 For the first phase ranking, we care most about recall, as we just want to make sure that the candidate documents are
-ranked high enough to be included in the second-phase ranking. (the default number of documents that will be exposed to
-second-phase is 10 000, but can be controlled by the `rerank-count` parameter).
+ranked high enough to be included in the second-phase ranking. The number of documents to be reranked in second-phase
+in total over all content nodes is controlled by the `total-rerank-count` parameter.

 We can see that our results are already very good. This is of course due to the fact that we have a small,synthetic dataset.
 In reality, you should align the metric expectations with your dataset and test queries.
@@ -1392,7 +1391,7 @@ within your latency budget, as you want some headroom for second-phase ranking.
 ## Second-phase ranking

 For the second-phase ranking, we can afford to use a more expensive ranking expression, since we will only run it
-on the top-k documents from the first-phase ranking (defined by the `rerank-count` parameter, which defaults to 10,000 documents).
+on the top-k documents from the first-phase ranking (decided by the `total-rerank-count` parameter).

 This is where we can significantly improve ranking quality by using more sophisticated models and features that would
 be too expensive to compute for all matched documents.
@@ -1589,7 +1588,7 @@ vespa query \
 **Performance monitoring:**

 * Monitor latency impact of second-phase ranking
-* Adjust `rerank-count` based on quality vs. performance trade-offs
+* Adjust `total-rerank-count` based on quality vs. performance trade-offs
 * Consider using different models for different query types or use cases

 The second-phase ranking represents a crucial step in building high-quality RAG applications,
@@ -1598,7 +1597,7 @@ providing the precision needed for effective LLM context while maintaining reaso
 ## (Optional) Global-phase ranking

 We also have the option of configuring [global-phase](../../reference/schemas/schemas.html#globalphase-rank) ranking, which can rerank the top k
-(as set by `rerank-count` parameter) documents from the second-phase ranking.
+(as set by `total-rerank-count` parameter) documents from the second-phase ranking.

 Common options for global-phase are [cross-encoders](../../ranking/cross-encoders.html) or another GBDT model, trained for
 better separating top ranked documents on objectives such as [LambdaMart](https://xgboost.readthedocs.io/en/latest/tutorials/learning_to_rank.html). For RAG applications,

en/performance/graceful-degradation.html

Lines changed: 7 additions & 7 deletions
@@ -177,21 +177,21 @@ <h2 id="match-phase-degradation">Match phase degradation</h2>
 <p>
 Match-phase works by specifying an <code>attribute</code> that measures document
 quality in some way (popularity, click-through rate, pagerank, ad bid value, price, text quality).
-In addition, a <code>max-hits</code> value is specified
-that specifies how many hits are "more than enough" for the application.
+In addition, a <code>total-max-hits</code> value is specified
+that specifies how many hits in total over the content nodes are "more than enough" for the application.
 Then an estimate is made after collecting a reasonable amount of hits for the query,
-and if the estimate is higher than the configured <code>max-hits</code> value,
+and if the estimate is higher than the node's share of the <code>total-max-hits</code> value,
 an extra limitation is added to the query,
 ensuring that only the highest quality documents can become hits.
 </p><p>
 In effect, this limits the documents actually queried to the highest quality documents,
 a subset of the full corpus,
 where the size of subset is calculated in such a way
-that the query is estimated to give <code>max-hits</code> hits.
+that the query is estimated to give the node's share of <code>total-max-hits</code> hits.
 Since some (low-quality) hits will already have been collected to do the estimation,
-the actual number of hits returned will usually be higher than max-hits.
+the actual number of hits returned will usually be higher than total-max-hits.
 But since the distribution of documents isn't perfectly smooth,
-you risk sometimes getting less than the configured <code>max-hits</code> hits back.
+you risk sometimes getting less than the configured <code>total-max-hits</code> hits back.
 </p><p>
 Note that limiting hits in the match-phase also affects <a href="../querying/grouping.html">aggregation/grouping</a>,
 and total-hit-count since it actually limits, so the query gets fewer hits.
@@ -200,7 +200,7 @@ <h2 id="match-phase-degradation">Match phase degradation</h2>
 since they both operate in the same manner,
 and you would get interference between them that could cause unpredictable results.
 The graph shows possible hits versus actual hits in a corpus with 100 000 documents,
-where <code>max-hits</code> is configured to 10 000.
+where <code>total-max-hits</code> is configured to 10 000 per node.
 The corpus is a synthetic (slightly randomized) data set,
 in practice the graph will be less smooth:
 </p>

en/performance/practical-search-performance-guide.md

Lines changed: 1 addition & 1 deletion
@@ -1122,7 +1122,7 @@ Repeating the query from above, replacing `dotProduct` with `wand`:
 <button class="d-icon d-duplicate pre-copy-button" onclick="copyPreContent(this)"></button>
 <pre data-test="exec" data-test-assert-contains="Vastarannan valssi">
 $ vespa query \
-'yql=select track_id, title, artist, tags from track where {targetHits:10}wand(tags, @userProfile)' \
+'yql=select track_id, title, artist, tags from track where {totalTargetHits:10}wand(tags, @userProfile)' \
 'userProfile={"hard rock":1, "rock":1,"metal":1, "finnish metal":1}' \
 'hits=1' \
 'ranking=personalized'

en/querying/approximate-nn-hnsw.md

Lines changed: 11 additions & 10 deletions
@@ -134,7 +134,7 @@ or exact (brute-force) search by using the [approximate query annotation](../ref

 <pre>
 {
-"yql": "select * from doc where {targetHits: 100, approximate:false}nearestNeighbor(image_embeddings,query_image_embedding)",
+"yql": "select * from doc where {totalTargetHits: 10, approximate:false}nearestNeighbor(image_embeddings,query_image_embedding)",
 "hits": 10
 "input.query(query_image_embedding)": [0.21,0.12,....],
 "ranking.profile": "image_similarity"
@@ -150,9 +150,9 @@ Note that exact searches over a large vector volume require adjustment of the
 The default [query timeout](../reference/api/query.html#timeout) is 500ms,
 which will be too low for an exact search over many vectors.

-In addition to [targetHits](../reference/querying/yql.html#targethits),
+In addition to [totalTargetHits](../reference/querying/yql.html#totaltargethits),
 there is a [hnsw.exploreAdditionalHits](../reference/querying/yql.html#hnsw-exploreadditionalhits) parameter
-which controls how many extra nodes in the graph (in addition to `targetHits`)
+which controls how many extra nodes in the graph (in addition to `totalTargetHits`)
 that are explored during the graph search. This parameter is used to tune accuracy quality versus query performance.

 ## Combining approximate nearest neighbor search with filters
@@ -174,22 +174,23 @@ Note that when using `pre-filtering` the following query operators are not inclu
 * [predicate](../reference/querying/yql.html#predicate)

 These are instead evaluated after the approximate nearest neighbors are retrieved, more like a `post-filter`.
-This might cause the search to expose fewer hits to ranking than the wanted `targetHits`.
+This might cause the search to expose fewer hits to ranking than the wanted `totalTargetHits`.

 Since {% include version.html version="8.78" %} the `pre-filter` can be evaluated using
 [multiple threads per query](../performance/practical-search-performance-guide.html#multithreaded-search-and-ranking).
 This can be used to reduce query latency for larger vector datasets where the cost of evaluating the `pre-filter` is significant.
 Note that searching the `HNSW` index is always single-threaded per query.
 Multithreaded evaluation when using `post-filtering` has always been supported,
-but this is less relevant as the `HNSW` index search first reduces the document candidate set based on `targetHits`.
+but this is less relevant as the `HNSW` index search first reduces the document candidate set based on `totalTargetHits`.

 ## Nearest Neighbor Search Considerations

-* **targetHits**:
-The [targetHits](../reference/querying/yql.html#targethits)
-specifies how many hits one wants to expose to [ranking](../basics/ranking.html) *per content node*.
-Approximate search exposes exactly `targetHits` hits to `first-phase` ranking on every content node
-as long as `targetHits` hits are actually found and not filtered out afterwards.
+* **totalTargetHits**:
+The [totalTargetHits](../reference/querying/yql.html#totaltargethits) parameter
+specifies how many hits one wants to expose to [ranking](../basics/ranking.html) in total over the content nodes
+participating in the query (you can also set this per node using [targetHits](../reference/querying/yql.html#targethits)).
+Approximate search exposes exactly `totalTargetHits` hits to `first-phase` ranking over the content nodes
+as long as `totalTargetHits` hits are actually found and not filtered out.
 Nearest neighbor search is typically used as an efficient retriever in a [phased ranking](../ranking/phased-ranking.html)
 pipeline. See [performance sizing](../performance/sizing-search.html).
