-
Notifications
You must be signed in to change notification settings - Fork 108
Expand file tree
/
Copy pathphased-ranking.html
More file actions
541 lines (495 loc) · 21.9 KB
/
phased-ranking.html
File metadata and controls
541 lines (495 loc) · 21.9 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
---
# Copyright Vespa.ai. All rights reserved.
title: "Phased Ranking"
redirect_from:
- /en/phased-ranking.html
---
<p>
Vespa allows expressing multiphased retrieval and ranking of documents. The retrieval phase is done close to the data in the content nodes,
while the ranking phase(s) can be done in the content nodes or in the stateless container after scatter-and-gather from content node(s).
The retrieval part of a phased pipeline is expressed by using the <a href="../querying/query-api.html">query API</a>
and the ranking part is expressed by using the <a href="../reference/schemas/schemas.html#rank-profile">rank-profile</a> in the schema.
</p>
<ul>
<li><strong>Retrieval:</strong><a href="#top-k-query-operators"> <em>Top-k</em> query operators</a>, like <a href="wand.html">weakAnd/wand</a> and
<a href="../querying/nearest-neighbor-search">nearest neighbor search</a> allow retrieval with sublinear complexity. These
query operators use simple scoring functions that are computationally cheap to evaluate over Vespa indexes. Using
the expressiveness of the Vespa query language, developers can combine multiple retrievers in the query, and expose
the union of retrieved documents into Vespa ranking phases.</li>
<li>
<strong>Per node ranking:</strong>The query specification retrieves documents and ranks them using declarative phases evaluated within
the <a href="#two-phase-ranking-on-content-nodes">content nodes</a>:
<ul>
<li><a href="#first-phase-ranking-on-content-nodes">first-phase expression</a>;
configured in <a href="../reference/schemas/schemas.html#rank-profile">rank-profile</a>.
This phase is evaluated for <em>all</em> hits retrieved by the query logic. This phase can also remove
retrieved documents using <a href="../reference/schemas/schemas.html#rank-score-drop-limit">rank-score-drop-limit</a>.
</li>
<li><a href="#two-phase-ranking-on-content-nodes">second-phase ranking</a>;
configured in <a href="../reference/schemas/schemas.html#rank-profile">rank-profile</a>.
Optionally re-rank the top-scoring hits from the first-phase ranking using a more complex expression. The
<a href="../reference/schemas/schemas.html#secondphase-total-rerank-count">total-rerank-count</a> sets a strict upper bound on the
number of documents that are re-ranked in total over the nodes.
</li>
</ul>
<li><strong>Global ranking:</strong>Following the per content node local ranking phases,
re-ranking can occur after the content nodes have returned the merged top-scoring hits to the stateless container.
This phase is specified using a <a href="#using-a-global-phase-expression">global-phase</a> expression in the
<a href="../reference/schemas/schemas.html#rank-profile">rank-profile</a>.
Additionally, the global-phase can conduct <a href="#cross-hit-normalization-including-reciprocal-rank-fusion">
cross-hit normalization</a> to combine unrelated scoring methods.
</li>
<li>Finally, for customized ranking that is difficult to express in declarative phases, one can implement
re-ranking using <a href="reranking-in-searcher">reranking in searcher</a>.
</li>
</ul>
<img src="/assets/img/phased-ranking.png" alt="Ranking in 3 phases"/>
<h2 id="first-phase-ranking-on-content-nodes">First-phase ranking on content nodes</h2>
<p>
Normally, you will always start by having one ranking expression
that is evaluated on the content nodes. This is configured in
the <code>rank-profile</code> section of a
<a href="../reference/schemas/schemas.html#rank-profile">schema</a>
as a <code>first-phase</code> expression. This expression can
use various <a href="ranking-expressions-features.html">rank features</a>
with information extracted during the matching phase to evaluate
the relevance of each document. The first-phase expression
is computed for every document retrieved by the query. The computational cost
is bounded by the number of documents exposed to the ranking phase on each content node multiplied with
the complexity of the first-phase expression; therefore the expression
needs to be simple and cheap to allow scaling to large amounts of retrieved docs. Alternatively,
use retrieval operators that will expose only the top-k hits to the first-phase expression.
</p>
<h2 id="two-phase-ranking-on-content-nodes">Two-phase ranking on content nodes</h2>
<p>
While some use cases only require one (simple) first-phase
ranking expression, for more advanced use cases it's possible to
add a <code>second-phase</code> ranking expression in a
<a href="../reference/schemas/schemas.html#rank-profile">rank-profile</a>
in the schema.
This enables more expensive computations than would be feasible
to run as a first-phase computation, with predictable
upper bounds for the cost.
</p>
<p>
By default, second-phase ranking (if specified) is evaluated for the 100 best hits
from the first-phase ranking per content node. The number that is reranked over all nodes can be set by
<a href="../reference/schemas/schemas.html#secondphase-total-rerank-count">total-rerank-count</a>.
</p>
<pre>
schema myapp {
…
rank-profile title-freshness inherits default {
first-phase {
expression {
bm25(title) + 3*freshness(timestamp)
}
}
second-phase {
expression {
xgboost("my-model.json")
}
total-rerank-count: 50
}
}
}
</pre>
<p>
In this example, the first phase uses the text matching feature
<a href="bm25.html">bm25</a> scoped to the <em>title</em> field
plus one of the built-in document <a href="../reference/ranking/rank-features.html">rank feature</a>
named <em>freshness</em> over a <em>timestamp</em> field which stores the epoch time in second resolution.
For each content node, the top 50 hits from the first phase function is re-ranked using
a trained <a href="xgboost">xgboost</a> model.
</p>
<h2 id="using-a-global-phase-expression">Using a global-phase expression</h2>
<p>
Using a rank expression configured as a
<a href="../reference/schemas/schemas.html#globalphase-rank">global-phase</a>
in the <code>rank-profile</code> section of a schema, you can add
a ranking phase that will run in the stateless container after
gathering the best hits from the content node phases; this can be used
instead of or in addition to
<a href="#two-phase-ranking-on-content-nodes">second-phase</a> ranking.
The global-phase can also perform
<a href="#cross-hit-normalization-including-reciprocal-rank-fusion">cross-hit normalization</a>
to combine unrelated scoring methods.
</p>
<p id="globalphase-rerank-count">
By default, global-phase ranking runs on the 100 globally best hits for
a schema; this can be tuned in the rank-profile using
<a href="../reference/schemas/schemas.html#globalphase-rank">
<code>rerank-count</code>
</a>
or per-query using the
<a href="../reference/api/query.html#ranking.globalphase.rerankcount">
<code>ranking.globalPhase.rerankCount</code>
</a>
query property.
</p>
<p>
This phase is optimized for inference with <a href="onnx">ONNX</a> models, taking
some input data from the document and some from the query, and
finding a score for how well they match. A typical use case is
re-ranking using <a href="cross-encoders">cross-encoders</a>.
</p>
<p>
It's possible to specify <em>gpu-device</em> to get GPU-accelerated computation of the
model as well. You can compute or re-shape the inputs to the
ONNX model in a function if necessary, and use the output in some
further calculation to compute the final score.
</p>
<p>
If you have large and complex expressions (including <a href="xgboost">xgboost</a>, <a href="lightgbm">lightgbm</a>),
instead of an ONNX model, it's more efficient to use the highly optimized
<a href="#two-phase-ranking-on-content-nodes">second-phase</a>
computation on content nodes. This is also true for sub-expressions that require lots of vector data, moving
vector data across the network is expensive.
</p>
{% include note.html content='You can force a sub-expression
to be computed on the content nodes by making it a function and
adding it to match-features' %}
<p id="myapp-with-global-model">
By adding the feature to <a href="../reference/schemas/schemas.html#match-features">match-features</a> in the ranking profile, the
global-phase expression can re-use the function output without the complexity of transferring the data across the network
and performing inference in the stateless container (which is less optimized).
</p>
<pre>
schema myapp {
document myapp {
field per_doc_vector type tensor<float>(x[784]) {
indexing: attribute
}
…
}
…
rank-profile with-global-model inherits default {
inputs {
query(per_query_vector) tensor<float>(d0[32])
}
first-phase {
expression: bm25(title)
}
function my_expensive_function() {
expression: # some expensive computation better done on content nodes
}
function per_doc_input() {
# simple reshaping: ONNX input wants the dimension name "d0"
expression: rename(attribute(per_doc_vector), x, d0)
}
onnx-model my_ranking_model {
file: files/my_ranking_model.onnx
input "model_input_1": per_doc_input
input "model_input_2": query(per_query_vector)
output "model_output_1": out
}
global-phase {
expression {
my_expensive_function + sum(onnx(my_ranking_model).out)
}
rerank-count: 50
}
match-features {
my_expensive_function
}
}
}
</pre>
<p>
In the example above, <em>my_expensive_function</em> will be evaluated on the content nodes
for the 50 top-ranking documents from the first-phase so that the global-phase does not need to re-evaluate.
</p>
<h2 id="cross-hit-normalization-including-reciprocal-rank-fusion">Cross-hit normalization including reciprocal rank fusion</h2>
<p>
The ranking expressions configured for global-phase may perform
cross-hit normalization of input rank features or functions. This
is designed to make it easy to combine unrelated scoring methods
into one final relevance score.
The syntax looks like a special pseudo-function call:
</p>
<ul>
<li> <code>normalize_linear(<em>my_function_or_feature</em>)</code> </li>
<li> <code>reciprocal_rank(<em>my_function_or_feature</em>)</code> </li>
<li> <code>reciprocal_rank(<em>my_function_or_feature</em>, <em>k</em>)</code> </li>
<li> <code>reciprocal_rank_fusion(<em>score_1</em>, <em>score_2</em> ...)</code> </li>
</ul>
<p>
The normalization will be performed across the hits that global-phase
reranks (see <a href="#globalphase-rerank-count">configuration</a> above).
This means that first, the input (<em>my_function_or_feature</em>)
is computed or extracted from each hit that global-phase will
rerank; then the normalization step is applied; afterwards, when
computing the actual global-phase expression, the normalized output
is used.
As an example, assume some text fields with bm25 enabled, an ONNX
model (from the <a href="#myapp-with-global-model">example</a> in
the previous section), and a "popularity" numeric attribute:
</p>
<pre>
rank-profile with-normalizers inherits with-global-model {
function my_bm25_sum() {
expression: bm25(title) + bm25(abstract)
}
function my_model_out() {
expression: sum(onnx(my_ranking_model).out)
}
global-phase {
expression {
normalize_linear(my_bm25_sum) + normalize_linear(my_model_out) + normalize_linear(attribute(popularity))
}
rerank-count: 200
}
}
</pre>
<p>
The <code>normalize_linear</code> normalizer takes a single argument, which must be
a rank-feature or the name of a function. It computes the maximum and minimum
values of that input and scales linearly to the range [0, 1], basically using
the formula <code>output = (input - min) / (max - min)</code>
</p>
<p>
The <code>reciprocal_rank</code> normalizer takes one or two arguments; the first
must be a rank-feature or the name of a function, while the second (if present)
must be a numerical constant, called <code>k</code> with default value 60.0.
It sorts the input values and finds their <em>rank</em> (so the highest score gets
rank 1, next highest 2, and so on). The output from reciprocal_rank is computed
with the formula <code> output = 1.0 / (k + rank) </code>, so note that even the best
input only gets <code>1.0 / 61 = 0.016393</code> as output with the default k.
</p>
<p>
The <code>reciprocal_rank_fusion</code> pseudo-function takes at least two arguments
and expands to the sum of their <code>reciprocal_rank</code>; it's just a
convenient way to write
</p>
<pre class=code>
reciprocal_rank(a) + reciprocal_rank(b) + reciprocal_rank(c)
</pre>
<p>as</p>
<pre class=code>
reciprocal_rank_fusion(a,b,c)
</pre>
<p>for example.</p>
<p>
See the <a href="https://github.com/vespa-engine/sample-apps/tree/master/colbert">Simple Hybrid Search with ColBERT</a>
sample application for a practical example of using reciprocal rank fusion.
</p>
<h2 id="stateless-re-ranking">Stateless re-ranking</h2>
<p>
If the logic required is not suited for the <a href="#using-a-global-phase-expression">global-phase</a> above,
it's possible to write <a href="../applications/searchers.html">stateless searchers</a>
which can re-rank hits using any custom scoring function or model.
The searcher can also blend and re-order hits from multiple sources
when using <a href="../querying/federation.html">federation</a> of content sources.
</p>
<p>
The searcher might request rank features calculated by the content nodes to be returned
along with the hit fields using <a href="../applications/inspecting-structured-data.html">summary-features</a>.
The features returned can be configured in the <em>rank-profile</em>
as <a href="../reference/schemas/schemas.html#summary-features">summary-features</a>.
</p><p>
The number of <em>hits</em> is limited by the query api
<a href="../reference/api/query.html#hits">hits</a> parameter and
<a href="../reference/api/query.html#queryprofile">maxHits</a> setting.
The hits available for container-level re-ranking are the global top-ranking hits
after content nodes have retrieved and ranked the hits,
and global top-ranking hits have been found by merging the responses from the content nodes.
</p>
<h2 id="top-k-query-operators">Top-K Query Operators</h2>
<p>
If the first-phase ranking function can be approximated as a simple linear function,
and the query mode is <em>weakAnd</em>,
the <a href="wand.html">Weak And/WAND</a> implementations in Vespa
allows avoiding fully evaluating all the documents matching the query with the <em>first-phase</em> function.
Instead, only the top-K hits using the internal wand scoring are exposed
to the <em>first-phase</em> ranking expression.
</p>
<p>
The <a href="../querying/nearest-neighbor-search">nearest neighbor search</a> operator is also a top-k
retrieval operator, and the two operators can be combined in the same query.
</p>
<h2 id="choosing-phased-ranking-functions">Choosing phased ranking functions</h2>
<p>
A good quality ranking expression will for most applications consume too much CPU
to be runnable on all retrieved or matched documents within the latency budget/SLA.
The application ranking function should hence in most cases be a second-phase function.
The task then becomes to find a first-phase function,
which correlates sufficiently well with the second-phase function.
</p>
<h2 id="rank-phase-statistics">Rank phase statistics</h2>
<p>
Use <a href="../reference/schemas/schemas.html#match-features">match-features</a>
and <a href="../reference/schemas/schemas.html#summary-features">summary-features</a>
to export detailed match- and rank-information per query.
This requires post-processing and aggregation in an external system for analysis.
</p>
<p>
To evaluate how well the document corpus matches the queries,
use <a href="../reference/schemas/schemas.html#mutate">mutable attributes</a>
to track how often each document survives each match- and ranking-phase.
This is aggregated per document and makes it easy to analyse using the query and grouping APIs in Vespa -
and no other processing/storage is required.
</p>
<p>
A mutable attribute is a number where an operation can be executed in 4 phases:
</p>
<ol>
<li>on-match</li>
<li>on-first-phase</li>
<li>on-second-phase</li>
<li>on-summary</li>
</ol>
<p>
The common use case is to increase the value by 1 for each execution.
With this, it is easy to evaluate the document's performance to the queries,
e.g. find the documents that appear in most queries, or the ones that never matched -
run a query and order by the mutable attribute.
</p>
{% include note.html content='The mutable attributes are just counters and memory-operations only -
the values might or might not survive content node restarts.
The values cannot be compared across nodes.
Use the values to assess relative document matching and ranking performance since Vespa start' %}
<p>
This example is based on the album-recommendation sample application
(see <a href="../basics/deploy-an-application.html">deploying an application</a>).
It uses 4 attributes that each track how many times a document participates in any of the 4 phases.
This is tracked only if using rank-profile <code>rank_albums_track</code> in the query:
</p>
<pre>
schema music {
document music {
field artist type string {
indexing: summary | index
}
field album type string {
indexing: summary | index
}
field year type int {
indexing: summary | attribute
}
field category_scores type tensor<float>(cat{}) {
indexing: summary | attribute
}
}
field match_count type long {
indexing: attribute | summary
attribute: mutable
}
field first_phase_count type long {
indexing: attribute | summary
attribute: mutable
}
field second_phase_count type long {
indexing: attribute | summary
attribute: mutable
}
field summary_count type long {
indexing: attribute | summary
attribute: mutable
}
fieldset default {
fields: artist, album
}
rank-profile rank_albums inherits default {
first-phase {
expression: sum(query(user_profile) * attribute(category_scores))
}
second-phase {
expression: attribute(year)
rerank-count: 1
}
summary-features: attribute(year)
}
rank-profile rank_albums_track inherits rank_albums {
mutate {
on-match { match_count += 1 }
on-first-phase { first_phase_count += 1 }
on-second-phase { second_phase_count += 1 }
on-summary { summary_count += 1 } # this only happens when summary-features are present!
}
}
rank-profile rank_albums_reset_on_match inherits rank_albums {
mutate {
on-match { match_count = 0 }
}
}
rank-profile rank_albums_reset_on_first_phase inherits rank_albums {
mutate {
on-match { first_phase_count = 0 }
}
}
rank-profile rank_albums_reset_on_second_phase inherits rank_albums {
mutate {
on-match { second_phase_count = 0 }
}
}
rank-profile rank_albums_reset_on_summary inherits rank_albums {
mutate {
on-match { summary_count = 0 }
}
}
}
</pre>
<pre>
$ vespa query \
"select * from music where album contains 'head'" \
"ranking=rank_albums_track"
</pre>
<h3 id="usage">Usage</h3>
<p>
The framework is flexible in use; the normal use case is:
</p>
<ol>
<li>
Reset the mutable attribute on all content nodes -
use <a href="../reference/api/query.html#model.searchpath">searchPath</a>
to make sure all nodes are reset by sending a query using a rank profile that resets the value.
For each phase, run a query that <em>matches</em> all documents, and reset the attribute - e.g.:
<pre>
$ for phase in match first_phase second_phase summary; do \
for node in {0..3}; do vespa query \
"select * from music where true" \
"ranking=rank_albums_reset_on_$phase" \
"model.searchPath=$node/0"; \
done \
done
</pre>
Alternatively, run a query against a group
and verify that <a href="../reference/querying/default-result-format.html">coverage</a> is 100%.
<!-- ToDo: add note on what to do for the other mutable attributes -->
</li>
<li>
Run query load, using the tracking rank-profile, like <code>rank_albums_track</code> above
</li>
<li>
Run queries using <a href="../reference/querying/sorting-language.html">sorting</a> or <a href="../querying/grouping.html">grouping</a>
on the mutable attributes.
</li>
</ol>
{% include note.html content='Make sure that only the relevant query load uses the tracking rank profile.
E.g. exclude monitoring queries / automation by using a separate ranking profile.' %}
<p>
To initialize a mutable attribute with a different value than 0 when a document is PUT, use:
</p>
<pre>
field match_count type long {
indexing: 7 | to_long | attribute | summary # Initialized to 7 for a new document. The default is 0.
attribute: mutable
}
</pre>
<p>
To dump values fast, from memory only (assuming the schema has an <code>id</code> field):
</p>
<pre>
document-summary rank_phase_statistics {
summary id {}
summary match_count {}
summary first_phase_count {}
summary second_phase_count {}
summary summary_count {}
}
</pre>
<pre>
$ vespa query \
"select * from music where true" \
"presentation.summary=rank_phase_statistics"
</pre>