documentation/en/reference/writing/indexing-language.html at 3b3f076a7e736c181fa7413e8d2de29ac8511403 · vespa-engine/documentation · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
---
# Copyright Vespa.ai. All rights reserved.
title: "Indexing language reference"
redirect_from:
- /en/reference/advanced-indexing-language.html
- /en/reference/indexing-language-reference.html
---

<p>
This reference documents the full Vespa <em>indexing language</em>.
If more complex processing of input data is required, implement a
<a href="../../applications/document-processors.html">document processor</a>.
</p><p>
The indexing language is analogous to UNIX pipes, in that statements
consists of expressions separated by the <i>pipe</i> symbol
where the output of each expression is the input of the next.
Statements are terminated by semicolon and are independent of each
other (except when using variables).
</p>
<p>Find examples in the <a href="/en/writing/indexing.html">indexing</a> guide.</p>


<h2 id="indexing-script">Indexing script</h2>
<p>
An indexing script is a sequence of <a href="#indexing-statement">indexing
statements</a> separated by a semicolon (<code>;</code>). A script is
executed statement-by-statement, in order, one document at a time.
</p><p>
Vespa derives one indexing script per search cluster based on the search
definitions assigned to that cluster. As a document is fed to a search
cluster, it passes through the corresponding
<a href="../applications/services/content.html#document-processing">indexing cluster</a>, which runs the
document through its indexing script.
Note that this also happens whenever the document is
<a href="../../operations/reindexing.html">reindexed</a>, so expressions such as <a href="#now">now</a> must
be thought of as the time the document was (last) <em>indexed</em>, not when it was <em>fed</em>.
</p><p>
You can examine the indexing script generated for a specific search
cluster by retrieving the configuration of the indexing document processor.
</p>
<pre>
$ vespa-get-config -i search/cluster.&lt;cluster-name&gt; -n vespa.configdefinition.ilscripts
</pre>
<p>
The current <em>execution value</em> is set to <code>null</code> prior to
executing a statement.
</p>


<h2 id="indexing-statement">Indexing statement</h2>
<p>
An indexing statement is a sequence of <a href="#indexing-expression">indexing
expressions</a> separated by a pipe (<code>|</code>).
A statement is executed expression-by-expression, in order.
</p><p>
Within a statement, the execution value is passed from one expression to the next.
</p><p>
  The simplest of statements passes the value of an input field into an attribute:
</p>
<pre class="code">
input year | attribute year;
</pre>
<p>
The above statement consists of 2 expressions; <code>input year</code> and
<code>attribute year</code>. The former sets the execution value to the
value of the "year" field of the input document. The latter writes the
current execution value into the attribute "year".
</p>


<h2 id="indexing-expression">Indexing expression</h2>


<h3 id="primitives">Primitives</h3>

<p>
A string, numeric literal and true/false can be used as an expression to explicitly
  set the execution value. Examples: <code>"foo"</code>, <code>69</code>, <code>true</code>).
</p>

<h3 id="outputs">Outputs</h3>
<p>
An output expression is an expression that writes the current execution
value to a document field. These expressions also double as the indicator
for the type of field to construct (i.e. attribute, index or summary). It
is important to note that you <u>can not</u> assign different values to
the same field in a single document (e.g. <code>attribute | lowercase |
index</code> is <strong>illegal</strong> and will not deploy).
</p>
<table class="table">
  <thead>
  <tr>
    <th>Expression</th>
    <th>Description</th>
  </tr>
  </thead><tbody>
  <tr>
    <td><code>attribute</code></td>
    <td>
      Writes the execution value to the current field. During deployment,
      this indicates that the field should be stored as an attribute.
    </td>
  </tr>
  <tr>
    <td><code>index</code></td>
    <td>
      Writes the execution value to the current field. During deployment,
      this indicates that the field should be stored as an index field.
    </td>
  </tr>
  <tr>
    <td><code>summary</code></td>
    <td>
      Writes the execution value to the current field. During deployment,
      this indicates that the field should be included in the document
      summary.
    </td>
  </tr>
  </tbody>
</table>


<h3 id="arithmetics">Arithmetics</h3>
<p>
Indexing statements can contain any combination of arithmetic operations,
as long as the operands are numeric values. In case you need to convert
from string to numeric, or convert from one numeric type to another,
use the applicable <a href="#converters">converter</a> expression.
The supported arithmetic operators are:
</p>
<table class="table">
  <thead>
  <tr>
    <th>Operator</th>
    <th>Description</th>
  </tr>
  </thead><tbody>
  <tr>
    <td style="white-space: nowrap;"><code>&lt;lhs&gt; + &lt;rhs&gt;</code></td>
    <td>
      Sets the execution value to the result of adding of the execution
      value of the <code>lhs</code> expression with that of
      the <code>rhs</code> expression.
    </td>
  </tr>
  <tr>
    <td><code>&lt;lhs&gt; - &lt;rhs&gt;</code></td>
    <td>
      Sets the execution value to the result of subtracting of the execution
      value of the <code>lhs</code> expression with that of
      the <code>rhs</code> expression.
    </td>
  </tr>
  <tr>
    <td><code>&lt;lhs&gt; * &lt;rhs&gt;</code></td>
    <td>
      Sets the execution value to the result of multiplying of the execution
      value of the <code>lhs</code> expression with that of
      the <code>rhs</code> expression.
    </td>
  </tr>
  <tr>
    <td><code>&lt;lhs&gt; / &lt;rhs&gt;</code></td>
    <td>
      Sets the execution value to the result of dividing of the execution
      value of the <code>lhs</code> expression with that of
      the <code>rhs</code> expression.
    </td>
  </tr>
  <tr>
    <td><code>&lt;lhs&gt; % &lt;rhs&gt;</code></td>
    <td>
      Sets the execution value to the remainder of dividing the execution
      value of the <code>lhs</code> expression with that of
      the <code>rhs</code> expression.
    </td>
  </tr>
  <tr>
    <td><code>&lt;lhs&gt; . &lt;rhs&gt;</code></td>
    <td>
      Sets the execution value to the concatenation of the execution value
      of the <code>lhs</code> expression with that of the <code>rhs</code>
      expression. If <em>both</em> <code>lhs</code> and <code>rhs</code> are
      collection types, this operator will append <code>rhs</code>
      to <code>lhs</code> (if any operand is null, it is treated as an empty
      collection). If not, this operator concatenates the string
      representations of <code>lhs</code> and <code>rhs</code> (if any
      operand is null, the result is null).
    </td>
  </tr>
  </tbody>
</table>
<p id="parenthesis">
  You may use parenthesis to declare precedence of execution (e.g. <code>(1
  + 2) * 3</code>). This also works for more advanced array concatenation
  statements such as <code>(input str_a | split ',') . (input str_b | split
  ',') | index arr</code>.
</p>


<h3 id="converters">Converters</h3>

<p>These expressions let you convert from one data type to another.</p>

<table class="table">
  <thead>
  <tr>
    <th>Converter</th>
    <th>Input</th>
    <th>Output</th>
    <th>Description</th>
  </tr>
  </thead><tbody>
<tr>
  <td><code>binarize [threshold]</code></td>
  <td>Any tensor</td>
  <td>Any tensor</td>
  <td>
    <p id="binarize">
      Replaces all values in a tensor by 0 or 1.
      This takes an optional argument specifying the threshold a value needs to be larger than to be
      replaced by 1 instead of 0. The default threshold is 0.
      This is useful to create a suitable input to <a href="#pack_bits">pack_bits</a>.
    </p>
  </td>
</tr>
<tr>
  <td><code>embed [id] [args]</code></td>
  <td>String</td>
  <td>A tensor</td>
  <td><p id="embed">Invokes an <a href="../../rag/embedding.html">embedder</a> to convert a text to one or more vector embeddings.
    The type of the output tensor is what is required by the following expression (as supported by the specific embedder).
    Arguments are given space separated, as in <code>embed colbert chunk</code>.
    The first argument and can be omitted when only a single embedder is configured.
    Any additional arguments are passed to the embedder implementation.
    If the same chunk expression with the same input occurs multiple times in a schema,
    its value will only be computed once.
  </p>
  </td>
</tr>
<tr>
  <td><code>chunk id [args]</code></td>
  <td>String</td>
  <td>A tensor</td>
  <td><p id="chunk">Invokes a which convert a string into an array of strings.
      Arguments are given space separated, as in <code>chunk fixed-length 512</code>.
      The id of the chunker to use is required and can be a chunker bundled with Vespa,
      or any chunker component added in the services.xml, see the
      <a href="../rag/chunking.html">chunking reference</a>.
      Any additional arguments are passed to the chunker implementation.
      If the same chunk expression with the same input occurs multiple times in a schema,
      its value will only be computed once.
    </p>
  </td>
</tr>
<tr>
  <td><code>hash</code></td>
  <td>String</td>
  <td>int or long</td>
  <td><p id="hash">Converts the input to a hash value (using SipHash).
      The hash will be int or long depending on the target field.</p></td>
</tr>
<tr>
    <td><code>pack_bits</code></td>
    <td>A tensor</td>
    <td>A tensor</td>
    <td>
      <p id="pack_bits">
        Packs the values of a binary tensor into bytes with 1 bit per value in big-endian order.
        The input tensor must have a single dense dimension.
        It can have any value type and any number of sparse dimensions.
        Values that are not 0 or 1 will be binarized with 0 as the threshold.
      </p>
      <p>The output tensor will have:</p>
      <ul>
        <li><code>int8</code> as the value type.</li>
        <li>The dense dimension size divided by 8 (rounded upwards to integer).</li>
        <li>The same sparse dimensions as before.</li>
      </ul>
      <p>
      The resulting tensor can be unpacked during ranking using
      <a href="../ranking/ranking-expressions.html#unpack-bits">unpack_bits</a>.
      A tensor can be converted to binary form suitable as input to this by the
      <a href="#binarize">binarize function</a>.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>to_array</code></td>
    <td>Any</td>
    <td>Array&lt;inputType&gt;</td>
    <td><p id="to_array">Converts the execution value to a single-element array.</p></td>
  </tr>
  <tr>
    <td><code>to_byte</code></td>
    <td>Any</td>
    <td>Byte</td>
    <td>
      <p id="to_byte">
      Converts the execution value to a byte. This will throw a
      NumberFormatException if the string representation of the execution
      value does not contain a parseable number.</p>
    </td>
  </tr>
  <tr>
    <td><code>to_double</code></td>
    <td>Any</td>
    <td>Double</td>
    <td>
      <p id="to_double">
      Converts the execution value to a double. This will throw a
      NumberFormatException if the string representation of the execution
      value does not contain a parseable number.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>to_float</code></td>
    <td>Any</td>
    <td>Float</td>
    <td>
      <p id="to_float">
      Converts the execution value to a float. This will throw a
      NumberFormatException if the string representation of the execution
      value does not contain a parseable number.</p>
    </td>
  </tr>
  <tr>
    <td><code>to_int</code></td>
    <td>Any</td>
    <td>Integer</td>
    <td>
      <p id="to_int">
      Converts the execution value to an int. This will throw a
      NumberFormatException if the string representation of the execution
      value does not contain a parseable number.</p>
    </td>
  </tr>
  <tr>
    <td><code>to_long</code></td>
    <td>Any</td>
    <td>Long</td>
    <td>
      <p id="to_long">
      Converts the execution value to a long. This will throw a
      NumberFormatException if the string representation of the execution
      value does not contain a parseable number.</p>
    </td>
  </tr>
  <tr>
    <td><code>to_bool</code></td>
    <td>Any</td>
    <td>Bool</td>
    <td>
      <p id="to_bool">
      Converts the execution value to a boolean type.
      If the input is a string it will become true if it is not empty.
      If the input is a number it will become true if it is != 0.</p>
    </td>
  </tr>
  <tr>
    <td><code>to_pos</code></td>
    <td>String</td>
    <td>Position</td>
    <td>
      <p id="to_post">
      Converts the execution value to a position struct. The input format
      must be either a) <code>[N|S]&lt;val&gt;;[E|W]&lt;val&gt;</code>, or
      b) <code>x;y</code>.</p>
    </td>
  </tr>
  <tr>
    <td><code>to_string</code></td>
    <td>Any</td>
    <td>String</td>
    <td><p id="to_string">Converts the execution value to a string.</p></td>
  </tr>
  <tr>
    <td><code>to_uri</code></td>
    <td>String</td>
    <td>Uri</td>
    <td><p id="to_uri">Converts the execution value to a URI struct</p></td>
  </tr>
  <tr>
    <td><code>to_wset</code></td>
    <td>Any</td>
    <td>WeightedSet&lt;inputType&gt;</td>
    <td>
      <p id="to_wset">
      Converts the execution value to a single-element weighted set with
      default weight.</p>
    </td>
  </tr>
  <tr>
    <td><code>to_epoch_second</code></td>
    <td>String</td>
    <td>Long</td>
    <td>
      <p id="to_epoch_second">
      Converts an ISO-8601 instant formatted String to Unix epoch (or Unix time or POSIX time or Unix timestamp) which
      is the number of seconds elapsed since January 1, 1970, UTC. The converter uses
      <a href="https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/time/Instant.html#parse(java.lang.CharSequence)">
        java.time.Instant.parse
      </a> to parse the input string value. This will throw a DateTimeParseException if the input cannot be parsed. Examples:
      </p>
      <ul>
        <li><code>2023-12-24T17:00:43.000Z</code> is converted to <code>1703437243L</code></li>
        <li><code>2023-12-24T17:00:43Z</code> is converted to <code>1703437243L</code></li>
        <li><code>2023-12-24T17:00:43.431Z</code> is converted to <code>1703437243L</code></li>
        <li><code>2023-12-24T17:00:43.431+00:00</code> is converted to <code>1703437243L</code></li>
    </ul>
    </td>
  </tr>
  </tbody>
</table>


<h3 id="other-expressions">Other expressions</h3>
<p>
The following are the unclassified expressions available:
</p>
<table class="table">
  <thead>
  <tr>
    <th>Expression</th><th>Description</th>
  </tr>
  </thead><tbody>
  <tr>
    <td><code>_</code></td>
    <td><p id="_">
      Returns the current execution value.
      This is useful, e.g., to prepend some other value to the current execution value,
      see <a href="/en/writing/indexing.html#execution-value-example">this example</a>.
    </p></td>
  </tr>
  <tr>
    <td><code>attribute &lt;fieldName&gt;</code></td>
    <td><p id="attribute">Writes the execution value to the named attribute field.</p></td>
  </tr>
  <tr>
    <td><code>base64decode</code></td>
    <td>
      <p id="base64decode">
      If the execution value is a string, it is base-64 decoded to a long integer.
      If it is not a string, the execution value is set to <code>Long.MIN_VALUE</code>.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>base64encode</code></td>
    <td>
      <p id="base64encode">
      If the execution value is a long integer, it is base-64 encoded to a string.
      If it is not a long integer, the execution value is set to <code>null</code>.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>echo</code></td>
    <td>
      <p id="echo">Prints the execution value to standard output, for debug purposes.</p>
    </td>
  </tr>
  <tr>
    <td><code>flatten</code></td>
    <td>
      <p id="flatten"></p>
      {% include deprecated.html content='Use <a href="/en/reference/schemas/schemas.html#tokens">tokens</a>
         in the schema instead.' %}
    </td>
  </tr>
  <tr>
    <td><code>for_each { &lt;script&gt; }</code></td>
    <td>
      <p id="for_each">
      Executes the given indexing script for each element in the execution value.
      Here, element refers to each element in a collection, or each field value in a struct.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>generate [id] [args]</code></td>
    <td>
      <p id="generate">
      Invokes a <a href="../../rag/document-enrichment">field generator</a> to generate a
      field valued from an input string.
      The argument is the id of the <code>FieldGenerator</code> component as described in
      <a href="../../rag/document-enrichment">Document enrichment with LLMs</a>.
      If the same generate expression with the same input occurs multiple times in a schema,
      its value will only be computed once.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>get_field &lt;fieldName&gt;</code></td>
    <td>
      <p id="get_field">
      Retrieves the value of the named field from the execution value
      (which needs to be either a document or a struct),
      and sets it as the new execution value.
      </p>
    </td>
  </tr>
  <tr>
      <td><code>get_language</code></td>
      <td>
          <p id="get_language">
          Retrieves the code of the last assigned or detected language, ur "un" for "unknown" if no
          language has been assigned or detected. Language is detected when a string is tokenized or embedded,
          so this can be used to retrieve the language detected by a previous field executing one such operation,
          e.g. by indexing.
          </p>
      </td>
  </tr>
  <tr>
      <td><code>get_value &lt;key&gt;</code></td>
      <td>
          <p id="get_value">
          Retrieves a value from an input Map. The key can be a number, identifier or quoted string.
          If the given key does not have a value, the output is empty.
          </p>
      </td>
  </tr>
  <tr>
    <td><code>get_var &lt;varName&gt;</code></td>
    <td>
      <p id="get_var">
      Retrieves the value of the named variable from the execution context and sets it as the execution value.
      Note that variables are scoped to the indexing script of the current field.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>hex_decode</code></td>
    <td>
      <p id="hex_decode">
      If the execution value is a string, it is parsed as a long integer in base-16.
      If it is not a string, the execution value is set to <code>Long.MIN_VALUE</code>.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>hex_encode</code></td>
    <td>
      <p id="hex_encode">
      If the execution value is a long integer,
      it is converted to a string representation of an unsigned integer in base-16.
      If it is not a long integer, the execution value is set to <code>null</code>.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>hostname</code></td>
    <td>
      <p id="hostname">Sets the execution value to the name of the host computer.</p>
    </td>
  </tr>
  <tr>
    <td><code>
      if&nbsp;(&lt;left&gt;&nbsp;&lt;cmp&gt;&nbsp;&lt;right&gt;)&nbsp;{<br/>
      &nbsp;&nbsp;&nbsp;&nbsp;&lt;trueScript&gt;<br />
      }<br />
      [&nbsp;else&nbsp;{&nbsp;&lt;falseScript&gt;&nbsp;}&nbsp;]
    </code></td>
    <td>
      <p id="if">
      Executes the <code>trueScript</code> if the conditional evaluates to true,
      or the <code>falseScript</code> if it evaluates to false.
      If either <code>left</code> or <code>right</code> is null, no expression is executed.
      The value produced is the value returned from the chosen branch, and these must
      produce values of compatible types (or none).
      </p>
    </td>
  </tr>
  <tr>
    <td><code>index &lt;fieldName&gt;</code></td>
    <td><p id="index">Writes the execution value to the named index field.</p></td>
  </tr>
  <tr>
    <td><code>input &lt;fieldName&gt;</code></td>
    <td>
      <p id="input">
      Retrieves the value of the named field from the document and sets it as the execution value.
      The field name may contain '.' characters to retrieve nested struct fields.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>join "&lt;delim&gt;"</code></td>
    <td>
      <p id="join">
      Creates a single string by concatenating the string representation of
      each array element of the execution value.
      This function is useful or indexing data from a <a href="../../querying/searching-multivalue-fields">multivalue</a> field
      into a singlevalue field.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>lowercase</code></td>
    <td><p id="lowercase">Lowercases all the strings in the execution value.</p></td>
  </tr>
  <tr>
    <td><code>ngram &lt;size&gt;</code></td>
    <td>
      <p id="ngram">Adds ngram annotations to all strings in the execution value.</p>
    </td>
  </tr>
  <tr>
    <td><code>normalize</code></td>
    <td>
    <p id="normalize">
      <a href="../../linguistics/linguistics-opennlp.html#normalization">normalize</a> the input data.
      The corresponding query command for this function is <code>normalize</code>.
    </p>
    </td>
  </tr>
  <tr>
    <td><code>now</code></td>
    <td>
      <p id="now">
      Outputs the current system clock time as a UNIX timestamp,
      i.e. seconds since 0 hours, 0 minutes, 0 seconds, January 1, 1970, Coordinated Universal Time (Epoch).
      </p>
    </td>
  </tr>
  <tr>
    <td><code>random [ &lt;max&gt; ]</code></td>
    <td>
      <p id="random">
        Returns a random integer value.
        Lowest value is 0 and the highest value is determined either by the argument or,
        if no argument is given, the execution value.
      </p>
    </td>
  </tr>
  <tr>
  <td><code>
    sub-expression1 || sub-expression2 || ...
  </code></td>
  <td>
    <p id="choice">
      Returns the value of the first alternative sub-expression which returns a non-null value.
      See <a href="/en/writing/indexing.html#choice-example">this example</a>.
    </p>
  </td>
  </tr>
  <tr>
    <td><code>
      select_input&nbsp;{ <br />
      (&nbsp;case&nbsp;&lt;fieldName&gt;:&nbsp;&lt;statement&gt;;&nbsp;)* <br />
      }
    </code></td>
    <td>
      <p id="select_input">
      Performs the statement that corresponds to the <strong>first</strong> named
      field that is not empty (see <a href="/en/writing/indexing.html#select-input-example">example</a>).
      </p>
    </td>
  </tr>
  <tr>
    <td><code>set_language</code></td>
    <td>
      <p id="set_language">
      Sets the language of this document to the string representation of the execution value.
      Parses the input value as an RFC 3066 language tag,
      and sets that language for the current document.
      This affects the behavior of the <a href="../../linguistics/linguistics-opennlp.html#tokenization">tokenizer</a>.
      The recommended use is to have one field in the document containing the language code,
      and that field should be the <strong>first</strong> field in the document,
      as it will only affect the fields defined <strong>after</strong> it in the schema.
      Read <a href="../../linguistics/linguistics.html#language-handling">linguistics</a>
      for more information on how language settings are applied.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>set_var &lt;varName&gt;</code></td>
    <td>
      <p id="set_var">
      Writes the execution value to the named variable.
      Note that variables are scoped to the indexing script of the current field.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>substring &lt;from&gt; &lt;to&gt;</code></td>
    <td>
      <p id="substring">
      Replaces all strings in the execution value by a substring of the respective value.
      The arguments are inclusive-from and exclusive-to.
      Both arguments are clamped during execution to avoid going out of bounds.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>split &lt;regex&gt;</code></td>
    <td>
      <p id="split">
      Splits the string representation of the execution value into a string array using the given regex pattern.
      This function is useful for creating <a href="../../querying/searching-multivalue-fields">multivalue</a>
      fields such as an integer array out of a string of comma-separated numbers.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>summary &lt;fieldName&gt;</code></td>
    <td>
      <p id="summary">
      Writes the execution value to the named summary field. During deployment, this indicates that the field should be included in the document summary.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>
      switch&nbsp;{ <br />
      (&nbsp;case&nbsp;'&lt;value&gt;':&nbsp;&lt;caseStatement&gt;;&nbsp;)* <br />
      [&nbsp;default:&nbsp;&lt;defaultStatement&gt;;&nbsp;] <br />
      }
    </code></td>
    <td>
      <p id="switch">
      Performs the statement of the case whose value matches the string
      representation of the execution value (see <a href="/en/writing/indexing.html#switch-example">example</a>).
      </p>
    </td>
  </tr>
  <tr>
    <td><code>tokenize [ normalize ] [ stem ]</code></td>
    <td>
      <p id="tokenize">
      Adds linguistic annotations to all strings in the execution value.
      Read <a href="../../linguistics/linguistics.html">linguistics</a> for more information.
      </p>
    </td>
  </tr>
  <tr>
    <td><code>trim</code></td>
    <td>
      <p id="trim">Removes leading and trailing whitespace from all strings in the execution value.</p>
    </td>
  </tr>
  <tr>
    <td><code>uri</code></td>
    <td>
      <p id="uri">
      Converts all strings in the execution value to a URI struct.
      If a string could not be converted, it is removed.
      </p>
    </td>
  </tr>
  </tbody>
</table>