Multiple language annotation support #31372

@buinauskas

Describe the bug

When a query contains several language annotations, only one of them is applied. As a result, the whole search query is stemmed with a single language instead of each operator being stemmed with its own.

To Reproduce

Schema:

schema items {
    document items {
        field language type string {
            indexing: set_language | summary | attribute
            attribute {
                fast-access
                fast-search
            }
            rank: filter
        }

        field title type string {
            indexing: summary | index
            match: text
        }
    }

    fieldset default {
        fields: title
    }
}

services.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<services version="1.0">

  <admin version="2.0">
  </admin>

  <container id="default" version="1.0">
    <search/>
    <document-api/>
  </container>

  <content id="content" version="1.0">
    <redundancy>1</redundancy>
    <documents>
      <document type="items" mode="index"/>
    </documents>
  </content>

</services>

Vespa search request:

{
    "yql": "select * from items where ({language: 'fr', grammar: 'all'}userInput(@q)) or ({language: 'en', grammar: 'all'}userInput(@q))",
    "q": "machine learning",
    "trace.level": 3
}
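For experimenting with other language combinations, the request body above can be generated rather than written by hand. This is a minimal sketch: the helper name `build_multilanguage_request` and its parameters are hypothetical and not part of any Vespa client; it only assembles the same JSON shown above.

```python
import json


def build_multilanguage_request(query, languages, doc_type="items", trace_level=3):
    """Build a search request body with one annotated userInput() per language.

    Hypothetical helper for illustration; it only reproduces the YQL pattern
    used in this report.
    """
    # %-formatting is used so the literal braces of the annotation need no escaping.
    clauses = [
        "({language: '%s', grammar: 'all'}userInput(@q))" % lang
        for lang in languages
    ]
    yql = "select * from %s where %s" % (doc_type, " or ".join(clauses))
    return {"yql": yql, "q": query, "trace.level": trace_level}


body = build_multilanguage_request("machine learning", ["fr", "en"])
print(json.dumps(body, indent=4))
```

Swapping the order of the `languages` list gives the reversed request used in the second part of the reproduction.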

Inspecting the traces, I see only a single stemming message, which indicates that both query operators were stemmed using French:

{
  "message": "Stemming with language=FRENCH"
}

When I swap the order of the languages, both query operators are stemmed using English only:

{
  "message": "Stemming with language=ENGLISH"
}
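To check which languages were actually used without reading the whole trace by eye, the stemming messages can be pulled out of the response programmatically. The sketch below assumes the response has already been parsed into Python dicts and lists; the function name `collect_stemming_messages` is hypothetical, and `sample_response` is a made-up, trimmed-down structure rather than real Vespa output. The walker makes no assumption about the exact nesting of the trace.

```python
def collect_stemming_messages(node):
    """Recursively gather every "Stemming with language=..." trace message.

    Walks any nesting of dicts and lists, so it does not depend on the exact
    layout of the trace section in the response.
    """
    found = []
    if isinstance(node, dict):
        message = node.get("message")
        if isinstance(message, str) and message.startswith("Stemming with language="):
            found.append(message)
        for value in node.values():
            found.extend(collect_stemming_messages(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_stemming_messages(item))
    return found


# Hypothetical parsed response, reduced to what matters for this report.
sample_response = {
    "trace": {
        "children": [
            {"message": "placeholder (not a real Vespa message)"},
            {"children": [{"message": "Stemming with language=FRENCH"}]},
        ]
    }
}

print(collect_stemming_messages(sample_response))
```

With the behavior described in this report, the list contains a single entry. If the annotation were applied per operator, one would presumably expect evidence of both languages instead.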

Expected behavior

The language query annotation is applied on a per-operator basis, so that each userInput() is stemmed with the language given in its own annotation.

Environment:

  • OS: Docker
  • Infrastructure: Localhost
  • Versions: n/a

Vespa version
8.308.26

Additional context

This can be implemented using custom searchers, but writing them is challenging for non-engineers, especially data scientists, who usually know Python well but not Java.
