Thanks for this awesome library.
I'm curious whether rank_bm25 can handle 500K documents of around 1,000 words each (roughly 500 million tokens in total).
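To get a feel for the memory side, here is a rough back-of-the-envelope sketch. It assumes, from my reading of the source rather than anything documented, that `BM25Okapi` keeps one term-frequency dict per document. It also uses a hypothetical Zipf-like vocabulary of 50,000 words instead of real data, and the helper names are my own, so treat the result as an order-of-magnitude guess only.

```python
import itertools
import random
import sys

# Sizes from the question; VOCAB_SIZE is a hypothetical value.
NUM_DOCS = 500_000
WORDS_PER_DOC = 1_000
VOCAB_SIZE = 50_000

# Zipf-like word frequencies (rank r has weight 1/r), precomputed once.
VOCAB = [f"w{r}" for r in range(VOCAB_SIZE)]
CUM_WEIGHTS = list(itertools.accumulate(1 / (r + 1) for r in range(VOCAB_SIZE)))


def sample_document(rng: random.Random) -> list[str]:
    """Draw one synthetic tokenized document of WORDS_PER_DOC tokens."""
    return rng.choices(VOCAB, cum_weights=CUM_WEIGHTS, k=WORDS_PER_DOC)


def estimate_index_bytes(num_samples: int = 20, seed: int = 0) -> int:
    """Extrapolate the memory of one term-frequency dict per document (assumed layout)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_samples):
        freqs: dict[str, int] = {}
        for word in sample_document(rng):
            freqs[word] = freqs.get(word, 0) + 1
        # Dict table plus its key strings; CPython caches small ints,
        # so the counts themselves add almost nothing.
        total += sys.getsizeof(freqs) + sum(sys.getsizeof(k) for k in freqs)
    return int(total / num_samples * NUM_DOCS)


if __name__ == "__main__":
    print(f"~{estimate_index_bytes() / 1e9:.1f} GB for the per-document dicts")
```

Real text will shift this number, and the estimate leaves out anything else the index stores as well as the tokenized corpus itself.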
Looking forward to your feedback. I want to use the following functionality with rank_bm25:
```python
from rank_bm25 import BM25Okapi

corpus = [
    "Hello there good man!",
    "It is quite windy in London",
    "How is the weather today?",
]
# Naive whitespace tokenization of each document.
tokenized_corpus = [doc.split(" ") for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

query = "windy London"
tokenized_query = query.split(" ")
# One BM25 score per document in the corpus.
doc_scores = bm25.get_scores(tokenized_query)
# The n highest-scoring documents, returned from the original corpus.
result = bm25.get_top_n(tokenized_query, corpus, n=1)
print(result)
```
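Query time is the other concern at this scale. If `get_scores` works the way I think it does (a Python-level pass over every document's frequency dict for each query term; this is my assumption from reading the source, not documented behaviour), its cost grows linearly with the number of documents. The sketch below times that pattern on a hypothetical 50K-document synthetic corpus and scales it linearly to 500K. `scan_term` and `estimate_query_seconds` are my own illustrative helpers, not part of rank_bm25.

```python
import time


def scan_term(doc_freqs: list[dict[str, int]], term: str) -> list[int]:
    """One Python-level pass: the term's frequency in every document."""
    return [d.get(term, 0) for d in doc_freqs]


def estimate_query_seconds(query: list[str], target_docs: int = 500_000,
                           sample_docs: int = 50_000) -> float:
    """Time the per-term scan on a small synthetic corpus and scale linearly."""
    # Hypothetical corpus: every 100th document mentions "London".
    doc_freqs = [{"London": 1} if i % 100 == 0 else {"weather": 2}
                 for i in range(sample_docs)]
    start = time.perf_counter()
    for term in query:
        scan_term(doc_freqs, term)
    elapsed = time.perf_counter() - start
    return elapsed * target_docs / sample_docs


if __name__ == "__main__":
    print(f"~{estimate_query_seconds(['windy', 'London']):.2f} s per query")
```

The synthetic dicts are tiny, so real per-document dicts with hundreds of terms may be slower because of cache misses, and this ignores the NumPy arithmetic on top of the scan.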