added documentation #21

Open
Aashu-Adhikari wants to merge 3 commits into dorianbrown:master from Aashu-Adhikari:bm25okapi

@Aashu-Adhikari

Added inline comments and docstrings to explain what the code is doing.

@bhattbhuwan13 bhattbhuwan13 left a comment

Please make the suggested changes.

Comment thread rank_bm25.py
if tokenizer:
    corpus = self._tokenize_corpus(corpus)

nd = self._initialize(corpus)

What is nd here? Please explain it in a comment or the docstring.
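For context, the example later in this review suggests that nd maps each word to the number of documents containing it. A minimal sketch of that idea follows; `document_frequencies` is a hypothetical helper written for illustration, not the library's `_initialize`, and the three-document corpus is made up.

```python
from collections import Counter


def document_frequencies(corpus):
    """Return a mapping of word -> number of documents that contain the word."""
    # set(document) ensures a word repeated inside one document is counted once.
    return Counter(word for document in corpus for word in set(document))


corpus = [["ram", "is", "good"], ["ram", "is", "healthy"], ["good", "luck"]]
nd = document_frequencies(corpus)
print(nd["ram"], nd["good"], nd["luck"])
```

A descriptive name such as `doc_freqs`, or a one-line comment like the docstring above, would likely answer the question without the reader having to open `_initialize`.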

Comment thread rank_bm25.py
Comment on lines +38 to +40
Example:
corpus = [['ram', 'is', 'a', 'good', 'boy'], ['ram', 'does', 'cycling', 'and', 'racing'], ['ram', 'is', 'healthy'], ['rita', 'likes', 'shyam'], ['good', 'luck']]
nd = {'ram': 3, 'is': 2, 'a': 1, 'good': 2, 'boy': 1, 'does': 1, 'cycling': 1, 'and': 1, 'racing': 1, 'healthy': 1, 'rita': 1, 'likes': 1, 'shyam': 1, 'luck': 1}

Shorten the examples so that I don't need to scroll. The functionality can be explained using only 2 items in the list.
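One way the shortened example might look is sketched below, with a two-document corpus that fits on a single line. The function `initialize` is a hypothetical stand-in used only to make the docstring checkable with `doctest`; it is an assumption, not the code under review.

```python
import doctest


def initialize(corpus):
    """Count, for each word, how many documents contain it.

    Example:
        >>> nd = initialize([["ram", "is", "good"], ["ram", "is", "healthy"]])
        >>> sorted(nd.items())
        [('good', 1), ('healthy', 1), ('is', 2), ('ram', 2)]
    """
    nd = {}
    for document in corpus:
        # Iterate over unique words so each document contributes at most 1 per word.
        for word in set(document):
            nd[word] = nd.get(word, 0) + 1
    return nd


if __name__ == "__main__":
    doctest.testmod()
```

Writing the example in doctest form also means it can be verified automatically instead of drifting out of date.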

Comment thread rank_bm25.py
  for document in corpus:
      self.doc_len.append(len(document))
-     num_doc += len(document)
+     num_words += len(document)  # total number of words in whole corpus

The purpose of the variable num_words has already been explained, so this inline comment is redundant.

Comment thread rank_bm25.py
- frequencies = {}
+ term_frequencies = (
+     {}
+ )  # term frequency of each word in a document........ changed frequencies to term_frequencies

You don't need to comment that you changed the name of the variable. Git keeps track of it.

Comment thread rank_bm25.py
Comment on lines +53 to +54
if word not in term_frequencies:
    term_frequencies[word] = 0

This block of code can be removed by using defaultdict instead of the normal dictionary.
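A minimal sketch of that suggestion, assuming the surrounding loop simply counts words in one document (the sample `document` list is made up for illustration):

```python
from collections import defaultdict

document = ["ram", "is", "a", "good", "boy", "ram"]

# defaultdict(int) supplies 0 for unseen keys, so no explicit
# "if word not in term_frequencies" check is needed.
term_frequencies = defaultdict(int)
for word in document:
    term_frequencies[word] += 1

print(dict(term_frequencies))
```

One caveat: indexing a `defaultdict` with a missing key inserts that key, so later read-only lookups may be safer with `term_frequencies.get(word, 0)`.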

2 participants