ner_model.py:
A self-trained NER model that extracts entities and properties from Chinese questions.
It uses adapter-hub's adapter-transformers
for the NER downstream task.
The E-TAG marks entity types and the T-TAG marks properties.
tmp_classifier.py:
A self-trained BaggingClassifier that uses an MLP as the base model to classify Chinese
questions into 5 classes (defined in abcde_dict), with a multilingual encoder (LaBSE) to
encode the text into a dense vector space.
Bagging is used because the 5 classes are imbalanced (combined with some resampling).
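A minimal sketch of this setup, with hypothetical names: real LaBSE sentence embeddings (from sentence-transformers) are stood in for by random dense vectors so the example stays self-contained, and the dimensions and hyperparameters are illustrative only.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Stand-in for LaBSE sentence embeddings (768-dim in the real model; 16 here).
X = rng.normal(size=(100, 16))
y = rng.integers(0, 5, size=100)  # 5 template classes, as in abcde_dict

# Bagging over MLP base estimators: each estimator fits a bootstrap resample,
# which helps when the class distribution is imbalanced.
clf = BaggingClassifier(
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=300),
    n_estimators=5,
    random_state=0,
)
clf.fit(X, y)
preds = clf.predict(X)
```

The base estimator is passed positionally to stay compatible across scikit-learn versions (the keyword was renamed from `base_estimator` to `estimator`).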
ranker.py:
A self-trained BaggingClassifier that uses an MLP as the base model to classify Chinese
questions into 2 classes. The task is similar to the CrossEncoder in sentence-transformers: build pair inputs of (chinese_question, property_representation) and train a 0-1 classifier to find the highest-scoring pair, i.e. the property_representation that best represents the question. This may indicate evidence about
the answer that satisfies the asking intent.
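The ranking step can be sketched as follows. The trained 0-1 classifier is replaced here by a simple character-overlap score (a hypothetical stand-in) so the example is self-contained; the function names are illustrative, not the repo's actual API.

```python
def score_pair(question, prop_rep):
    # Stand-in for the ranker's positive-class probability:
    # Jaccard overlap of the character sets of the two texts.
    q, p = set(question), set(prop_rep)
    return len(q & p) / max(len(q | p), 1)

def rank_properties(question, prop_reps):
    # Score every (question, property_representation) pair and
    # return the properties sorted by score, best first.
    scored = [(score_pair(question, p), p) for p in prop_reps]
    return [p for s, p in sorted(scored, reverse=True)]

candidates = ["出生日期", "出生地", "国籍"]
print(rank_properties("姚明的出生地是哪里", candidates)[0])  # → 出生地
```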
kbqa_step.py:
Main script that performs the KBQA task.
search_entity_rep_by_lang_filter_in_db:
finds the language representations of a Wikidata ID by setting a language flag (en and zh are supported) in a pre-built SQLite database; this DB can be seen as a translation dictionary of entities between English and Chinese.
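A sketch of such a language-filtered lookup, assuming a hypothetical table schema `reps(wikidata_id, lang, label)`; the actual schema in the pre-built database may differ.

```python
import sqlite3

# Build a tiny in-memory stand-in for the pre-built entity database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reps (wikidata_id TEXT, lang TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO reps VALUES (?, ?, ?)",
    [("Q148", "en", "China"), ("Q148", "zh", "中国")],
)

def search_entity_rep_by_lang(wikidata_id, lang="zh"):
    # Parameterized query: filter by entity id and language flag (en or zh).
    rows = conn.execute(
        "SELECT label FROM reps WHERE wikidata_id = ? AND lang = ?",
        (wikidata_id, lang),
    ).fetchall()
    return [r[0] for r in rows]

print(search_entity_rep_by_lang("Q148", "zh"))  # → ['中国']
```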
Zh_Rel_Ranker:
definition of the ranker object described above.
query_parser_bu, find_top_rels_bu:
the main part of the query process, from DeepPavlov.
t3_statement_df:
performs a SPARQL query on the Wikidata HDT file and represents the
result as an [n, 3]-shaped pandas DataFrame (with columns named s, p, o, where s p o
is the basic triple of the knowledge base).
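The shaping step can be sketched like this; a plain list of tuples stands in for the HDT query iterator so the example needs no HDT file.

```python
import pandas as pd

# Stand-in for the (s, p, o) stream produced by an HDT triple search.
triples = [
    ("wd:Q148", "wdt:P36", "wd:Q956"),
    ("wd:Q148", "wdt:P1082", '"1400000000"'),
]

# Collect the stream into an [n, 3] DataFrame with columns s, p, o.
df = pd.DataFrame(triples, columns=["s", "p", "o"])
print(df.shape)  # (2, 3)
```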
fix_o:
a toolkit function that fixes some problems in the o field of the stream produced by the HDT query iterator, when collecting that stream into a local N-Triples file.
py_dumpNtriple:
transforms one row of s p o produced by the HDT query iterator into the N-Triples file format.
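A sketch of that serialization, assuming s and p are URIs and a quoted o is a literal (the real function may handle more cases, such as blank nodes and typed literals).

```python
def dump_ntriple(s, p, o):
    # URIs are wrapped in angle brackets; quoted literals are kept as-is.
    obj = o if o.startswith('"') else "<%s>" % o
    return "<%s> <%s> %s ." % (s, p, obj)

line = dump_ntriple(
    "http://www.wikidata.org/entity/Q148",
    "http://www.wikidata.org/prop/direct/P36",
    "http://www.wikidata.org/entity/Q956",
)
print(line)
```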
one_part_g_producer:
initializes a knowledge graph object with the help of rdflib.
drop_duplicates_by_col:
a toolkit function that drops duplicate rows of a pandas DataFrame based on the value of one column.
drop_duplicates_of_every_df:
a toolkit function that drops duplicate rows of a pandas DataFrame of any dtypes (useful when some cells of the DataFrame are not hashable, e.g. lists).
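One common way to handle unhashable cells, sketched here under the assumption that the offending cells are lists: map them to tuples, deduplicate on the hashable view, then keep the matching original rows.

```python
import pandas as pd

df = pd.DataFrame({"s": ["Q1", "Q1"], "o": [["a", "b"], ["a", "b"]]})

# Build a hashable view of the frame (tuples instead of lists),
# deduplicate it, and select the surviving rows from the original.
hashable = df.apply(
    lambda col: col.map(lambda v: tuple(v) if isinstance(v, list) else v)
)
deduped = df.loc[hashable.drop_duplicates().index]
print(len(deduped))  # 1
```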
search_triples_with_parse:
performs a SPARQL query on the Wikidata HDT file.
perm_top_sort:
finds the most similar texts in a list, compared with another text, by cosine distance between SentenceTransformer text encodings.
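The cosine-similarity ranking can be sketched as below; fixed numpy vectors stand in for SentenceTransformer encodings so the example stays self-contained.

```python
import numpy as np

def cos_top_sort(query_vec, candidate_vecs):
    # Normalize, take dot products, and rank candidate indices
    # by cosine similarity to the query, best first.
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    sims = c @ q
    return np.argsort(-sims)

query = np.array([1.0, 0.0])
cands = np.array([[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]])
print(cos_top_sort(query, cands))  # → [1 2 0]
```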
syn_sim_on_list:
finds the most similar texts in a list, compared with another text, by a distance defined over synonyms (only the
Chinese parts of the text are kept).
t3_statement_ranking, choose_tmp_by_ranking:
use the ranker to find a reasonable ranking of s p o results against the Chinese question, over the many s p o collections.
till_process_func:
some SPARQL parts carry decorations such as 'FILTER (?x = a ).', so s p o will be expanded to s p o f. This function filters out everything except the parts we care about.
fill_str, for_loop_detect:
decode the BIO-style output of the NER model into a dictionary with [E-TAG T-TAG O-TAG] as keys and lists of elements as values.
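The decoding idea can be sketched as follows, assuming tags like B-E/I-E for entities and B-T/I-T for properties; the actual tag set and helper names in the repo may differ.

```python
def decode_bio(tokens, tags):
    # Walk the BIO sequence, collecting contiguous B-/I- spans
    # into lists keyed by tag type.
    result = {"E-TAG": [], "T-TAG": []}
    current, current_key = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                result[current_key].append("".join(current))
            current, current_key = [tok], tag[2:] + "-TAG"
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:  # O tag: close any open span
            if current:
                result[current_key].append("".join(current))
            current, current_key = [], None
    if current:
        result[current_key].append("".join(current))
    return result

tokens = list("姚明的身高是多少")
tags = ["B-E", "I-E", "O", "B-T", "I-T", "O", "O", "O"]
print(decode_bio(tokens, tags))  # {'E-TAG': ['姚明'], 'T-TAG': ['身高']}
```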
ner_entity_type_predict:
uses adapter-transformers to extract the entities and properties of a Chinese question.
keyword_rule_filter:
a rule-based fix on the output of tmp_classifier. By definition, every question containing "多大" as a sub-span drops the "COUNT"-style SPARQL template.
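The rule reads as a one-line filter; this sketch uses illustrative template names, since only the "多大" → drop-COUNT rule is stated in the source.

```python
def keyword_rule_filter(question, templates):
    # If the question contains the sub-span "多大" (roughly "how big/old"),
    # a COUNT-style SPARQL template cannot be the right shape: drop it.
    if "多大" in question:
        return [t for t in templates if t != "COUNT"]
    return templates

print(keyword_rule_filter("他今年多大了", ["COUNT", "ASK", "SELECT"]))
# → ['ASK', 'SELECT']
```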
tmp_type_predict:
uses tmp_classifier to classify a Chinese question into the 5 templates defined in abcde_dict.
property_df_rep_disambiguation:
disambiguates between the different properties found in the question.
do_search:
The main function: takes a Chinese question as input and outputs the query result from the Wikidata HDT knowledge base.