Hello,
Could you give an example of the required format (i.e. how it is actually written on disk) for the character-level tokenised input data? For example, the file '/data/gtdb_imgpr/pretraining_data_gtdb_imgpr/data_gtdb_imgpr_train_text_CharLevelTokenizer_document' referenced in the config file 'configs/launcher-test/data_configs/opengenome.yml'.
It is not immediately clear to me from the code.
Thanks