Hi,
I quickly tested your tool and found that Method A (majority vote with percent identity cut-off) doesn't give the results described in the README. Every result in lca_method.tsv is assigned down to a species name, rather than stopping at the taxon (kingdom, phylum, class, order, family or genus rank) implied by the percent identity cut-offs. I ran this test on the conda installation of your package, and the command lines I used are below.
cat test_zotus.fasta | parallel --gnu -j 100 --recstart '>' -N 10 --pipe blastn -task blastn -query - -db /home/antonycp/tools/ncbi-blast-2.3.0+/bin/nt -outfmt \'6 qseqid sseqid pident length mismatch gapopen evalue bitscore staxid\' -qcov_hsp_perc 70 -max_target_seqs 10 > test_zotus_coverage_blastn_maxtarget10_screenresults.txt
Nucleotide-Nucleotide BLAST 2.12.0+ was used above
conda activate conda_blastmining_env/
blastMining vote -i test_zotus_coverage_blastn_maxtarget10_screenresults.txt -o vote_method -e 0.001 -txl 99,97,95,90,85,80,75 -n 10 -sm 'Sample' -j 100 -p lca_method -kp -rm
head -n10 vote_method/lca_method.tsv
qseqid Kingdom Phylum Class Order Family Genus Species
Zotu1680 k__Eukaryota p__Rhodophyta c__Florideophyceae o__Ceramiales f__Dasyaceae g__Heterosiphonia s__Heterosiphonia sp. 1densiuscula
Zotu1625 k__Eukaryota p__Chordata c__Actinopteri o__ f__Centropomidae g__Lates s__Lates calcarifer
Zotu933 k__Eukaryota p__Chordata c__Actinopteri o__Scombriformes f__Scombridae g__Thunnus s__Thunnus albacares
Zotu1317 k__Eukaryota p__Annelida c__Clitellata o__Crassiclitellata f__Megascolecidae g__Pontodrilus s__Pontodrilus litoralis
Zotu791 k__Eukaryota p__Arthropoda c__Hexanauplia o__Harpacticoida f__Canthocamptidae g__Australocamptus s__Australocamptus hamondi
Zotu1561 k__Eukaryota p__Chordata c__Actinopteri o__Pempheriformes f__Lateolabracidae g__Lateolabrax s__Lateolabrax maculatus
Zotu1611 k__Eukaryota p__Arthropoda c__Insecta o__Diptera f__Phoridae g__Megaselia s__Megaselia sp. BOLD-2016
Zotu942 k__Eukaryota p__Arthropoda c__Hexanauplia o__Calanoida f__Paracalanidae g__Paracalanus s__Paracalanus aculeatus
Zotu958 k__Eukaryota p__Mollusca c__Gastropoda o__Pteropoda f__Cymbuliidae g__Corolla s__Corolla spectabilis
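To make the expectation concrete, here is a minimal Python sketch of how I understood the cut-offs to work. It is my own illustration, not blastMining's code: the function name `expected_rank` is hypothetical, and I am assuming that the seven `-txl` values map to Species, Genus, Family, Order, Class, Phylum and Kingdom in that order.

```python
# Hypothetical illustration of the behaviour I expected; not blastMining's implementation.
# Assumption: the seven -txl values correspond to these ranks, in this order.
RANKS = ["Species", "Genus", "Family", "Order", "Class", "Phylum", "Kingdom"]


def expected_rank(pident, cutoffs=(99, 97, 95, 90, 85, 80, 75)):
    """Return the most specific rank whose cut-off is met by pident, or None."""
    for rank, cutoff in zip(RANKS, cutoffs):
        if pident >= cutoff:
            return rank
    return None  # below every cut-off: no assignment expected


if __name__ == "__main__":
    # Made-up identity values, only to show the mapping.
    for pident in (99.5, 96.2, 88.0, 70.0):
        print(pident, expected_rank(pident))
```

Under that reading, a hit at 96.2% identity would be reported only down to family, with genus and species left empty, which is not what I see in the output above.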
I'm attaching the input files here (in .txt format to comply with GitHub's upload rules):
test_zotus_coverage_blastn_maxtarget10_screenresults.txt
test_zotus.fasta.txt
and the output files I got
lca_method.summary.txt
lca_method.tsv.txt
Also, is there any way to get the taxonomy IDs in the final output files? I'm sure many users would need this information. Thank you very much in advance!
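For context, the closest workaround I can think of is to pull the staxid column straight from the BLAST table, roughly as in the sketch below. It assumes the nine-column, tab-separated format from my blastn command (staxid last), the function name `top_hit_taxids` is hypothetical, and it simply takes the first listed hit per query, which may not be the taxon that the vote method actually selects.

```python
import csv


def top_hit_taxids(lines):
    """Map each qseqid to the staxid of its first listed hit.

    Assumes tab-separated rows with staxid as the ninth column,
    as produced by the -outfmt string in the blastn command above.
    """
    taxids = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 9:
            continue  # skip blank or malformed rows
        qseqid, staxid = row[0], row[8]
        taxids.setdefault(qseqid, staxid)  # keep only the first hit seen
    return taxids


if __name__ == "__main__":
    # Made-up rows, only to show the expected input shape.
    demo = [
        "queryA\tsubject1\t99.0\t100\t1\t0\t1e-50\t180\t11111",
        "queryA\tsubject2\t98.0\t100\t2\t0\t1e-48\t175\t22222",
        "queryB\tsubject3\t90.0\t100\t10\t0\t1e-30\t120\t33333",
    ]
    print(top_hit_taxids(demo))
```

With the real file this would be called as `top_hit_taxids(open("test_zotus_coverage_blastn_maxtarget10_screenresults.txt"))`. Having the taxonomy ID of the taxon actually assigned written out by the tool itself would be much more reliable than this.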