Method A doesn't give expected results #2

@Anto007

Hi,

I quickly tested your tool and found that Method A (majority vote with percent identity cut-off) doesn't give the results described in the README. Every row in lca_method.tsv is assigned down to species level, rather than being truncated at a higher taxonomic rank (kingdom, phylum, class, order, family, or genus) as dictated by the percent identity cut-offs. I ran this test on the conda installation of your package, using the command lines below.

cat test_zotus.fasta | parallel --gnu -j 100 --recstart '>' -N 10 --pipe blastn -task blastn -query - -db /home/antonycp/tools/ncbi-blast-2.3.0+/bin/nt -outfmt \'6 qseqid sseqid pident length mismatch gapopen evalue bitscore staxid\' -qcov_hsp_perc 70 -max_target_seqs 10 > test_zotus_coverage_blastn_maxtarget10_screenresults.txt
(Nucleotide-Nucleotide BLAST 2.12.0+ was used for the command above.)
conda activate conda_blastmining_env/
blastMining vote -i test_zotus_coverage_blastn_maxtarget10_screenresults.txt -o vote_method -e 0.001 -txl 99,97,95,90,85,80,75 -n 10 -sm 'Sample' -j 100 -p lca_method -kp -rm

head -n10 vote_method/lca_method.tsv

qseqid	Kingdom	Phylum	Class	Order	Family	Genus	Species
Zotu1680	k__Eukaryota	p__Rhodophyta	c__Florideophyceae	o__Ceramiales	f__Dasyaceae	g__Heterosiphonia	s__Heterosiphonia sp. 1densiuscula
Zotu1625	k__Eukaryota	p__Chordata	c__Actinopteri	o__	f__Centropomidae	g__Lates	s__Lates calcarifer
Zotu933	k__Eukaryota	p__Chordata	c__Actinopteri	o__Scombriformes	f__Scombridae	g__Thunnus	s__Thunnus albacares
Zotu1317	k__Eukaryota	p__Annelida	c__Clitellata	o__Crassiclitellata	f__Megascolecidae	g__Pontodrilus	s__Pontodrilus litoralis
Zotu791	k__Eukaryota	p__Arthropoda	c__Hexanauplia	o__Harpacticoida	f__Canthocamptidae	g__Australocamptus	s__Australocamptus hamondi
Zotu1561	k__Eukaryota	p__Chordata	c__Actinopteri	o__Pempheriformes	f__Lateolabracidae	g__Lateolabrax	s__Lateolabrax maculatus
Zotu1611	k__Eukaryota	p__Arthropoda	c__Insecta	o__Diptera	f__Phoridae	g__Megaselia	s__Megaselia sp. BOLD-2016
Zotu942	k__Eukaryota	p__Arthropoda	c__Hexanauplia	o__Calanoida	f__Paracalanidae	g__Paracalanus	s__Paracalanus aculeatus
Zotu958	k__Eukaryota	p__Mollusca	c__Gastropoda	o__Pteropoda	f__Cymbuliidae	g__Corolla	s__Corolla spectabilis
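To make the expectation concrete, here is a minimal Python sketch of the behaviour I understood from the README. It is not blastMining's actual code. It assumes that the seven `-txl` values (99,97,95,90,85,80,75) map in order to species, genus, family, order, class, phylum, and kingdom, and that a cut-off is met when the percent identity is greater than or equal to it. The function names and the example identity value are hypothetical.

```python
# Assumption: cut-offs are ordered from the most specific rank (species)
# to the least specific rank (kingdom), matching -txl 99,97,95,90,85,80,75.
RANKS = ["Species", "Genus", "Family", "Order", "Class", "Phylum", "Kingdom"]
CUTOFFS = [99, 97, 95, 90, 85, 80, 75]


def expected_rank(pident, cutoffs=CUTOFFS, ranks=RANKS):
    """Return the most specific rank whose cut-off `pident` meets, else None."""
    for rank, cutoff in zip(ranks, cutoffs):
        if pident >= cutoff:
            return rank
    return None


def truncate_lineage(lineage, pident):
    """Blank out every rank more specific than the one `pident` allows."""
    order = list(reversed(RANKS))  # Kingdom first, Species last
    rank = expected_rank(pident)
    if rank is None:
        return {r: "" for r in order}
    keep_up_to = order.index(rank)
    return {
        r: (lineage.get(r, "") if i <= keep_up_to else "")
        for i, r in enumerate(order)
    }
```

Under these assumptions, a query whose supporting hits have 96% identity would be reported down to family only, with the genus and species columns left empty, instead of a full species name as in the output above.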

I'm attaching the input files (in .txt format to comply with GitHub's upload rules):
test_zotus_coverage_blastn_maxtarget10_screenresults.txt
test_zotus.fasta.txt

and the output files I got:
lca_method.summary.txt
lca_method.tsv.txt

Also, is there any way to include the taxonomy IDs in the final output files? I'm sure many users would need this information. Thank you very much in advance!
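As a stopgap, the taxonomy IDs can be recovered from the BLAST table itself. The sketch below is a workaround outside blastMining, not a feature of the tool. It assumes the nine-column layout from the `-outfmt` string above, with `staxid` as the last column, and it attaches the taxid of each query's highest-bitscore hit. That hit is not necessarily the taxon the vote settled on, so the new column (hypothetically named `top_hit_staxid`) is only an approximation. Both function names are made up for illustration.

```python
import csv

# Column order taken from the -outfmt string used in the blastn command.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch",
    "gapopen", "evalue", "bitscore", "staxid",
]


def best_staxid_per_query(blast_handle):
    """Map each qseqid to the staxid of its highest-bitscore hit."""
    best = {}
    reader = csv.DictReader(blast_handle, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for hit in reader:
        score = float(hit["bitscore"])
        query = hit["qseqid"]
        # Keep the first hit seen on ties, since only a strictly higher score replaces it.
        if query not in best or score > best[query][0]:
            best[query] = (score, hit["staxid"])
    return {query: taxid for query, (_, taxid) in best.items()}


def add_taxid_column(table_handle, taxids, out_handle):
    """Copy a tab-separated result table, appending a top_hit_staxid column."""
    reader = csv.DictReader(table_handle, delimiter="\t")
    fields = list(reader.fieldnames) + ["top_hit_staxid"]
    writer = csv.DictWriter(
        out_handle, fieldnames=fields, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Queries without any BLAST hit get an empty taxid.
        row["top_hit_staxid"] = taxids.get(row["qseqid"], "")
        writer.writerow(row)
```

Both functions take open file handles, so they can be pointed at the BLAST results file and at vote_method/lca_method.tsv opened with `newline=""`.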

Labels: bug, enhancement
