ENH: new type + format for BLASTDB v5 #275

@nbokulich

Addition Description
Using indexed BLAST databases would allow faster searching as well as parallelization (e.g., see the use of blastn in q2-feature-classifier; blastn is also used in q2-quality-control and some other plugins). This would be in addition to the current use of FASTA input, in which case the indexed db is built on the fly.

Current Behavior
No blastdb formats are supported.

Proposed Behavior

  1. Create a new type (proposed name: BLASTDB).
  2. Create a new format (proposed name: BLASTDBv5Format). BLAST database formats are versioned: v4 is deprecated, and v5 has been the current version for some time.
  3. Create transformers to/from the FASTA format(s). (EDIT: these should probably be actions, not transformers, and could be placed in q2-feature-classifier; see "Wrap makeblastdb to generate indexed blast database", q2-feature-classifier#158.)
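For illustration, the FASTA-to-database direction could wrap makeblastdb roughly as follows. This is only a sketch under assumptions: the function names, the default `blastdb` prefix, and the injectable `run` argument are hypothetical, and nothing here is existing q2-types or q2-feature-classifier code. It assumes BLAST+ is on the PATH and uses only the `-in`, `-dbtype`, `-out`, and `-blastdb_version` options of makeblastdb.

```python
import subprocess
from pathlib import Path


def makeblastdb_command(fasta, out_prefix, dbtype="nucl"):
    """Build the makeblastdb argument list, requesting a version 5 database."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError(f"dbtype must be 'nucl' or 'prot', got {dbtype!r}")
    return [
        "makeblastdb",
        "-in", str(fasta),
        "-dbtype", dbtype,
        "-out", str(out_prefix),
        "-blastdb_version", "5",
    ]


def build_blastdb(fasta, out_dir, name="blastdb", dbtype="nucl",
                  run=subprocess.run):
    """Hypothetical helper: write an indexed database into out_dir.

    `run` is injectable so the wrapper can be exercised without BLAST+
    installed; by default it calls the real makeblastdb executable.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = out_dir / name
    # check=True raises CalledProcessError if makeblastdb exits non-zero
    run(makeblastdb_command(fasta, prefix, dbtype), check=True)
    return prefix
```

Writing all index files under a single directory with a fixed prefix would make the result straightforward to represent as a directory format, but that is a design choice to discuss, not a settled decision.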

One problem is that the format specification does not appear to be described anywhere that I can find, so I am not sure that we can write a detailed format validator. However, format validation could use blastdbcmd (ships with BLAST+) to get db info, like:
blastdbcmd -db <db_name> -info
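A validator built on that command might look something like the sketch below. The function names and the injectable `run` argument are hypothetical, and it assumes that blastdbcmd exits with a non-zero status when it cannot open the database, which should be confirmed against the installed BLAST+ version before relying on it.

```python
import subprocess


def blastdb_info_command(db_prefix, dbtype="guess"):
    """Build the blastdbcmd call that prints summary info for a database."""
    return ["blastdbcmd", "-db", str(db_prefix), "-dbtype", dbtype, "-info"]


def validate_blastdb(db_prefix, dbtype="guess", run=subprocess.run):
    """Hypothetical validator: raise ValueError if the db cannot be read.

    Assumes a non-zero exit status from blastdbcmd means the database is
    missing or malformed. Returns the info text on success.
    """
    cmd = blastdb_info_command(db_prefix, dbtype)
    try:
        result = run(cmd, capture_output=True, text=True)
    except FileNotFoundError as err:
        raise RuntimeError(
            "blastdbcmd not found on PATH; is BLAST+ installed?") from err
    if result.returncode != 0:
        raise ValueError(
            f"{db_prefix} is not a readable BLAST database: "
            f"{result.stderr.strip()}")
    return result.stdout
```

This only confirms that BLAST+ itself can open the database; it does not check the contents of the individual index files, so it is a shallow validation at best.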

Questions

  1. Do we need separate types for protein vs. nucleotide dbs? The format would be the same but linked to different types.
  2. Is blastdbcmd sufficient for validation?
  3. Is this even a type that we want in q2-types? A few plugins would wind up using this (q2-feature-classifier, q2-moshpit, external plugins?) so I personally feel certain enough to open the issue here first.
