Skip to content

add linked sequence formats, type, tests#388

Open
colinvwood wants to merge 7 commits intoqiime2:devfrom
colinvwood:linked-sequences
Open

add linked sequence formats, type, tests#388
colinvwood wants to merge 7 commits intoqiime2:devfrom
colinvwood:linked-sequences

Conversation

@colinvwood
Copy link
Copy Markdown
Contributor

No description provided.

Comment thread q2_types/feature_data/_formats.py Outdated
Comment thread q2_types/feature_data/_formats.py Outdated
Comment thread q2_types/feature_data/_deferred_setup/_transformers.py
@colinvwood
Copy link
Copy Markdown
Contributor Author

@ebolyen I added pd.Series transformers as it looks like those are widely used downstream, and updated the parent of LinkedDNAFASTAFormat so we can += ' ' to the alphabet. This is everything I have planned for the q2-types side of things, lmk if you have ideas for anything else we need.

@colinvwood colinvwood moved this from In Development to In Review in 2026.4 🌱 Mar 31, 2026
@colinvwood colinvwood self-assigned this Mar 31, 2026
@Oddant1 Oddant1 assigned Oddant1 and unassigned colinvwood Apr 3, 2026
@Oddant1 Oddant1 self-requested a review April 3, 2026 17:33
Copy link
Copy Markdown
Member

@Oddant1 Oddant1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lgtm. I think @ebolyen hit everything. I still need to properly get through the other PR especially the R stuff...

Comment on lines +360 to +365
with ff.open() as fh:
for id_, seq in data.items():
sequence = skbio.Sequence(
str(seq), metadata={'id': id_}, lowercase=False
)
skbio.io.write(sequence, format='fasta', into=fh)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nitpick I guess, is it beneficial to optimize this so we aren't calling write for every line? Idk how clever skbio.io.write is, but generally it seems wasteful to write each line individually instead of one big write.

Copy link
Copy Markdown
Member

@ebolyen ebolyen Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should write it as fast as any other loop, there's an internal C buffer in Python that is actually queuing up ~8kb blocks and dumping those at once (this is why .flush() exists and .tell() is mostly useless)

@Oddant1 Oddant1 moved this from In Review to In Development in 2026.4 🌱 Apr 6, 2026
@colinvwood colinvwood moved this from In Development to In Review in 2026.4 🌱 Apr 10, 2026
@lizgehret lizgehret removed this from 2026.4 🌱 Apr 23, 2026
@github-project-automation github-project-automation Bot moved this to In Development in 2026.7 🐐 Apr 23, 2026
@lizgehret lizgehret moved this from In Development to In Review in 2026.7 🐐 Apr 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

5 participants