
Add calculate_marker_genes component#1168

Open
lazappi wants to merge 11 commits into main from add-calculate_marker_genes

Contributor

@lazappi lazappi commented Apr 22, 2026

Changelog

Add a calculate_marker_genes component that runs scanpy.tl.rank_genes_groups() and then, optionally, scanpy.tl.filter_rank_genes_groups(). It outputs an H5MU file containing the calculated markers, plus a CSV file listing the marker genes.

Checklist before requesting a review

  • I have performed a self-review of my code

  • Conforms to the Contributor's guide

  • Check the correct box. Does this PR contain:

    • Breaking changes
    • New functionality
    • Major changes
    • Minor changes
    • Documentation
    • Bug fixes
  • Proposed changes are described in the CHANGELOG.md

  • CI tests succeed!

Comment thread src/annotate/calculate_marker_genes/config.vsh.yaml Outdated
Comment thread src/annotate/calculate_marker_genes/config.vsh.yaml Outdated
- name: Filter rank genes groups
description: Arguments for scanpy `filter_rank_genes_groups()`
arguments:
- name: --filter_results
Member

If we wish to align with other components, I think we used do_subset for this.

Contributor Author

I'm happy to change the name but I think this is doing something different. It doesn't remove/mask vars before the test, instead it runs a second function filter_rank_genes_groups() which applies some filters to the testing results. No vars are removed at any stage.

There is a masking argument to rank_genes_groups() but it's not implemented in the component.
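
The distinction drawn in this reply can be sketched without scanpy. A post hoc filter acts on the table of test results and leaves the set of variables untouched, whereas subsetting would drop variables before the test runs. Everything in the sketch below is hypothetical and for illustration only: the `MarkerResult` record, the `filter_results` helper, the gene names, and the thresholds are assumptions, not the component's arguments or scanpy's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerResult:
    """One row of a hypothetical marker-gene test result table."""
    gene: str
    log_fold_change: float
    frac_in_group: float


def filter_results(results, min_log_fold_change=1.0, min_frac_in_group=0.25):
    """Return the results that pass both (hypothetical) thresholds.

    The input list is not modified, and no variables are dropped
    from the dataset itself.
    """
    return [
        r
        for r in results
        if r.log_fold_change >= min_log_fold_change
        and r.frac_in_group >= min_frac_in_group
    ]


# Variables present in the dataset (made-up names).
var_names = ["GeneA", "GeneB", "GeneC"]

# Test results for one group, one row per variable (made-up values).
results = [
    MarkerResult("GeneA", 2.1, 0.80),
    MarkerResult("GeneB", 0.3, 0.90),  # fold change too small
    MarkerResult("GeneC", 1.5, 0.10),  # expressed in too few cells
]

filtered = filter_results(results)
```

Only the result table shrinks here; `var_names` and the unfiltered `results` still hold all three genes, which is the behaviour the reply describes.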

Comment thread src/annotate/calculate_marker_genes/config.vsh.yaml Outdated
Comment thread src/annotate/calculate_marker_genes/script.py Outdated
logger.info("Using .X matrix")

logger.info("Using '%s' method", par["method"])
sc.tl.rank_genes_groups(
Member

Does this method from scanpy automatically log-normalize the input layer when it does not have the correct items set in uns? Otherwise we might need to do something different here.

Contributor Author

The function doesn't do anything to the input but it will give a warning if it thinks the data is not log-normalised (i.e. if it is all non-negative integers).
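
If the component wanted its own guard rather than relying on that warning, the condition described above (all values are non-negative integers) could be checked roughly as follows. This is a minimal sketch assuming a dense NumPy array; `looks_like_raw_counts` is a hypothetical helper name, and the check is a heuristic written for illustration, not scanpy's own code.

```python
import numpy as np


def looks_like_raw_counts(matrix) -> bool:
    """Heuristic: True if every value is a non-negative integer.

    Hypothetical helper; a dense array is assumed. A sparse matrix
    would need its stored values checked instead.
    """
    values = np.asarray(matrix, dtype=float)
    is_non_negative = np.all(values >= 0)
    is_integer_valued = np.all(values == np.round(values))
    return bool(is_non_negative and is_integer_valued)


# Made-up example data: raw counts and a log1p-transformed copy.
raw = np.array([[0, 3, 1], [2, 0, 5]])
logged = np.log1p(raw)

raw_flag = looks_like_raw_counts(raw)
logged_flag = looks_like_raw_counts(logged)
```

A check like this can only flag data that looks like raw counts. It cannot confirm that a matrix with non-integer values was normalised correctly.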

- name: Outputs
description: Arguments that define the output
arguments:
- name: --output
Member

Which slots are added to the output AnnData? If there are none, let's remove this argument. If there are, please allow setting the slot name(s) using arguments.

Contributor Author

The results are stored in .uns. There is already a key_added argument that matches the function argument which controls the name. Happy to rename it if you like.

lazappi added 6 commits May 4, 2026 16:25
…genes

* origin/main:
  Improve defaults for annotation workflows (#1155)
  Update file examples (#1067)
  Bump viashpy to 0.10.0 (#1178)
  Add `cellmapper` outputs (#1177)
  Fix failing filtering test after implementing obs intersection (#1175)
  CI - linting: pin R version to 4.5.3 (#1181)
  Fix broken scvelo test by excluding version 0.3.4 (#1180)
  Leiden: avoid making unnecessary copies of the output data and add extra arguments (#1132)
  Add `cellmapper` component (#1169)
  Bump anndata to 0.12.11 (#1174)
  Add optional .obs intersection (#1173)
  Add consensus_vote component (#1151)
  Add `clear_slots` component (#1171)
@lazappi lazappi requested a review from DriesSchaumont May 5, 2026 06:50

2 participants