Update of 2022_GnecchiRuscone_CarpathianBasin to Poseidon v3.0.0#318

Merged: nevrome merged 5 commits into master from v300test on Mar 23, 2026

@nevrome
Member

@nevrome nevrome commented Mar 19, 2026

This is meant to be a test for Poseidon v3.0.0. I made the package compliant with the new standard, but I only operated with the information that was already there. Please check the changelog for the exact changes I applied. I hope these rather minimal adjustments are in line with your vision for the package, @gagr88.

@TCLamnidis: Could you please comment on what I entered here?

referenceGenomeAssembly: GRCh37
referenceGenomeAssemblyURL: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.13

The authors say the following in the paper:

The reads were then mapped to the Human Reference Genome Hs37d5 with bwa v0.7.12 aln/samse alignment algorithm (Li and Durbin, 2009) with the parameters “-n” and “-l” set to 0.01 and 32 respectively.

I wonder if referenceGenomeAssembly should rather be Hs37d5 or maybe GRCh37.p13. And I'm also unsure about the URL.

PR Checklist for modifying one or multiple existing packages

  • The changes maintain the structural integrity of the affected packages.
  • The checksums of the modified files in the respective POSEIDON.yml files were adjusted properly.
  • Every file in the submission is correctly referenced in the relevant POSEIDON.yml files and there are no additional, supplementary files in the submission that are not documented there.

  • The packageVersion numbers of the affected packages were increased in their POSEIDON.yml files.
  • The changes in the packageVersion followed the Poseidon Package versioning policy.
  • The changes were documented in the respective CHANGELOG files. If no CHANGELOG files existed previously it was added here.
  • The lastModified fields of the affected POSEIDON.yml files were updated.
  • The contributor fields were updated with name, email and orcid of the relevant, new contributors.
  • The .janno and the .ssf files are not fully quoted, so they only use single- or double quotes ("...", '...') to enclose text fields where it is strictly necessary (i.e. their entry includes a TAB).

  • All affected packages pass a validation with trident validate --fullGeno.

  • Large genotype data files are properly tracked with Git LFS and not directly pushed to the repository. For an instruction on how to set up Git LFS please look here. If you accidentally pushed the files the wrong way you can fix it with git lfs migrate import --no-rewrite path/to/file.bed (see here).
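
The checksum item in the checklist above can be spot-checked by hand. The following is a minimal sketch, assuming the checksum fields in POSEIDON.yml hold MD5 digests of the referenced files; the helper name `md5_of_file` is hypothetical and not part of any Poseidon tool.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks.

    Assumption: the digest is compared by hand against the checksum
    entry recorded for this file in POSEIDON.yml.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large genotype files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_of_file("path/to/file.bed")` returns a 32-character hex string that can be compared with the value stored in the package definition.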

@nevrome nevrome requested review from TCLamnidis and stschiff March 19, 2026 08:22
@nevrome nevrome changed the title Update 2022_GnecchiRuscone_CarpathianBasin to Poseidon v3.0.0 Update of 2022_GnecchiRuscone_CarpathianBasin to Poseidon v3.0.0 Mar 19, 2026
@nevrome
Member Author

nevrome commented Mar 19, 2026

Ah - and I'm also not 100% sure about the scaling of the Damage and Contamination values. I'm pretty sure they are given as proportions, not as percent. So they were entered with the wrong scaling for Poseidon v2.7.1.
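
A quick plausibility check for the scaling question might look like the sketch below. It assumes the values have already been parsed into floats, with `None` for missing entries; the function names and the sample numbers are made up for illustration. Note that values inside [0, 1] are only consistent with proportions and do not prove it, whereas any value above 1 rules proportions out.

```python
def looks_like_proportions(values):
    """Return True if every non-missing value lies in [0, 1]."""
    present = [v for v in values if v is not None]
    return all(0.0 <= v <= 1.0 for v in present)


def percent_to_proportion(values):
    """Rescale percentages to proportions, keeping missing values as None."""
    return [None if v is None else v / 100.0 for v in values]


# Illustrative, made-up values; not taken from the package.
damage = [0.12, 0.31, None, 0.08]
print(looks_like_proportions(damage))
```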

Member

@stschiff stschiff left a comment

Looks all correct to me, including damage and contamination. Thanks

@stschiff
Member

I've also tested this now with trident serve, seems to work!

@TCLamnidis
Member

TCLamnidis commented Mar 20, 2026

From looking into Ensembl, the assembly URL for hs37d5 should be https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_000001405.14 (which redirects to the URL you provide anyway, so not a hill I'll die on).

hs37d5 is just a patched version of the GRCh37 genome assembly, so I think GRCh37 is the correct value for referenceGenomeAssembly. Papers in which HG19 and/or GRCh37 were used as the reference genomes would also get the same values.

@stschiff
Member

But there's this whole business of "Chr1" (GRCh37) vs. "1" (hs37d5)... doesn't it matter for that? @TCLamnidis

@TCLamnidis
Member

TCLamnidis commented Mar 20, 2026

As long as the assembly is the same, the genomic coordinates should be the same. As such, whether the chromosome includes the chr prefix is an archive decision, I'd say.
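
To illustrate the point about the prefix, here is a minimal sketch that compares two loci after normalising the chromosome name. The function names and the example position are hypothetical, and the sketch only holds when both loci refer to the same assembly. It also does not reconcile the mitochondrial naming difference ("chrM" vs. "MT").

```python
def strip_chr_prefix(name: str) -> str:
    """Drop a leading 'chr' (any capitalisation) from a chromosome name."""
    if name.lower().startswith("chr"):
        return name[3:]
    return name


def same_locus(a, b):
    """Compare two (chromosome, position) pairs, ignoring the 'chr' prefix.

    Assumption: both pairs use coordinates from the same assembly.
    Not handled: 'chrM' vs. 'MT', which differ by more than the prefix.
    """
    (chrom_a, pos_a), (chrom_b, pos_b) = a, b
    return strip_chr_prefix(chrom_a) == strip_chr_prefix(chrom_b) and pos_a == pos_b


# Illustrative position only.
print(same_locus(("chr1", 752566), ("1", 752566)))
```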

@nevrome
Member Author

nevrome commented Mar 23, 2026

Thank you for your help - I will merge this then.

@nevrome nevrome merged commit 22d5a76 into master Mar 23, 2026
1 check passed
@nevrome nevrome deleted the v300test branch March 23, 2026 05:59