
Add_2025_Gretzinger_SlavicPeriod #303

Merged: nevrome merged 30 commits into poseidon-framework:master from denisazlamalova:add_2025_Gretzinger_SlavicPeriod on Oct 15, 2025


@denisazlamalova (Contributor) commented Oct 1, 2025

PR Checklist for a new package submission

  • The package does not exist already in the community archive, also not with a different name.
  • The package title in the POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
  • The package is stored in a directory that is named like the package title.

  • Samples that have already been published and were re-analysed (e.g. re-sequenced) for the newly packaged publication have a modified Poseidon_ID of the form <Original Poseidon_ID>_<Initials of the main author>_<Year>. Re-analysed versions of I1685 (Lazaridis et al. 2016) should, for example, be assigned the IDs I1685_IL22 (Lazaridis et al. 2022) and I1685_IL25 (Lazaridis et al. 2025).

  • The package is complete and features the following elements:
    • Genotype data in binary PLINK format (not EIGENSTRAT format).
    • Genotype data have been provided by the original authors of the publication describing the data.
    • A POSEIDON.yml file with not just the file-referencing fields, but also the following meta-information fields present and filled: poseidonVersion, title, description, contributor, packageVersion, lastModified (see here for their definition)
    • A reasonably filled .janno file (for a list of available fields look here and here for more detailed documentation about them).
    • A .bib file with the necessary literature references for each sample in the .janno file.
  • Every file in the submission is correctly referenced in the POSEIDON.yml file and there are no additional, supplementary files in the submission that are not documented there.
  • Genotype data, .janno and .bib file are all named after the package title and only differ in the file extension.
  • The package version in the POSEIDON.yml file is 1.0.0.
  • The poseidonVersion of the package in the POSEIDON.yml file is set to the latest version of the Poseidon schema.
  • The POSEIDON.yml file contains the corresponding checksums for the fields genoFile, snpFile, indFile, jannoFile and bibFile.
  • There is either no CHANGELOG file or one with a single entry for version 1.0.0.

  • The Publication column in the .janno file is filled and the respective .bib file has complete entries for all listed keys.
  • The .janno file does not include any empty columns or columns only filled with n/a.
  • The order of columns in the .janno file adheres to the standard order as defined in the Poseidon schema here.
  • The .janno and the .ssf files are not fully quoted: they use single or double quotes ("...", '...') only to enclose text fields where this is strictly necessary (i.e. where the entry includes a TAB).

  • The package passes a validation with trident validate --fullGeno.

  • Large genotype data files are properly tracked with Git LFS and not directly pushed to the repository. For instructions on how to set up Git LFS, please look here. If you accidentally pushed the files the wrong way, you can fix this with git lfs migrate import --no-rewrite path/to/file.bed (see here).
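The checklist items on empty or n/a-only .janno columns can be pre-checked locally before running trident validate. The following is a minimal sketch, not part of trident or the Poseidon tooling: it assumes the .janno file is plain tab-separated text with one header row, and that "missing" means an empty cell or the literal n/a. The function name uninformative_columns is hypothetical.

```python
import csv
import io

# Cell values treated as "missing" (assumption: empty strings and the
# literal n/a, as described in the checklist).
MISSING = {"", "n/a"}


def uninformative_columns(janno_text: str) -> list[str]:
    """Return the names of .janno columns that are empty or only contain n/a.

    Hypothetical helper: reads the .janno content as tab-separated text
    with a header row. It is a quick pre-check, not a replacement for
    `trident validate`.
    """
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    rows = list(reader)
    columns = reader.fieldnames or []
    return [
        col
        for col in columns
        if all((row.get(col) or "").strip() in MISSING for row in rows)
    ]


if __name__ == "__main__":
    # Made-up example rows, not data from this package.
    example = "Poseidon_ID\tGenetic_Sex\tNote\nSample_A\tM\tn/a\nSample_B\tF\t\n"
    print(uninformative_columns(example))  # ['Note']
```

Columns flagged this way would either be filled or dropped before submission.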

@denisazlamalova changed the title from Add_2025_Gretzinger_Sslavic period to Add_2025_Gretzinger_SlavicPeriod on Oct 1, 2025
@nevrome (Member) commented Oct 1, 2025

Thank you for preparing this package! At a first quick glance I see the following minor issues:

  • The Y_Haplogroup .janno column has the incorrect name Y_HaplogroupSource_Tissue.
  • The genotype files have pretty technical names Gretzinger_et_al.250924_ALL with an extra period character, which may be an issue on some operating systems. Maybe we could rename the files to 2025_Gretzinger_SlavicPeriod.bed/bim/fam.
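The extra period in Gretzinger_et_al.250924_ALL matters because many tools treat everything after the last period as the file extension. The sketch below illustrates this with Python's pathlib and builds the rename from the old to the suggested new prefix by plain string concatenation. The helper names are hypothetical; after such a rename, the genoFile, snpFile and indFile fields in POSEIDON.yml would also have to point to the new names (the file contents, and hence their checksums, do not change).

```python
from pathlib import Path

# Old and new prefixes as given in the review comment.
OLD_PREFIX = "Gretzinger_et_al.250924_ALL"
NEW_PREFIX = "2025_Gretzinger_SlavicPeriod"
EXTENSIONS = (".bed", ".bim", ".fam")


def rename_plan(directory: Path, old: str = OLD_PREFIX, new: str = NEW_PREFIX) -> dict[Path, Path]:
    """Map each PLINK file <old><ext> to <new><ext> inside `directory`.

    The extension is appended by string concatenation, so the period inside
    the old prefix is never treated as an extension separator.
    """
    return {directory / f"{old}{ext}": directory / f"{new}{ext}" for ext in EXTENSIONS}


def apply_renames(plan: dict[Path, Path]) -> None:
    """Rename the files; raises FileNotFoundError if a source file is missing."""
    for src, dst in plan.items():
        src.rename(dst)


if __name__ == "__main__":
    # pathlib sees ".250924_ALL" as the suffix of the old prefix, so
    # with_suffix() silently drops part of the name:
    print(Path(OLD_PREFIX).with_suffix(".bed").name)  # Gretzinger_et_al.bed
```

In a Git LFS-tracked repository the same renames could equally be done with git mv so that the history records them as moves.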

@Tlkhi thankfully agreed to provide a more thorough review 🙏

@nevrome requested a review from @ltcrod on October 13, 2025, 14:38
@nevrome (Member) commented Oct 13, 2025

For the time being @ltcrod has offered to provide a (second) review. Thank you 👍
Please see the review guide here: https://www.poseidon-adna.org/#/archive_reviewer_guide

@Tlkhi (Contributor) commented Oct 13, 2025

Thank you for submitting this package, and sorry for my late review.

Here are my comments:

– janno file:

The following columns should be added and filled in correctly:
Date_Type, Date_Note.

For Date_Type, use Contextual for non–C14-dated samples and C14 for carbon-dated samples.
If the lab codes and uncalibrated dates are missing, please note this in the Date_Note column.
If they are available, provide them in the following columns:
Date_C14_Labnr, Date_C14_Uncal_BP, Date_C14_Uncal_BP_Err.

It would also be better to replace the archaeological dates of carbon-dated samples with their calibrated C14 dates.
For example, sample VEM003 is C14-dated to 702–879 CE, but the janno file gives the archaeological date 650–900 CE instead.
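The dating rule suggested above (calibrated C14 ranges take precedence over archaeological ones, and missing lab information is noted in Date_Note) could be sketched as a small helper. This is a hypothetical illustration: the Date_BC_AD_Start/Date_BC_AD_Stop columns hold years with negative values for BC, Date_BC_AD_Median and the Date_C14_* columns are left out, and the lowercase "contextual" spelling is an assumption about the schema vocabulary that should be checked against the .janno documentation (the review writes "Contextual").

```python
from typing import Optional

# Assumed Date_Type vocabulary; verify exact spelling against the schema.
C14 = "C14"
CONTEXTUAL = "contextual"


def date_columns(
    archaeological: tuple[int, int],
    c14_calibrated: Optional[tuple[int, int]] = None,
    c14_lab_info_available: bool = False,
) -> dict[str, str]:
    """Fill date-related .janno columns for one sample (years BC/AD, BC negative).

    Hypothetical helper: a calibrated C14 range, if present, replaces the
    archaeological (contextual) range.
    """
    if c14_calibrated is not None:
        start, stop = c14_calibrated
        note = "n/a" if c14_lab_info_available else "Lab code and uncalibrated date not available"
        return {
            "Date_Type": C14,
            "Date_BC_AD_Start": str(start),
            "Date_BC_AD_Stop": str(stop),
            "Date_Note": note,
        }
    start, stop = archaeological
    return {
        "Date_Type": CONTEXTUAL,
        "Date_BC_AD_Start": str(start),
        "Date_BC_AD_Stop": str(stop),
        "Date_Note": "n/a",
    }


if __name__ == "__main__":
    # VEM003 from the review: archaeological 650-900 CE, calibrated C14 702-879 CE.
    print(date_columns((650, 900), (702, 879)))
```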

I see some identical samples mentioned in the supplementary table,
so please re-add the following columns and fill them in correctly:
Relation_To, Relation_Degree, Relation_Type.
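For identical samples, the relation columns need to be filled symmetrically, so that each sample points to its duplicate and vice versa. A minimal sketch, assuming that Relation_To, Relation_Degree and Relation_Type are ";"-separated lists that line up element by element and that "identical" is the degree used for duplicates; the Relation_Type text "same individual" and the helper name are purely hypothetical.

```python
from collections import defaultdict


def relation_columns(duplicate_pairs: list[tuple[str, str]]) -> dict[str, dict[str, str]]:
    """Build symmetric Relation_* entries for pairs of identical samples.

    Assumptions: list columns use ";" as separator and must have matching
    lengths; "identical" is the Relation_Degree value for duplicates;
    "same individual" is a hypothetical free-text Relation_Type.
    """
    partners: dict[str, list[str]] = defaultdict(list)
    for a, b in duplicate_pairs:
        partners[a].append(b)
        partners[b].append(a)
    result = {}
    for sample, others in partners.items():
        ordered = sorted(set(others))
        result[sample] = {
            "Relation_To": ";".join(ordered),
            "Relation_Degree": ";".join("identical" for _ in ordered),
            "Relation_Type": ";".join("same individual" for _ in ordered),
        }
    return result
```

Keeping the three columns aligned element by element means each position describes one relationship.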

A Note column can also be added to mention assessment results (e.g., QUESTIONABLE, PASS).

– ssf file:

Links/values for fastq_aspera, fastq_bytes, fastq_md5, fastq_ftp, read_count, and submitted_ftp are missing.
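Missing .ssf values like these can be listed per column before resubmitting. The sketch below is a hypothetical pre-check, assuming the .ssf file is tab-separated with a header row and treating empty cells and n/a as missing; a column absent from the header is reported for every row. Only the column names are taken from the comment above.

```python
import csv
import io

# Columns reported as missing in the review.
REQUIRED_SSF_COLUMNS = (
    "fastq_aspera", "fastq_bytes", "fastq_md5",
    "fastq_ftp", "read_count", "submitted_ftp",
)


def missing_ssf_values(ssf_text: str) -> dict[str, list[int]]:
    """Return, per required column, the 1-based data-row numbers without a value."""
    reader = csv.DictReader(io.StringIO(ssf_text), delimiter="\t")
    rows = list(reader)
    missing: dict[str, list[int]] = {}
    for col in REQUIRED_SSF_COLUMNS:
        bad = [
            i
            for i, row in enumerate(rows, start=1)
            if (row.get(col) or "").strip() in {"", "n/a"}
        ]
        if bad:
            missing[col] = bad
    return missing
```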

The rest of the files look good to me.

Thankfully, @ltcrod will also provide a second review.

added relevant information
updated sequencingSourceFileChkSum
edited sequencingSourceFileChkSum
sequencingSourceFileChkSum
Updated dating and relatedness information
jannoFileChkSum
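Several of the commits above update jannoFileChkSum and sequencingSourceFileChkSum after the corresponding files were edited. A checksum of this kind can be recomputed with a few lines of Python; the sketch assumes the POSEIDON.yml checksum fields store MD5 hex digests, and file_md5 is a hypothetical helper, not part of trident. Reading in chunks keeps memory use low even for large genotype files.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks of `chunk_size` bytes."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Note that for Git LFS-tracked files the digest has to be computed on the actual data, not on an un-pulled LFS pointer file.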
@denisazlamalova (Contributor, Author)

Thank you for your comments!

  • I fixed the name of the Y_Haplogroup column
  • I renamed the genotype files to 2025_Gretzinger_SlavicPeriod.bed/bim/fam
  • I edited the dating columns: I added Date_Type and Date_Note, where I indicated that the lab codes and uncalibrated dates are not available. For C14-dated individuals, the archaeological dates were replaced with the more precise calibrated C14 dates.
  • I indicated the duplicated individuals in columns Relation_To, Relation_Degree and Relation_Type
  • I added the Note column with quality information from the supplements
  • I provided fastq_aspera, fastq_bytes, fastq_md5, fastq_ftp, read_count, and submitted_ftp for the .ssf file

@nevrome (Member) commented Oct 15, 2025

Thanks for the review, @Tlkhi, and thanks for addressing it promptly, @denisazlamalova. It looks OK to me now. Will merge.

@nevrome merged commit 4f0a090 into poseidon-framework:master on Oct 15, 2025
1 check passed