Add 2025_SkourtaniotiJia_SCaucasus#293

Merged
nevrome merged 14 commits into poseidon-framework:master from xiaowenjia:master
Sep 8, 2025

@xiaowenjia
Contributor

PR Checklist for a new package submission

  • The package does not exist already in the community archive, also not with a different name.
  • The package title in the POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
  • The package is stored in a directory that is named like the package title.

  • Samples that already have been published previously, and got re-analysed (e.g. re-sequenced) for the now packaged publication, have a modified Poseidon_ID of the form <Original Poseidon_ID>_<Initials of the main author>_<Year>. Re-analysed versions of I1685 (Lazaridis et al. 2016) should, for example, be assigned the IDs I1685_IL22 (Lazaridis et al. 2022) and I1685_IL25 (Lazaridis et al. 2025).

  • The package is complete and features the following elements:
    • Genotype data in binary PLINK format (not EIGENSTRAT format).
    • Genotype has been provided by the original authors of the publication describing the data.
    • A POSEIDON.yml file with not just the file-referencing fields, but also the following meta-information fields present and filled: poseidonVersion, title, description, contributor, packageVersion, lastModified (see here for their definition)
    • A reasonably filled .janno file (for a list of available fields look here and here for more detailed documentation about them).
    • A .bib file with the necessary literature references for each sample in the .janno file.
  • Every file in the submission is correctly referenced in the POSEIDON.yml file and there are no additional, supplementary files in the submission that are not documented there.
  • Genotype data, .janno and .bib file are all named after the package title and only differ in the file extension.
  • The package version in the POSEIDON.yml file is 1.0.0.
  • The poseidonVersion of the package in the POSEIDON.yml file is set to the latest version of the Poseidon schema.
  • The POSEIDON.yml file contains the corresponding checksums for the fields genoFile, snpFile, indFile, jannoFile and bibFile.
  • There is either no CHANGELOG file or one with a single entry for version 1.0.0.

  • The Publication column in the .janno file is filled and the respective .bib file has complete entries for all listed keys.
  • The .janno file does not include any empty columns or columns only filled with n/a.
  • The order of columns in the .janno file adheres to the standard order as defined in the Poseidon schema here.
  • The .janno and the .ssf files are not fully quoted, so they only use single- or double quotes ("...", '...') to enclose text fields where it is strictly necessary (i.e. their entry includes a TAB).

  • The package passes a validation with trident validate --fullGeno.

  • Large genotype data files are properly tracked with Git LFS and not directly pushed to the repository. For an instruction on how to set up Git LFS please look here. If you accidentally pushed the files the wrong way you can fix it with git lfs migrate import --no-rewrite path/to/file.bed (see here).
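
The naming rules in the checklist above (title structure, and data files named after the package title) could be spot-checked with a small script before running trident. This is only a minimal sketch: the regular expression is an assumed approximation of the suggested title structure, and `is_valid_title` and `misnamed_files` are hypothetical helper names, not part of any Poseidon tooling.

```python
import re
from pathlib import PurePath

# Assumed approximation of <Year>_<Last name of first author>_<Region or feature>.
# The convention is a guideline, so this pattern may be too strict or too loose.
TITLE_PATTERN = re.compile(r"\d{4}_[A-Za-z]+_[A-Za-z0-9]+")


def is_valid_title(title: str) -> bool:
    """Return True if the title matches the assumed pattern."""
    return TITLE_PATTERN.fullmatch(title) is not None


def misnamed_files(title: str, filenames: list[str]) -> list[str]:
    """Return the files whose name without extension differs from the title."""
    return [name for name in filenames if PurePath(name).stem != title]
```

For example, `misnamed_files("2025_SkourtaniotiJia_SCaucasus", [...])` would be called with the genotype, .janno and .bib file names only; POSEIDON.yml and an optional CHANGELOG are not expected to carry the package title.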

@nevrome
Member

nevrome commented Aug 5, 2025

Thank you for this submission!

The validation currently fails with some encoding-related issue in https://github.com/poseidon-framework/community-archive/blob/master/checkFileEncoding.sh. Either the .janno file is not utf8-encoded, or it has windows line endings.
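
The two suspected causes can be checked locally before pushing. The following is a minimal sketch and not a reimplementation of checkFileEncoding.sh, whose exact logic is not shown here; `encoding_problems` is a hypothetical helper name.

```python
def encoding_problems(data: bytes) -> list[str]:
    """Report whether raw file content is invalid UTF-8 or uses CRLF line endings."""
    problems = []
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8 at byte {err.start}")
    if b"\r\n" in data:
        problems.append("contains Windows (CRLF) line endings")
    return problems
```

It would be applied to the raw bytes of the .janno file, for example the result of `pathlib.Path(...).read_bytes()`; an empty list means neither problem was found.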

@xiaowenjia
Contributor Author

@nevrome thank you so much for the quick review! Now I fixed the line endings!

@nevrome
Member

nevrome commented Aug 6, 2025

OK - this looks ready for review! But I think the paper is not out yet, right? The bibtex entry is still missing a doi. I assume we should wait with merging the package until everything is properly released.

Could you review, @martynamolak? You have offered to do so in the past, so that's why I'm asking you directly. Feel free to decline the request if this comes at a busy time 👍! The review guide is available here.

@martynamolak
Contributor

Hey, if no one else has signed up for that yet, I can do it, but probably only early next week. If that is fine, then count me in.

@nevrome
Member

nevrome commented Aug 8, 2025

That is excellent! Thank you very much, Martina!

@xiaowenjia
Contributor Author

@nevrome thank you for arranging the review and @martynamolak thank you for reviewing it! I have updated the doi information so there should be no problem with merging the package when the review is done:)

added month to the references
@martynamolak
Contributor

martynamolak commented Aug 11, 2025

yml file:

  1. Shouldn't @xiaowenjia be rather listed as contributor? But up to you of course

bib file:

  1. There is supposed to be a month for the publications listed according to the reviewer's documentation. Could you provide these? (I added them in the file myself now)

janno file:

  1. in Alternative_IDs you use "/" which I'm afraid might be problematic for some processing pipelines. I would suggest changing to "-"; especially since there are already some indivs (eg. "KMR001-NOK002_unresolved_origin") that use "-" rather than "/"; @nevrome or are "/" in Alternative_IDs fine by Poseidon? (Poseidon_ID and Group_Name are free of these)
  2. Relation_To: when you list more than one indiv here, you should probably separate them using ";" (now you have "," in some cases and ";" in some others); also not sure how that works when you have individuals related to individuals that are not included in this package (eg. "geo005"). Poseidon_IDs are probably not screened for uniqueness across packages so it might be tricky to identify which individual (from which packages) you are referring to; perhaps you could refer to the source package for the individual in the Relation_Note field.
  3. Relation_Degree: my feeling is that when you list a number of individuals in "Relation_To", each one should have a corresponding "Relation_Degree" value (you have that in most cases, but the ATK individuals only have one Relation_Degree despite having many Relation_To entries listed); there are "thirdTofifth" Relation_Degree values with a note "BREAD", which is probably a mistake as BREADR does not infer such relations
  4. Relation_Note: change "BREAD" to "BREADR"; why do some individuals have "n/a" in this field (it's probably fine, but is a bit weird)?
  5. Location: fix the "Samtskhe–Javakheti" non-UTF-8 character
  6. Date_C14_Labnr: are the lab numbers really digit-only? Usually there is some lab identifier (eg. "MAMS-45448" or "Poz-84950")
  7. Date_BC_AD_Median: for DZN009_ss this median value falls outside the Start-Stop range
  8. Contamination_Note: I suggest removing the "Estimate and error are weighted means of values per library. Libraries with fewer than 100 were excluded." part as it is not specific to the given individual
  9. gur017_ES25 and geo015_ES25 individuals are missing all the lab processing information; I realize they come from a different paper, but might be helpful to add if possible; are these indivs a part of another package as well?
  10. KHT005 (excluded) individual - it would probably be good to change his Group_Name to "GEO_EarlyMiddleAges_excluded" or something (like the "_rel" individuals do); otherwise it might not be picked up as the repeated individual by pipelines (I suppose since it was excluded from the analyses in the original paper, it did not use the provided group name anyway so congruence with the paper should not be an issue here)
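
Two of the janno points above (the median falling outside the Start-Stop range, and Relation_To entries without a matching Relation_Degree) lend themselves to an automated check. A minimal sketch follows, assuming each row is a dict as produced by `csv.DictReader` with a TAB delimiter. `row_issues` and `split_list` are hypothetical names, and the one-degree-per-related-individual rule reflects the suggestion in this review rather than a confirmed schema requirement.

```python
def split_list(value: str) -> list[str]:
    """Split a ';'-separated janno list cell; empty and n/a cells give no entries."""
    if value in ("", "n/a"):
        return []
    return value.split(";")


def row_issues(row: dict[str, str]) -> list[str]:
    """Return human-readable issues found in one .janno row."""
    issues = []
    try:
        start = int(row["Date_BC_AD_Start"])
        median = int(row["Date_BC_AD_Median"])
        stop = int(row["Date_BC_AD_Stop"])
        if not start <= median <= stop:
            issues.append("Date_BC_AD_Median outside Start-Stop range")
    except (KeyError, ValueError):
        pass  # missing or n/a dates are skipped, not reported
    related = split_list(row.get("Relation_To", "n/a"))
    degrees = split_list(row.get("Relation_Degree", "n/a"))
    if len(related) != len(degrees):
        issues.append("Relation_To and Relation_Degree differ in length")
    return issues
```

BC dates are negative in these columns, so the plain integer comparison also covers ranges that lie entirely before year 0.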

@nevrome
Member

nevrome commented Aug 19, 2025

Thanks for this thorough review, @martynamolak. As always excellent! And thanks for even editing the .bib file directly.
Some comments:

  1. There are no special requirements for Alternative_IDs so far. Internal consistency would be good, though.
  2. Poseidon_IDs are screened for uniqueness on an archive level. So it's fine to reference individuals in other packages of the community-archive. An additional entry in the Relation_Note column is not necessary.

Beyond that:

  1. List columns should not include spaces between entries, so BREAD;BREAD instead of BREAD; BREAD.

Please address all 13 points in your revision, @xiaowenjia. I'm happy to help with any additional open questions.
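
The list-column formatting points (";" as the only separator, no spaces between entries) could be applied mechanically. A minimal sketch, where `normalize_list_cell` is a hypothetical helper and the assumption is that a "," inside these particular cells is always meant as a separator, which would not hold for free-text columns:

```python
import re


def normalize_list_cell(value: str) -> str:
    """Rewrite a list cell so entries are separated by ';' without spaces."""
    parts = [part.strip() for part in re.split(r"[;,]", value)]
    return ";".join(part for part in parts if part)
```

With this, `normalize_list_cell("BREAD; BREAD")` gives `"BREAD;BREAD"`. It should only be run on list columns such as Relation_To, Relation_Degree or Relation_Note, not on the whole file.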

@xiaowenjia
Contributor Author

Thank you both very much! I will look into them!

@xiaowenjia
Contributor Author

xiaowenjia commented Aug 31, 2025

Hi @nevrome and @martynamolak,
Regarding the points:
1 added myself, thx!
2 Thank you, @martynamolak!
3-6 fixed
7 It seems like "Samtskhe–Javakheti" to me, is it now all good or should I remove the "-"?
8-10 fixed
11 We filtered for transversion-only sites directly on the EIGENSTRAT file; the original lab processing information can be found in the package "2023_Koptekin_SouthwestAsia". It might be confusing if we copy it here. I added the filtered SNP coverage for these two samples.
12 KHT005 is not included in this package. I removed the relevant info as well
13 fixed

@nevrome
Member

nevrome commented Sep 8, 2025

OK - I applied some more minor tweaks to the .janno file. Most notably, the Source column should be called Publication. I'll merge now.

@nevrome nevrome merged commit 0a50618 into poseidon-framework:master Sep 8, 2025
1 check passed

3 participants