Add 2025_SkourtaniotiJia_SCaucasus by xiaowenjia · Pull Request #293 · poseidon-framework/community-archive

xiaowenjia · 2025-08-05T15:32:22Z

PR Checklist for a new package submission

The package does not exist already in the community archive, also not with a different name.
The package title in the POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
The package is stored in a directory that is named like the package title.

Samples that already have been published previously, and got re-analysed (e.g. re-sequenced) for the now packaged publication, have a modified Poseidon_ID of the form <Original Poseidon_ID>_<Initials of the main author>_<Year>. Re-analysed versions of I1685 (Lazaridis et al. 2016) should, for example, be assigned the IDs I1685_IL22 (Lazaridis et al. 2022) and I1685_IL25 (Lazaridis et al. 2025).

The Publication column in the .janno file is filled and the respective .bib file has complete entries for the listed mentioned keys.
The .janno file does not include any empty columns or columns only filled with n/a.
The order of columns in the .janno file adheres to the standard order as defined in the Poseidon schema here.
The .janno and the .ssf files are not fully quoted, so they only use single- or double quotes ("...", '...') to enclose text fields where it is strictly necessary (i.e. their entry includes a TAB).

The package passes a validation with trident validate --fullGeno.

Large genotype data files are properly tracked with Git LFS and not directly pushed to the repository. For an instruction on how to set up Git LFS please look here. If you accidentally pushed the files the wrong way you can fix it with git lfs migrate import --no-rewrite path/to/file.bed (see here).
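
One of the checklist items above (no empty or n/a-only columns in the .janno file) can be pre-checked locally before opening a PR. The following is a minimal sketch, not part of trident: the function name `uninformative_columns` and the sample rows are hypothetical, and it assumes the .janno file is a plain tab-separated table with a header row. `trident validate` remains the authoritative check.

```python
import csv
import io


def uninformative_columns(janno_text: str) -> list[str]:
    """Return the names of columns whose values are all empty or 'n/a'."""
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    rows = list(reader)
    return [
        name
        for name in (reader.fieldnames or [])
        if all((row.get(name) or "").strip() in ("", "n/a") for row in rows)
    ]


# Hypothetical two-row table; only the Country column carries no information.
sample = (
    "Poseidon_ID\tGenetic_Sex\tGroup_Name\tCountry\n"
    "ind1\tF\tgroupA\tn/a\n"
    "ind2\tM\tgroupB\tn/a\n"
)
print(uninformative_columns(sample))
```

Columns reported by such a check would be candidates for removal from the file.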

nevrome · 2025-08-05T16:18:11Z

Thank you for this submission!

The validation currently fails with an encoding-related issue in https://github.com/poseidon-framework/community-archive/blob/master/checkFileEncoding.sh. Either the .janno file is not UTF-8-encoded, or it has Windows line endings.
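
The two failure modes named here can be tested locally on the raw bytes of a file. This is a hedged sketch only: it does not reproduce checkFileEncoding.sh, whose contents are not shown in this thread, and the function name `encoding_problems` is made up for illustration.

```python
def encoding_problems(raw: bytes) -> list[str]:
    """List encoding issues found in the raw bytes of a text file."""
    problems = []
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8 at byte {err.start}")
    if b"\r\n" in raw:
        problems.append("contains Windows (CRLF) line endings")
    return problems


# Clean input, CRLF input, and a Latin-1 byte that is invalid in UTF-8.
print(encoding_problems(b"Poseidon_ID\tGroup_Name\n"))
print(encoding_problems(b"Poseidon_ID\tGroup_Name\r\n"))
print(encoding_problems(b"caf\xe9\n"))
```

To apply it to a real file, read the file in binary mode (`open(path, "rb").read()`) so that Python does not translate line endings or decode the bytes first.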

xiaowenjia · 2025-08-05T18:53:05Z

@nevrome thank you so much for the quick review! Now I fixed the line endings!

nevrome · 2025-08-06T13:05:23Z

OK - this looks ready for review! But I think the paper is not out yet, right? The bibtex entry is still missing a doi. I assume we should wait with merging the package until everything is properly released.

Could you review, @martynamolak? You have offered to do so in the past, so that's why I'm asking you directly. Feel free to decline the request if this comes at a busy time 👍! The review guide is available here.

martynamolak · 2025-08-07T14:38:27Z

Hey, if no one else has signed up for that yet, I can do it, but probably only early next week. If that is fine, then count me in.

nevrome · 2025-08-08T13:50:26Z

That is excellent! Thank you very much, Martina!

xiaowenjia · 2025-08-08T13:54:55Z

@nevrome thank you for arranging the review and @martynamolak thank you for reviewing it! I have updated the doi information so there should be no problem with merging the package when the review is done:)

added month to the references

martynamolak · 2025-08-11T09:43:01Z

yml file:

Shouldn't @xiaowenjia rather be listed as a contributor? But that is up to you, of course.

bib file:

There is supposed to be a month for the publications listed according to the reviewer's documentation. Could you provide these? (I added them in the file myself now)

janno file:

In Alternative_IDs you use "/", which I'm afraid might be problematic for some processing pipelines. I would suggest changing it to "-", especially since some individuals (e.g. "KMR001-NOK002_unresolved_origin") already use "-" rather than "/". @nevrome, or are "/" in Alternative_IDs fine by Poseidon? (Poseidon_ID and Group_Name are free of these.)
Relation_To: when you list more than one individual here, you should probably separate them using ";" (currently you have "," in some cases and ";" in others). I'm also not sure how that works when individuals are related to individuals that are not included in this package (e.g. "geo005"). Poseidon_IDs are probably not screened for uniqueness across packages, so it might be tricky to identify which individual (from which package) you are referring to; perhaps you could refer to the source package for the individual in the Relation_Note field.
Relation_Degree: my feeling is that when you list a number of individuals in Relation_To, each one should have a corresponding Relation_Degree value (you have that in most cases, but the ATK individuals only have one Relation_Degree despite having many Relation_To entries listed). There are also "thirdTofifth" Relation_Degree values with the note "BREAD", which is probably a mistake, as BREADR does not infer such relations.
Relation_Note: change "BREAD" to "BREADR". Why do some individuals have "n/a" in this field (it's probably fine, but a bit odd)?
Location: fix the "Samtskheâ€“Javakheti" non-UTF-8 character
Date_C14_Labnr: are the lab numbers really digit-only? Usually there is some lab identifier (e.g. "MAMS-45448" or "Poz-84950").
Date_BC_AD_Median: for DZN009_ss this median value falls outside the Start-Stop range
Contamination_Note: I suggest removing the "Estimate and error are weighted means of values per library. Libraries with fewer than 100 were excluded." part as it is not specific to the given individual
gur017_ES25 and geo015_ES25 are missing all the lab processing information. I realize they come from a different paper, but it might be helpful to add it if possible. Are these individuals part of another package as well?
KHT005 (excluded) individual: it would probably be good to change his Group_Name to "GEO_EarlyMiddleAges_excluded" or something similar (like the "_rel" individuals do); otherwise it might not be picked up as the repeated individual by pipelines. (I suppose that since it was excluded from the analyses in the original paper, it did not use the provided group name anyway, so congruence with the paper should not be an issue here.)
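
Two of the review points above (matching entry counts in Relation_To and Relation_Degree, and a Date_BC_AD_Median that lies within the Start-Stop range) lend themselves to an automated check. The sketch below follows the reviewer's suggestions and is not a statement of what the Poseidon schema or trident enforce. The helper names and the example row are hypothetical, and the row is assumed to be one .janno line already parsed into a dictionary of strings.

```python
def split_list(value: str) -> list[str]:
    """Split a ';'-separated list column; empty or 'n/a' yields no entries."""
    value = value.strip()
    if value in ("", "n/a"):
        return []
    return [part.strip() for part in value.split(";")]


def check_row(row: dict[str, str]) -> list[str]:
    """Return human-readable issues for one .janno row."""
    issues = []
    rel_to = split_list(row.get("Relation_To", "n/a"))
    rel_degree = split_list(row.get("Relation_Degree", "n/a"))
    if rel_to and len(rel_to) != len(rel_degree):
        issues.append(
            f"{len(rel_to)} Relation_To entries but "
            f"{len(rel_degree)} Relation_Degree entries"
        )
    try:
        start = int(row["Date_BC_AD_Start"])
        median = int(row["Date_BC_AD_Median"])
        stop = int(row["Date_BC_AD_Stop"])
    except (KeyError, ValueError):
        return issues  # dates missing or 'n/a': nothing to compare
    if not start <= median <= stop:
        issues.append(f"median {median} outside range {start}..{stop}")
    return issues


# Hypothetical row with both problems (negative years are BC).
row = {
    "Poseidon_ID": "example1",
    "Relation_To": "example2;example3",
    "Relation_Degree": "first",
    "Date_BC_AD_Start": "-1500",
    "Date_BC_AD_Median": "-1200",
    "Date_BC_AD_Stop": "-1300",
}
for issue in check_row(row):
    print(row["Poseidon_ID"], issue)
```

A check like this only flags candidates for manual inspection; whether a single Relation_Degree value is acceptable for several related individuals is a question for the schema documentation.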

nevrome · 2025-08-19T12:45:26Z

Thanks for this thorough review, @martynamolak. As always excellent! And thanks for even editing the .bib file directly.
Some comments:

There are no special requirements for Alternative_IDs so far. Internal consistency would be good, though.
Poseidon_IDs are screened for uniqueness at the archive level. So it's fine to reference individuals in other packages of the community-archive. An additional entry in the Relation_Note column is not necessary.

Beyond that:

List columns should not include spaces between entries, so BREAD;BREAD instead of BREAD; BREAD.
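
The spacing rule for list columns can be applied mechanically. A minimal sketch, assuming ";" as the list separator as in the example above; the function name `normalize_list_entry` is hypothetical.

```python
def normalize_list_entry(value: str, sep: str = ";") -> str:
    """Remove whitespace around the entries of a separator-delimited list."""
    return sep.join(part.strip() for part in value.split(sep))


print(normalize_list_entry("BREAD; BREAD"))
```

Values without a separator pass through unchanged apart from trimmed outer whitespace.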

Please address all 13 points in your revision, @xiaowenjia. I'm happy to help with any additional open questions.

xiaowenjia · 2025-08-21T21:57:05Z

Thank you both very much! I will look into them!

xiaowenjia · 2025-08-31T23:04:27Z

Hi @nevrome and @martynamolak,
Regarding the points:
1 added myself, thx!
2 Thank you, @martynamolak!
3-6 fixed
7 It seems like "Samtskhe–Javakheti" to me, is it now all good or should I remove the "-"?
8-10 fixed
11 We filtered for transversion-only sites directly on the EIGENSTRAT file; the original lab processing information can be found in the package "2023_Koptekin_SouthwestAsia". It might be confusing if we copied it here. I added the filtered SNP coverage for these two samples.
12 KHT005 is not included in this package. I removed the relevant info as well
13 fixed
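
On point 7, one possible explanation for why the same Location value looks fine in one view and garbled in another is that UTF-8 bytes were displayed as Windows-1252 somewhere along the way. This is an assumption about the cause, not something confirmed in this thread. The sketch below shows how an en dash (U+2013) turns into a three-character sequence under that misreading, and that the step is reversible.

```python
original = "Samtskhe\u2013Javakheti"  # en dash between the two names

# Misread the UTF-8 bytes as Windows-1252 (assumed cause of the garbling).
garbled = original.encode("utf-8").decode("cp1252")

# Undo the misreading: back to the same bytes, decoded correctly this time.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled)
print(repaired == original)
```

If that is what happened, the file itself may already be valid UTF-8, and replacing the en dash with a plain ASCII hyphen would simply avoid the display problem in tools that guess the encoding wrongly.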

nevrome · 2025-09-08T09:34:08Z

OK - I applied some more minor tweaks to the .janno file. Most notably, the Source column should be called Publication. I'll merge now.

xiaowenjia added 5 commits August 1, 2025 14:16

added a first draft of 2025_SkourtaniotiJiaCell_SCaucasus

9f390a6

added a first draft of 2025_SkourtaniotiJia_SCaucasus

8e97b8d

Remove file_name.ext

22d4f9d

Remove file_name.ext

19c28e5

added a first draft of 2025_SkourtaniotiJia_SCaucasus

a25037c

Fix line endings in 2025_SkourtaniotiJia_SCaucasus.janno

5a47e85

xiaowenjia added 2 commits August 5, 2025 22:25

Fix jannoFileChkSum in POSEIDON.yml

ec7e544

Fix bibFile in POSEIDON.yml

95da59c

Add doi info to the bibFile

5fccf96

Update 2025_SkourtaniotiJia_SCaucasus.bib

6bb767f

added month to the references

xiaowenjia added 3 commits September 1, 2025 00:47

addressed the review points

c8ab751

using the original ChkSum of geno and snp file

bbbc0a4

fixed point 13

05c916d

minor changes and fixes in the .janno file

3dc5856

nevrome merged commit 0a50618 into poseidon-framework:master Sep 8, 2025
1 check passed

nevrome merged 14 commits into poseidon-framework:master from xiaowenjia:master
