Add 2025_Saag_NorthPontic by Tlkhi · Pull Request #299 · poseidon-framework/community-archive

Tlkhi · 2025-09-03T05:06:51Z

PR Checklist for a new package submission

The package does not already exist in the community archive, not even under a different name.
The package title in the POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
The package is stored in a directory that is named like the package title.

Samples that have already been published and were re-analysed (e.g. re-sequenced) for the now packaged publication have a modified Poseidon_ID of the form <Original Poseidon_ID>_<Initials of the main author>_<Year>. Re-analysed versions of I1685 (Lazaridis et al. 2016) should, for example, be assigned the IDs I1685_IL22 (Lazaridis et al. 2022) and I1685_IL25 (Lazaridis et al. 2025).

The Publication column in the .janno file is filled, and the respective .bib file has complete entries for all keys listed there.
The .janno file does not include any empty columns or columns only filled with n/a.
The order of columns in the .janno file adheres to the standard order as defined in the Poseidon schema here.
The .janno and .ssf files are not fully quoted: they only use double or single quotes ("...", '...') to enclose text fields where this is strictly necessary (i.e. where the entry includes a TAB).

The package passes a validation with trident validate --fullGeno.

Large genotype data files are properly tracked with Git LFS and not pushed directly to the repository. For instructions on how to set up Git LFS, please look here. If you accidentally pushed the files the wrong way, you can fix it with git lfs migrate import --no-rewrite path/to/file.bed (see here).
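The .janno requirement that no column be empty or filled only with n/a is easy to pre-check before running trident. A minimal sketch using only the Python standard library; the function name and the set of values treated as missing are assumptions, and this is not part of trident.

```python
import csv
import io

# Values treated as "no information" (assumption: empty cells and the literal n/a)
MISSING = {"", "n/a"}


def uninformative_columns(janno_text: str) -> list[str]:
    """Return the names of .janno columns that are empty or contain only n/a."""
    reader = csv.reader(io.StringIO(janno_text), delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    flagged = []
    for i, name in enumerate(header):
        # Short rows are padded with "" so a missing cell counts as empty
        values = [row[i].strip() if i < len(row) else "" for row in rows]
        if all(value in MISSING for value in values):
            flagged.append(name)
    return flagged


example = (
    "Poseidon_ID\tGenetic_Sex\tGroup_Name\tCountry\n"
    "A\tM\tPopA\tn/a\n"
    "B\tF\tPopB\t\n"
)
print(uninformative_columns(example))  # ['Country']
```

A header-only file would flag every column, which is arguably the right signal for an empty package.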

martynamolak · 2025-09-10T12:20:34Z

Thanks @Tlkhi for submitting this!
Here are my comments:

janno file:

Why did you add "_LS25" to each Poseidon_ID? Is this some sort of new convention in which there is an Individual_ID plus analysis instance info plus an enrichment-related suffix? It might be something to discuss, but I thought that at present only re-analysed samples would get such an identifier... Were these samples published before?
I see you have taken the geo locations from the paper's supplementary text, and in many places they differ from those provided in the paper's supplementary table. I am in no position to judge which coordinates are more relevant for particular samples. For the Bilsk hillfort site, for example, supplementary table 1 provides a specific place within the site for each sample, so it makes sense that they have different coordinates (they are all near each other, though). For Maslyny, however, the location from the supplementary text seems to make more sense than the one from the supplementary table. I'm not sure how to tackle this other than contacting Lehti directly about it (unless you, @Tlkhi, have already verified these).
For Petrykiv, however, there is a mistake in 2 of the 3 samples: the latitude value is used in both the Lat and Lon fields.
As much as I agree that your Group_Name labels are more informative than the original ones from the paper (e.g. "Ukraine_Maslyny_EIA_LateScythian_Nomad.SG" in your package vs. "UkrEIA_LateScythian_Cri_Nom" in the Saag25 paper), I think Poseidon actually aims (at least as far as I understand it) to match the labels used in the original publication. It is of course a good idea to try to systematize population labels across all packages somehow, but I'm not sure what exactly Poseidon's policy is here. @nevrome?!
Actually, this is a sentence from the Poseidon reviewer's guide: "Are the primary group/population names in Group_Name as in the original publication? Group_Name is a ;-separated list column, so alternative names (e.g. from the AADR) can be given as well, just not in the first position."
So it looks like Poseidon would prefer you to give the original Group_Name from the publication first and add the "upgraded" Group_Name only after a ";" as a secondary name.
With relatives detected across packages (three such cases here), it is not obvious how to meet the aim of reporting relationships symmetrically, as older packages will not show them until they are specifically updated with this information. In these cases, the Group_Name field also lacks the information on the detected relatedness, which normally helps to exclude close relatives from popgen analyses. This issue is going to grow as samples are re-analysed and sites are revisited with further analyses. Not sure how or whether we want to deal with it, @nevrome.

I don't have any comments on the other files; they all look good.
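The Petrykiv problem (a latitude copied into the longitude field) can be caught with a cheap sanity check. A minimal sketch, assuming the coordinate columns are named Latitude and Longitude and using made-up rows; identical values are only a hint of a copy-paste error, not proof.

```python
import csv
import io


def suspicious_coordinates(janno_text: str,
                           lat_col: str = "Latitude",
                           lon_col: str = "Longitude") -> list[str]:
    """Return Poseidon_IDs whose latitude and longitude values are identical."""
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    flagged = []
    for row in reader:
        lat = (row.get(lat_col) or "").strip()
        lon = (row.get(lon_col) or "").strip()
        if lat in ("", "n/a") or lon in ("", "n/a"):
            continue  # nothing to compare
        if float(lat) == float(lon):
            flagged.append(row["Poseidon_ID"])
    return flagged


# Made-up rows for illustration only
example = (
    "Poseidon_ID\tLatitude\tLongitude\n"
    "S1\t48.5\t48.5\n"
    "S2\t48.5\t34.1\n"
    "S3\tn/a\tn/a\n"
)
print(suspicious_coordinates(example))  # ['S1']
```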

Tlkhi · 2025-09-10T13:10:20Z


Thank you for your comments,

I added the _LS25 suffix because many of these IDs are the same as those in MattilaCommBio2023, and this could cause conflicts or confusion in the future.
I remember taking the locations, sites, latitudes and longitudes from the supplementary materials because they seemed more accurate than the ones in the supplementary tables.
Right, thanks. I'll fix the Petrykiv coordinates.

nevrome · 2025-09-14T19:48:49Z

Thanks for this package submission, @Tlkhi, and thanks for the prompt review, @martynamolak! To quickly address some points:

This is a new convention for the community archive, which so far we have only documented in the checklist:

Samples that have already been published and were re-analysed (e.g. re-sequenced) for the now packaged publication have a modified Poseidon_ID of the form <Original Poseidon_ID>_<Initials of the main author>_<Year>. Re-analysed versions of I1685 (Lazaridis et al. 2016) should, for example, be assigned the IDs I1685_IL22 (Lazaridis et al. 2022) and I1685_IL25 (Lazaridis et al. 2025).

Sooner or later we'll have to introduce a list of the special conventions for the community-archive 🤔
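A minimal sketch of how such IDs could be derived. It assumes the pattern shown in the checklist examples: the original ID, an underscore, the main author's initials, then the last two digits of the publication year, as in I1685_IL22. The helper name is hypothetical.

```python
def reanalysed_poseidon_id(original_id: str, initials: str, year: int) -> str:
    """Build a Poseidon_ID for a re-analysed sample (hypothetical helper).

    Follows the checklist examples: <original ID>_<initials><two-digit year>.
    """
    if not original_id or not initials:
        raise ValueError("original_id and initials must be non-empty")
    return f"{original_id}_{initials}{year % 100:02d}"


# Mirrors the checklist examples for I1685
print(reanalysed_poseidon_id("I1685", "IL", 2022))  # I1685_IL22
print(reanalysed_poseidon_id("I1685", "IL", 2025))  # I1685_IL25
```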

Hm - I see that the names you assigned are more informative, @Tlkhi. But I think Martyna is right in that we should stick to the rules and give the author-provided ones priority. Fortunately we can have multiple group names.
This is something to discuss beyond this particular package. The way relationships are encoded in a Poseidon package does not scale well. We have talked about it already a number of times, but did not arrive at a good solution yet.
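To make the symmetry problem concrete, here is a minimal sketch of a check that flags one-sided relationship entries. It assumes relationships have already been parsed into a mapping from Poseidon_ID to the set of IDs named as relatives (for example from a Relation_To column); all IDs are hypothetical. A one-sided pair is exactly what appears when a new package names a relative in an older package that has not been updated.

```python
def asymmetric_relations(relations: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (a, b) pairs where a lists b as a relative but b does not list a.

    IDs missing from the mapping (e.g. samples from a package that was not
    loaded) are treated as listing no relatives.
    """
    missing = []
    for a, relatives in relations.items():
        for b in relatives:
            if a not in relations.get(b, set()):
                missing.append((a, b))
    return sorted(missing)


# "new1" (new package) names "old1" (older package), which was never updated
relations = {"new1": {"old1", "new2"}, "new2": {"new1"}, "old1": set()}
print(asymmetric_relations(relations))  # [('new1', 'old1')]
```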

Tlkhi · 2025-09-14T21:54:04Z


Thank you for your comments,
I don't think adding incomplete labels to the group name would be useful; it wouldn't really add any value.

nevrome · 2025-09-24T14:13:15Z

Table 1 of the paper features the short versions of the group labels prominently. I think it is important that the direct link to this published analysis is maintained in the package.

But I understand if you're exhausted by this request, @Tlkhi. I can offer to do the adjustment myself in the next couple of weeks, or find somebody who's willing to do it.

nevrome · 2025-12-08T16:36:49Z

I fixed the wrong coordinates and added the analysis labels used in the publication as secondary group names. Will merge now.
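Adding a label as a secondary group name amounts to appending it to the ;-separated Group_Name list while leaving the first entry in place. A minimal sketch with a hypothetical helper; the two labels are the examples quoted in the review, used only to illustrate the string operation, not copied from the merged file.

```python
def add_group_name(group_name: str, extra: str) -> str:
    """Append `extra` as a secondary entry to a ;-separated Group_Name value.

    The first entry (the primary name) stays first; `extra` is skipped if
    it is already present.
    """
    names = [n.strip() for n in group_name.split(";") if n.strip()]
    if extra not in names:
        names.append(extra)
    return ";".join(names)


print(add_group_name("Ukraine_Maslyny_EIA_LateScythian_Nomad.SG",
                     "UkrEIA_LateScythian_Cri_Nom"))
# Ukraine_Maslyny_EIA_LateScythian_Nomad.SG;UkrEIA_LateScythian_Cri_Nom
```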

Add 2025_Saag_NorthPontic (3830816)

martynamolak self-assigned this Sep 8, 2025

nevrome mentioned this pull request on Oct 1, 2025 in Add 2024_Antonio_HighMobility #302 (Merged, 22 tasks)

nevrome added 2 commits on December 8, 2025 17:01:
fixed Petrykiv coordinates as pointed out by @martynamolak (f60068f)
added the analysis labels as used in the publication (4f573ca)

nevrome merged commit 69b0716 into poseidon-framework:master Dec 8, 2025
1 check passed


nevrome merged 3 commits into poseidon-framework:master from Tlkhi:add_2025_Saag_NorthPontic

3 participants

Conversation

Tlkhi commented Sep 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Checklist for a new package submission

Uh oh!

martynamolak commented Sep 10, 2025

Uh oh!

Tlkhi commented Sep 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nevrome commented Sep 14, 2025

Uh oh!

Tlkhi commented Sep 14, 2025

Uh oh!

nevrome commented Sep 24, 2025

Uh oh!

nevrome commented Dec 8, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Tlkhi commented Sep 3, 2025 •

edited

Loading

Tlkhi commented Sep 10, 2025 •

edited

Loading