Add 2022_KumarScience_Xinjiang#252

Merged
nevrome merged 12 commits into master from add_2022_KumarScience_Xinjiang
Apr 15, 2025
Conversation

@stschiff
Member

@stschiff stschiff commented Feb 13, 2025

This is a take-over of #206 by @ainashch. A first review by @nevrome was:

  • The package name does not follow our expected standard of Year_AuthorName_RelevantKeyword. I propose 2022_Kumar_Xinjiang.
  • Please remove all columns that are completely empty/filled only with n/a.
  • The Relation_To column works with the Alternative_IDs, not the Poseidon_IDs. Is there a reason why there are two sample naming schemes existing in parallel? Why did you opt for the alternative one for the Relation_To column? I think there are multiple possible solutions to this.
  • It seems you used a combination of Relation_Degree == first + Relation_Type == identical to express that two samples are from the same individual. This is not necessary. Relation_Degree can be set to identical directly.
  • There is a site called G218 - just to make sure: This is a proper site name?
  • The last sample has the Site set to Unknown. I think it would be better to put it to n/a.
  • The Date_Type should be set to contextual for contextual ages. Date_Note then does not need the redundant *Date contextual (what does the * mean?).
  • Date_BC_AD_Median can be computed as the mean of Date_BC_AD_Start and Date_BC_AD_Stop for contextual ages.
  • The Publication column is typically used for a bibtex key in a complete package. In this .janno-only submission we can leave it like it is for now.
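Two of the review points above (dropping columns that contain only n/a, and deriving Date_BC_AD_Median for contextual ages) could be sketched roughly as follows. This is only an illustration, assuming the .janno has been read into a list of dicts (for example with `csv.DictReader(f, delimiter="\t")`). The sample rows and their Poseidon_IDs are hypothetical, and rounding the mean to an integer is an assumption, not something stated in the review.

```python
NA = "n/a"


def is_empty(value):
    """True for cells that carry no information."""
    return value in (NA, "", None)


def fill_contextual_median(row):
    """Return a copy of row; for contextual dates without a median,
    set Date_BC_AD_Median to the mean of start and stop."""
    row = dict(row)
    if row.get("Date_Type") == "contextual" and is_empty(row.get("Date_BC_AD_Median")):
        start, stop = row.get("Date_BC_AD_Start"), row.get("Date_BC_AD_Stop")
        if not is_empty(start) and not is_empty(stop):
            # Rounding to a whole year is an assumption of this sketch.
            row["Date_BC_AD_Median"] = str(round((int(start) + int(stop)) / 2))
    return row


def drop_empty_columns(rows):
    """Remove every column that is empty or n/a in all rows."""
    keep = [c for c in rows[0] if any(not is_empty(r.get(c)) for r in rows)]
    return [{c: r[c] for c in keep} for r in rows]


# Hypothetical rows, not taken from the actual package.
rows = [
    {"Poseidon_ID": "S1", "Date_Type": "contextual", "Date_BC_AD_Start": "-2000",
     "Date_BC_AD_Stop": "-1700", "Date_BC_AD_Median": "n/a", "Collection_ID": "n/a"},
    {"Poseidon_ID": "S2", "Date_Type": "C14", "Date_BC_AD_Start": "n/a",
     "Date_BC_AD_Stop": "n/a", "Date_BC_AD_Median": "-1500", "Collection_ID": "n/a"},
]
cleaned = drop_empty_columns([fill_contextual_median(r) for r in rows])
```

Existing medians are left untouched here, so point estimates that were entered by hand are not overwritten.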

@stschiff
Member Author

I was able to download the genotype data for this package from a platform in China to which the author had uploaded it.

@stschiff
Member Author

Oookay, so I've gone through some of the review points. Some quick remarks on them:

  1. I've added genotype data from the authors. They came with a more fine-grained group labeling, so I added those as first group to the Janno, but kept the more coarse-grained one that @ainashch added from the Supplement.
  2. I've fixed the issue about the "identical" relationships as flagged by @nevrome.
  3. I've not fixed the issue that the Relationships are currently not given in terms of Poseidon_IDs but in terms of Alternative_IDs. I would hope that perhaps @ainashch could help with this?
  4. Yes, the site's name appears to be "G218", so I've kept that.
  5. The date information is a mess, but @ainashch has indeed dutifully filled all we have. We don't have uncalibrated dates or errors, just calibrated ones. In some cases they seem to be indirect, and in some cases published elsewhere. I've made clearer notes to indicate this. I would leave the dates as they are now, including the fact that for some dates we just have a point estimate (entered in the median column), and for others we have boundaries, but no median. I don't see a good way around that for now.
  6. I've added bibliographic information.

@stschiff
Member Author

So the one task left to do is fixing the relationships in terms of Poseidon_IDs. I think we need to do this with a short script and a lookup table to exchange Alternative_IDs for Poseidon_IDs. @ainashch do you think you could just download the Janno file from this PR, work on this, and send the fixed one back to me so I can include it?
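A rough sketch of what such a lookup script might look like, under a few assumptions: the rows are already loaded as dicts, multi-valued cells are separated by `;`, and every related sample is itself a row in the same file. The IDs in the example are hypothetical.

```python
SEP = ";"  # assumed separator for multi-valued .janno cells
NA = "n/a"


def build_lookup(rows):
    """Map each alternative ID to the Poseidon_ID of its row."""
    lookup = {}
    for row in rows:
        for alt in row.get("Alternative_IDs", NA).split(SEP):
            alt = alt.strip()
            if alt and alt != NA:
                lookup[alt] = row["Poseidon_ID"]
    return lookup


def translate_relations(rows):
    """Return copies of rows with Relation_To expressed as Poseidon_IDs."""
    lookup = build_lookup(rows)
    translated = []
    for row in rows:
        row = dict(row)
        relation = row.get("Relation_To", NA)
        if relation not in (NA, ""):
            ids = [i.strip() for i in relation.split(SEP)]
            # IDs without a match are kept as-is so they can be checked by hand.
            row["Relation_To"] = SEP.join(lookup.get(i, i) for i in ids)
        translated.append(row)
    return translated


# Hypothetical example rows.
example = [
    {"Poseidon_ID": "P1", "Alternative_IDs": "A1", "Relation_To": "A2;A3"},
    {"Poseidon_ID": "P2", "Alternative_IDs": "A2", "Relation_To": "A1"},
    {"Poseidon_ID": "P3", "Alternative_IDs": "A3;A3b", "Relation_To": "n/a"},
]
fixed = translate_relations(example)
```

Leaving unmatched IDs unchanged, rather than replacing them with n/a, is a deliberate choice in this sketch: it keeps references to samples outside the package visible.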

@stschiff stschiff marked this pull request as draft February 13, 2025 08:28
@stschiff stschiff self-assigned this Feb 13, 2025
@nevrome nevrome changed the title Add 2022 kumar science xinjiang Add 2022_KumarScience_Xinjiang Feb 13, 2025
@nevrome
Member

nevrome commented Mar 13, 2025

@ainashch Are you available to have a look at this? Otherwise maybe @Kavlahkaff could take over. Please let us know, so that we can make a decision how to proceed with this package draft.

@stschiff
Member Author

stschiff commented Apr 4, 2025

It would be great if @Kavlahkaff could take over, please. There is really only one todo left, which is to fix the Relation_To column, which currently contains the Alternative IDs, not the Poseidon_IDs. Perhaps you can write a little script that looks those up and replaces them?

@stschiff
Member Author

stschiff commented Apr 8, 2025

Some observations today:

  • What is currently in "Alternative_ID" is actually a redundant entry of Library_Name. Please double-check and remove.
  • The Excel table lists an "Individual ID", which was entered in the Janno as "Collection_ID". But it should really be the "Alternative_ID". Some of these have an asterisk, which should be removed. There may not actually be a Collection_ID at all.
  • The Supplement also explains the asterisk in a note:
    Same individual sequenced in Zhang et al. 2021
    AYTH_M22B*/C2034 | AYIM22BY
    AYTH_M22C*/C2035 | AYIM22BN
    G218_M5_2*/C3339 | G218M5-2

So please add those as identical, even if we do not have a package for Zhang et al. 2021.

  • The Relation_To column currently incorrectly lists Library Names. They should be replaced by the Poseidon_IDs.
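The asterisk handling requested above could be sketched like this. It is an illustration only: the two ID pairs are copied from the Supplement note quoted above, but reading the right-hand ID as the Zhang et al. 2021 name is an assumption, as are the `;` list separator and the hypothetical Poseidon_IDs in the example rows.

```python
SEP = ";"  # assumed separator for multi-valued .janno cells
NA = "n/a"

# Two pairs from the Supplement note; interpreting the second ID as the
# Zhang et al. 2021 sample name is an assumption of this sketch.
SAME_AS_ZHANG_2021 = {
    "AYTH_M22C": "AYIM22BN",
    "G218_M5_2": "G218M5-2",
}


def append_value(current, value):
    """Append value to a list-like cell, replacing an empty or n/a cell."""
    return value if current in (NA, "", None) else current + SEP + value


def clean_individual(row):
    """Strip the trailing asterisk and record the identical relation."""
    row = dict(row)
    alt = row["Alternative_IDs"].rstrip("*")
    row["Alternative_IDs"] = alt
    if alt in SAME_AS_ZHANG_2021:
        row["Relation_To"] = append_value(row.get("Relation_To"), SAME_AS_ZHANG_2021[alt])
        row["Relation_Degree"] = append_value(row.get("Relation_Degree"), "identical")
    return row


# Hypothetical rows; only the Alternative_IDs values come from the note.
first = clean_individual({"Poseidon_ID": "X1", "Alternative_IDs": "AYTH_M22C*",
                          "Relation_To": "n/a", "Relation_Degree": "n/a"})
second = clean_individual({"Poseidon_ID": "X2", "Alternative_IDs": "G218_M5_2*",
                           "Relation_To": "X9", "Relation_Degree": "first"})
```

Relation_To and Relation_Degree are extended in parallel so that the two columns keep the same number of entries.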

@stschiff
Member Author

stschiff commented Apr 8, 2025

And could you please add the Zhang et al. 2021 paper (see https://www.poseidon-adna.org/paper-directory/ and search for Zhang Tarim) as a Minotaur recipe, @Kavlahkaff?

@stschiff stschiff assigned Kavlahkaff and unassigned stschiff Apr 8, 2025
@Kavlahkaff Kavlahkaff marked this pull request as ready for review April 10, 2025 16:59
@stschiff
Member Author

This all seems done. Ready for review.

@nevrome nevrome merged commit 0e5c338 into master Apr 15, 2025
1 check passed
@nevrome nevrome deleted the add_2022_KumarScience_Xinjiang branch April 15, 2025 12:52
4 participants