Hi, when I started parsing the data, the following issues caught my attention:
- Location information such as "migrants_collected_in_..." or "... sampled from Great Britain" does not appear to match the other types of locations in the data sets. Also, some locations use '_' instead of ' ', which should probably be unified.
- Some sites have the format "SiteName_...BCE". As someone outside the field, I think the part after the site name does not belong in the site field.
- Some Genetic_Source_Accession_ID list entries appear to be split by "/" or " / " instead of ";"
- The format in Primary_Contact is inconsistent, e.g., "name surname" vs. "surname, name". Some entries even use a comma to separate list entries. Also, entries like "Allentoft, Morten merged with Haak, Wolfgang" should use ";" to separate the different persons.
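
For illustration, here is a minimal Python sketch of how a parser could work around the separator issues above until the data is cleaned up. The function names are hypothetical, the sample IDs are placeholders, and it assumes that a valid accession ID never contains "/" itself and that "merged with" only ever appears as a separator between persons.

```python
import re


def split_accession_ids(raw: str) -> list[str]:
    """Split a Genetic_Source_Accession_ID value on ';', '/' or ' / '.

    Assumption: no valid accession ID contains '/' or ';'.
    """
    parts = re.split(r"\s*[;/]\s*", raw.strip())
    return [p for p in parts if p]


def normalize_location(raw: str) -> str:
    """Replace '_' with ' ' and collapse repeated whitespace."""
    return " ".join(raw.replace("_", " ").split())


def split_contacts(raw: str) -> list[str]:
    """Split a Primary_Contact value on ';' or the phrase 'merged with'.

    Commas are left alone because they are ambiguous: they may separate
    'surname, name' or two different persons.
    """
    parts = re.split(r"\s*;\s*|\s+merged with\s+", raw.strip())
    return [p for p in parts if p]


# Placeholder values, not taken from the real data set.
print(split_accession_ids("A1/A2 / A3;A4"))
print(normalize_location("Great_Britain"))
print(split_contacts("Allentoft, Morten merged with Haak, Wolfgang"))
```

This does not resolve the comma ambiguity in Primary_Contact or the "SiteName_...BCE" entries; those probably need a fix in the data itself.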