ucc23/updt_UCC

UCC management

The UCC lives in several repositories within the UCC23 organization and in its associated Zenodo repository.

  • The updt_ucc repository, this one, contains the scripts and data files required to update the UCC (mostly) automatically. It also contains the main files that make up the UCC: UCC_cat_B.csv, UCC_cat_C.csv, and the UCC_members.parquet file with all the identified members (stored in the zenodo/ folder).

  • The ucc repository contains, for each entry in the database, a corresponding .md file in the _clusters/ folder. This repository also contains all the files required to build the public site.

  • The plots_X repositories contain several plots per OC in various folders: one with the four diagrams, one for Aladin, and plots for the HUNT23 and/or CANTAT20 members when available. The plots are loaded by the public site from these repositories.

The UCC update process is managed through five main scripts that perform the required tasks in sequence, as described in the following sections.

1. Adding a new DB

Given a new database (DB) to be added to the UCC, the A_get_new_DB.py script manages the updating of the JSON file that contains the information for each database in the UCC, as well as downloading the Vizier database if available and/or requested:

  1. Load the current JSON database file.
  2. Check if the URL is already listed in the current database.
  3. Fetch publication authors and year from NASA/ADS.
  4. Generate a new database name based on extracted metadata.
  5. Handle temporary database files and check for existing data.
  6. Fetch Vizier data or allow manual input for Vizier IDs.
  7. Match new database columns with current JSON structure.
  8. Update the JSON file and save the database as CSV.
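
Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, not the code in A_get_new_DB.py: it assumes the JSON file maps database names to dictionaries, and the key name `ADS_url`, the function `find_listed_db`, the database names, and the URLs are all hypothetical.

```python
import json


def find_listed_db(databases, ads_url, url_key="ADS_url"):
    """Return the name of the DB that already lists `ads_url`, else None.

    `url_key` is a hypothetical key name; check the real JSON file for
    the key that actually stores the publication URL.
    """
    for db_name, info in databases.items():
        if info.get(url_key) == ads_url:
            return db_name
    return None


# Hypothetical content standing in for data/databases_info.json
current = json.loads("""
{
  "SMITH2020": {"ADS_url": "https://example.org/abs/2020A"},
  "DOE2023": {"ADS_url": "https://example.org/abs/2023B"}
}
""")

print(find_listed_db(current, "https://example.org/abs/2023B"))  # DOE2023
print(find_listed_db(current, "https://example.org/abs/2024C"))  # None
```

Returning the matching name instead of a boolean makes it easy to report which existing entry clashes with the new one.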

Once the JSON file with the entry for the new DB and the generated CSV file with the DB data are stored in the temporary folder, carefully check both files before moving on, as both might need manual intervention.

Input

  • ADS_bibcode: NASA/ADS bibcode for the new DB
  • data/databases_info.json: Current JSON file with info about the databases included in the UCC

Output

  • temp_updt/data/databases_info.json: Updated JSON file with new DB entry
  • temp_updt/data/NEW_DB.csv: New database in CSV format

Important: The column with the names of the new entries can contain several names; these must be separated using a ','.
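
As an illustration of the separator rule, a names field with several names could be split as shown below. The helper `split_names` is hypothetical and only sketches the idea; the scripts may handle names differently.

```python
def split_names(names_field):
    """Split a comma-separated names field into a list of clean names."""
    # Strip surrounding whitespace and drop empty items (e.g. a trailing ',')
    return [name.strip() for name in names_field.split(",") if name.strip()]


print(split_names("NGC 2516, Melotte 82 ,Collinder 172"))
# ['NGC 2516', 'Melotte 82', 'Collinder 172']
```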

Remember to carefully check the keys for the new DB in the JSON file. They are auto detected by the script and they might need to be corrected and/or filled.

The output files will be loaded and updated by the B script; they must not be moved manually.

JSON file format

The format of the databases_info.json file is mostly self-explanatory with the exception of the parameters section. This can have different formats depending on the database structure. There are three possible formats:

  1. Default

The default formatting corresponds to a single column with a given formatting/units for each parameter:

"pars": {
  "par_general": {
    "par_format": "par_db_col"
  },
  ...
}

For some databases the "par_db_col" can contain multiple values separated by a ',' or ';'. Eg: KRONBERGER2006, ROSER2016, NIZOVKINA2025

  2. Duplicate values, different units

The "par_general" parameter is present in more than one column with different units (eg, obtained by different methods). Eg: JAEHNIG2021, CARRASCO2025

"pars": {
  "par_general": {
    "par_format_0": "par_db_col_0",
    "par_format_1": "par_db_col_1"
  },
  ...
}

  3. Duplicate units

There is more than one column with the same "par_format" units for the same parameter. Eg: CHEN2003, PISKUNOV2008, SANTOS2021, HUNT2024, ZHANG2024, HU2025

"pars": {
  "par_general": {
    "par_format": ["par_db_col_0", "par_db_col_1"]
  },
  ...
}
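
The three formats above can be read with a single loop, since they differ only in whether the innermost value is a string or a list. The following is a rough sketch under that assumption; the function `flatten_pars` and the parameter, format, and column names in the example are hypothetical and are not taken from the UCC scripts.

```python
def flatten_pars(pars):
    """Return a list of (par_general, par_format, db_column) tuples."""
    rows = []
    for par_general, formats in pars.items():
        for par_format, db_cols in formats.items():
            # Format 3 stores a list of columns; formats 1 and 2 a single string
            if isinstance(db_cols, str):
                db_cols = [db_cols]
            for db_col in db_cols:
                rows.append((par_general, par_format, db_col))
    return rows


# Hypothetical "pars" section mixing the three formats
example = {
    "dist": {"d_pc": "Dist"},                  # 1. default
    "age": {"logt": "logAge", "Myr": "Age"},   # 2. same parameter, different units
    "ext": {"Av": ["Av_1", "Av_2"]},           # 3. same units, several columns
}

for row in flatten_pars(example):
    print(row)
```

Flattening to tuples gives one record per database column, which makes it simple to check that every listed column actually exists in the new CSV file.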

2. Updating the UCC: first step

The B_update_UCC script handles two possible cases:

  1. Adding a new database to the UCC
  2. Rebuilding the entire UCC

The case is decided by the script according to whether it finds a temporary databases_info.json file with new DBs to add. The final UCC_cat_B.csv file contains the following main columns:

"fnames","DB","DB_i","Names"

which are the main identifiers for each entry in the UCC. Along with these columns, the file also contains the following columns which are used only as centers to estimate membership by the next script:

"RA_ICRS","DE_ICRS","GLON","GLAT","Plx","pmRA","pmDE"

and finally the fund_pars column which contains the fundamental parameters for each entry extracted from each DB where it is present.
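
The choice between the two cases described at the start of this section could look roughly like the sketch below. It assumes that the mere presence of the temporary JSON file signals new DBs to add; the function `select_mode` and the mode labels are hypothetical.

```python
from pathlib import Path


def select_mode(temp_json="temp_updt/data/databases_info.json"):
    """Return 'add_new_db' if the temporary JSON file exists, else 'rebuild'."""
    # Assumption: script A leaves this file behind only when a new DB is pending
    return "add_new_db" if Path(temp_json).is_file() else "rebuild"


print(select_mode())
```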

Input

  • temp_updt/data/databases_info.json: New JSON file (generated by script A)
  • temp_updt/data/NEW_DB.csv: New database in CSV format (generated by script A)
  • data/UCC_cat_B.csv: Current version of this file
  • data/databases_info.json: Current JSON database file
  • data/globulars.csv: Globular clusters data file

Output

  • UCC_cat_B.csv: Updated version of this file (old one is archived)

If new DBs were added then these files are moved from temporary folders to their final destination:

  • data/databases_info.json: Updated UCC database JSON file
  • data/NEW_DB.csv: New database in CSV format

3. Updating the UCC: second step

The C_process_member_files.py script generates the UCC_cat_C.csv file and updates the parquet file with all the identified members. It also generates the required files for uploading to the Zenodo repository (these files need to be manually uploaded to a new Zenodo release).

The UCC_cat_C.csv file contains columns that represent the following information:

- fnames: names of each entry (must match those in the B file)
- plots_used: (y/n) tells the E script if an entry plot needs to be generated/updated
- process: (y/n) manual flag that indicates whether to (re)process this entry
- N_clust: Fixed number of cluster members (has precedence over N_clust_max)
- N_clust_max: Maximum number of cluster members
- N_box: Multiplier value for the box size used to search for members
- frame_limit: A string in the format "x_111.1" where "x" is one of the characters:

  - 'b' (bottom), 't' (top), 'l' (left), 'r' (right) for (GLON, GLAT) coordinates
  - 'pmb' (bottom), 'pmt' (top), 'pml' (left), 'pmr' (right) for (pmra, pmde)
  - 'plxl' (left), 'plxr' (right) for parallax

and the numbers are the limiting values for each. A single limit can be provided (eg, "x_111.1") or several separated by a ',' (eg, "x_111.1,y_222.2,...").

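The frame_limit format described above can be parsed with a few lines of Python. This is only a sketch of the format, not the parser used by the C script; the function `parse_frame_limit` is hypothetical, and empty or missing values are not handled.

```python
VALID_KEYS = {"b", "t", "l", "r", "pmb", "pmt", "pml", "pmr", "plxl", "plxr"}


def parse_frame_limit(frame_limit):
    """Parse a string like 'b_-3.5,pmr_1.2' into a list of (key, value) pairs."""
    limits = []
    for item in frame_limit.split(","):
        # Split on the first '_' only, so negative values are kept intact
        key, sep, value = item.strip().partition("_")
        if not sep or key not in VALID_KEYS:
            raise ValueError(f"Invalid frame limit: {item!r}")
        limits.append((key, float(value)))
    return limits


print(parse_frame_limit("b_-3.5,pmr_1.2,plxl_0.45"))
# [('b', -3.5), ('pmr', 1.2), ('plxl', 0.45)]
```

Rejecting unknown keys early surfaces typos in this manually edited column before any member search is run.
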
The rest of the columns are generated using information taken from the estimated members for each entry. This script performs the following main operations:

3.1 Change detection

The script compares two versions of the UCC catalog: a source catalog (UCC_cat_B.csv) and the current catalog (UCC_cat_C.csv). It identifies three sets of open clusters:

  • New clusters to be added (B entry not in C --> Add to C)
  • Old clusters to be removed (C entry not in B --> Remove from C)
  • Existing clusters that have been manually marked for reprocessing (C entry with 'process=y' --> Reprocess in C)
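
The three sets above map naturally onto set operations over the fnames column. The sketch below uses plain Python containers instead of the CSV files; the function `detect_changes` is hypothetical, and restricting reprocessing to entries present in both catalogs is an assumption.

```python
def detect_changes(b_fnames, c_process):
    """Classify entries given the B fnames and a C mapping fname -> 'y'/'n'."""
    b_set, c_set = set(b_fnames), set(c_process)
    to_add = sorted(b_set - c_set)       # B entry not in C
    to_remove = sorted(c_set - b_set)    # C entry not in B
    # Assumption: only entries still present in B are reprocessed
    to_reprocess = sorted(f for f in b_set & c_set if c_process[f] == "y")
    return to_add, to_remove, to_reprocess


b = ["ngc2516", "melotte22", "blanco1"]
c = {"melotte22": "n", "blanco1": "y", "oldcluster": "n"}
print(detect_changes(b, c))
# (['ngc2516'], ['oldcluster'], ['blanco1'])
```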

3.2 Cluster processing

For each new or re-processed cluster, the script performs a detailed analysis:

  • Uses the fastMP method to identify likely member stars
  • Saves the list of member stars for each cluster into an individual .parquet file
  • Calculates various parameters for the cluster based on its members

3.3 Data file updating

After processing all the necessary clusters, the script updates the main UCC data:

  • Combines all the individual member .parquet files into a single, updated master members file
  • Updates the UCC_cat_C.csv file with the newly derived data, adding new clusters and removing old ones

3.4 Post-processing

  • Finds Shared Members: analyzes the entire member list to find stars that belong to more than one cluster and records this information
  • Calculates the UTI
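
The shared-members search amounts to grouping clusters by star identifier and keeping the stars that appear in more than one group. A minimal sketch, assuming each member row reduces to a (cluster, star identifier) pair; the function `find_shared_members` and the identifiers are hypothetical.

```python
from collections import defaultdict


def find_shared_members(members):
    """Map each star found in more than one cluster to its sorted cluster list."""
    clusters_per_star = defaultdict(set)
    for cluster, star_id in members:
        clusters_per_star[star_id].add(cluster)
    return {
        star_id: sorted(clusters)
        for star_id, clusters in clusters_per_star.items()
        if len(clusters) > 1
    }


rows = [("ngc2516", 101), ("ngc2516", 102), ("blanco1", 102), ("blanco1", 103)]
print(find_shared_members(rows))
# {102: ['blanco1', 'ngc2516']}
```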

Input

  • data/df_UCC_B_updt.csv: Current UCC database (produced by the B script)
  • data/df_UCC_C_updt.csv: Current version of this file
  • data/manual_centers.csv: Manual centers for selected OCs
  • databases/globulars.csv: Globular clusters data file
  • data/databases_info.json: Current UCC database JSON file
  • Gaia data files: Gaia data files for a given release
  • zenodo/README.txt: Zenodo README file
  • zenodo/UCC_members.parquet: File with estimated members for all the clusters (FILE NOT TRACKED BECAUSE IT IS TOO LARGE)

Output

  • data/df_UCC_C_updt.csv: Updated version (previous version is archived)
  • zenodo/README.txt: Updated version
  • zenodo/UCC_cat.csv: Updated (and simplified) version of the UCC catalog
  • zenodo/UCC_members.parquet: Updated version

The last three files are uploaded to Zenodo to generate a new release.

4. Updating the site

Updating the site requires running the D_update_UCC_site.py script.

This script applies the required changes to update the ucc.ar site. It processes the UCC catalogue and searches for modifications that need to be applied to update the site.

  • Generate/update per cluster .webp files (stored in the plots/ folders)
  • If plots were generated/updated, update the plot_used column in the data/df_UCC_C.csv file
  • Generate/update per cluster .md (stored in ucc/_clusters/) files
  • Update the split members files (ucc/assets/members/*.csv.gz)
  • Update the CSV clusters file ucc/assets/clusters.csv.gz and its associated JSON file clusters-manifest.json
  • Update the main UCC site files including tables and images
  • Move all files to their final destination
  • Check that the number of files is correct
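
The final check in the list above can be approximated by counting the files in a folder and comparing the result with the expected number of entries, for example one .md file per cluster. This is a sketch only; the function `count_mismatch` is hypothetical and the demo runs on a throwaway folder.

```python
import tempfile
from pathlib import Path


def count_mismatch(folder, pattern, expected):
    """Return (files found) - (files expected); 0 means the counts agree."""
    found = sum(1 for _ in Path(folder).glob(pattern))
    return found - expected


# Demo on a temporary folder standing in for ucc/_clusters/
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("ngc2516", "blanco1", "melotte22"):
        (Path(tmp) / f"{fname}.md").write_text("")
    diff_ok = count_mismatch(tmp, "*.md", expected=3)
    diff_missing = count_mismatch(tmp, "*.md", expected=4)

print(diff_ok, diff_missing)  # 0 -1
```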

Input

  • data/df_UCC_B.csv: Used to access certain columns like Names and DB

  • data/df_UCC_C.csv: Primary source of data for most of the website content

  • data/zenodo/UCC_members.parquet: Used for generating cluster plots

  • data/databases_info.json: Latest UCC database JSON file

  • data/databases/cmmts/*.csv: CSV files with comments

  • ucc/assets/clusters.csv.gz: Compressed CSV file with the latest UCC data

  • ucc/_pages/XXXXX.md (DATABASES, ARTICLES, TABLES)

  • ucc/_tables/XXXXX_table.md (individual table pages)

  • ucc/_tables/dbs/{db_name}.csv (one file for each original database in the UCC)

  • ucc/_clusters/{cluster_name}.md (individual cluster pages)

Output

  • UCC/plots/plots_X/*/*.webp: Updated or generated (CMD and Aladin plots)

  • data/df_UCC_C.csv: Updated plot_used column if modified

  • ucc/_clusters/*.md: Updated or generated

  • ucc/assets/members/*.csv.gz: Updated files

  • ucc/assets/clusters.csv.gz: Updated

  • ucc/assets/clusters-manifest.json: Updated

  • ucc/images/*.webp: Updated

  • ucc/_pages/*.md: Updated

  • ucc/_tables/*_table.md: Updated

5. Building the site

The Jekyll theme used by the site is a modified Reverie theme.

5.1 Local build

Before updating the live site, generate a local site build and check the results carefully. The local copy of the site is built with Jekyll (see the Jekyll docs).

If this is a new installation, update the gems with:

$ bundle update --all

To build a local version of the site, position a terminal in the /ucc folder (not the /updt_ucc folder) and run:

$ bundle exec jekyll serve --incremental

This will generate a full version of the site locally, which can take a while. For a faster build, avoid processing the files in the _clusters and _tables folders (for example, using a different include with fewer/different selected folders).

The script test_build.sh can also be used to check the local build. By default it selects 10 random clusters from the _clusters/ folder and generates the site:

$ ./test_build.sh

To select a specific cluster (eg, melotte55) instead of random clusters, run:

$ ./test_build.sh 0 melotte55

You can also select the number of random clusters to be generated and/or exclude some clusters by their name. To generate N cluster pages while excluding clusters with names starting with 'cwnu', 'cwwdl', 'ckcwdm', 'hsc' or 'theia' you can run:

$ ./test_build.sh N -cwnu -cwwdl -ckcwdm -hsc -theia

Check the local version in both Chrome and Firefox.

5.2 Live build

  1. Create a 'New version' in the Zenodo repository

     a. Make sure that the version number in the zenodo/README.txt file matches that in the _pages/CHANGELOG.md file
     b. Upload the three files stored in the zenodo/ folder: README.txt, UCC_cat.csv, UCC_members.parquet
     c. Get a new DOI from Zenodo
     d. Add a Publication date with the format: YYYY-MM-DD
     e. Use the same version number from the README (format: YYMMDD) in the release

Publish this new release and copy its own URL (not the general repository URL).

  2. Update the _pages/CHANGELOG.md file using the Zenodo URL for this release

  3. Push changes (if any) to each of the plots/plots_* repositories. To do this, run:

$ for dir in ../plots/plots_*/; do (cd "$dir" && [ -d .git ] && git acp "updt plots"); done

  4. IMPORTANT: Make sure that the _config.yml file includes all the folders

  5. Push any remaining changes to the ucc repository

  6. Deploy the site using the GitHub workflow Deploy Jekyll site to Pages

  7. Test the live site in both Chrome and Firefox
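
The two date formats required for the Zenodo release (YYMMDD for the version number and YYYY-MM-DD for the publication date) can be produced from a single date object, which avoids typing them separately. A small sketch; the function `release_strings` and the example date are hypothetical.

```python
from datetime import date


def release_strings(release_date):
    """Return (version, publication_date) as ('YYMMDD', 'YYYY-MM-DD')."""
    return release_date.strftime("%y%m%d"), release_date.isoformat()


print(release_strings(date(2025, 3, 7)))
# ('250307', '2025-03-07')
```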
