The UCC lives in several repositories within the UCC23 organization and in its associated Zenodo repository.
- The `updt_ucc` repository (this one) contains the scripts and data files required to update the UCC (mostly) automatically. It also contains the main files that make up the UCC: `UCC_cat_B.csv`, `UCC_cat_C.csv`, and the `UCC_members.parquet` file with all the identified members (stored in the `zenodo/` folder).
- The `ucc` repository contains, for each entry in the database, a corresponding entry in the form of an `.md` file in the `_clusters/` folder. This repository also contains all the files required to build the public site.
- The `plots_X` repositories contain several plots per OC in various folders: one with the four diagrams, one for Aladin, and plots for the HUNT23 and/or CANTAT20 members when available. The plots are loaded by the public site from these repositories.
The UCC update process is managed through five main scripts that perform the required tasks in sequence, as described in the following sections.
Given a new database (DB) to be added to the UCC, the A_get_new_DB.py script
manages the updating of the JSON file that contains the information for each database
in the UCC, as well as downloading the Vizier database if available and/or requested:
- Load the current JSON database file.
- Check if the URL is already listed in the current database.
- Fetch publication authors and year from NASA/ADS.
- Generate a new database name based on extracted metadata.
- Handle temporary database files and check for existing data.
- Fetch Vizier data or allow manual input for Vizier IDs.
- Match new database columns with current JSON structure.
- Update the JSON file and save the database as CSV.
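The duplicate-URL check in the steps above can be sketched as follows. This is only an illustration: the `ADS_url` key and the sample entry are hypothetical stand-ins for the real structure of `data/databases_info.json`.

```python
import json

# Minimal stand-in for data/databases_info.json; the "ADS_url" key and
# the entry below are hypothetical, used only to illustrate the check
databases_info = json.loads("""
{
    "SOMEDB2020": {"ADS_url": "https://ui.adsabs.harvard.edu/abs/SOMEDB2020"}
}
""")

def url_already_listed(new_url, db_info):
    """Return the name of the DB already using new_url, or None."""
    for db_name, entry in db_info.items():
        if entry.get("ADS_url") == new_url:
            return db_name
    return None
```

If a match is found the script can abort early instead of adding a duplicate entry.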
Once the JSON file with the entry for the new DB and the generated CSV file with the DB data are stored in the temporary folder, check both files carefully before moving on, as both might need manual intervention.
- `ADS_bibcode`: NASA/ADS bibcode for the new DB
- `data/databases_info.json`: Current JSON file with info about the databases included in the UCC
- `temp_updt/data/databases_info.json`: Updated JSON file with the new DB entry
- `temp_updt/data/NEW_DB.csv`: New database in CSV format
Important: The column with the names of the new entries can contain several names; these must be separated using a ','.
Remember to carefully check the keys for the new DB in the JSON file. They are auto-detected by the script and might need to be corrected and/or filled in.
The output files will be loaded and updated by the B script; they must not be moved manually.
The format of the databases_info.json file is mostly self-explanatory with the
exception of the parameters section. This can have different formats depending on the
database structure. There are three possible formats:
- Default
The default formatting corresponds to a single column with a given formatting/units for each parameter:
"pars": {
    "par_general": {
        "par_format": "par_db_col"
    },
    ...
}
For some databases the "par_db_col" can contain multiple values separated by a ',' or ';'. E.g.: KRONBERGER2006, ROSER2016, NIZOVKINA2025.
- Duplicate values, different units
The "par_general" parameter is present in more than one column with different units (e.g., obtained by different methods). E.g.: JAEHNIG2021, CARRASCO2025.
"pars": {
    "par_general": {
        "par_format_0": "par_db_col_0",
        "par_format_1": "par_db_col_1"
    },
    ...
}
- Duplicate units
There is more than one column with the same "par_format" units for the same parameter. E.g.: CHEN2003, PISKUNOV2008, SANTOS2021, HUNT2024, ZHANG2024, HU2025.
"pars": {
    "par_general": {
        "par_format": ["par_db_col_0", "par_db_col_1"]
    },
    ...
}
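Downstream code therefore has to handle all three layouts. A minimal sketch of a normalizer (the function name is illustrative, not the actual implementation) that reduces each entry to `(format, columns)` pairs:

```python
# Hypothetical helper: normalize the three "pars" layouts shown above
# into a list of (par_format, [db_columns]) pairs
def normalize_par(par_entry):
    pairs = []
    for par_format, cols in par_entry.items():
        if isinstance(cols, str):
            # Default and "different units" cases: one column per format
            pairs.append((par_format, [cols]))
        else:
            # "Duplicate units" case: several columns share one format
            pairs.append((par_format, list(cols)))
    return pairs

# The three layouts from the examples above:
default = normalize_par({"par_format": "par_db_col"})
diff_units = normalize_par({"par_format_0": "col_0", "par_format_1": "col_1"})
dup_units = normalize_par({"par_format": ["col_0", "col_1"]})
```

The "different units" case simply yields one pair per format key, so a single loop covers all three layouts.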
The B_update_UCC script handles two possible cases:
- Adding a new database to the UCC
- Rebuilding the entire UCC
The case is decided by the script according to whether it finds a temporary
databases_info.json file with new DBs to add. The final UCC_cat_B.csv file
contains the following main columns:
"fnames","DB","DB_i","Names"
which are the main identifiers for each entry in the UCC. Along with these, the file also contains the following columns, which are used by the next script only as centers to estimate membership:
"RA_ICRS","DE_ICRS","GLON","GLAT","Plx","pmRA","pmDE"
and finally the `fund_pars` column, which contains the fundamental parameters for
each entry, extracted from each DB where it is present.
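As a quick sanity check, the expected column set can be verified with the standard `csv` module; the row values below are illustrative only:

```python
import csv
import io

# Identifier and center columns of UCC_cat_B.csv, as listed above
ID_COLS = ["fnames", "DB", "DB_i", "Names"]
CENTER_COLS = ["RA_ICRS", "DE_ICRS", "GLON", "GLAT", "Plx", "pmRA", "pmDE"]

# One illustrative row (values are approximate, for demonstration only);
# note the quoted Names field, since names are separated by ','
sample = io.StringIO(
    "fnames,DB,DB_i,Names,RA_ICRS,DE_ICRS,GLON,GLAT,Plx,pmRA,pmDE,fund_pars\n"
    'melotte22,HUNT2024,1,"Melotte 22,Pleiades",56.75,24.12,166.57,-23.52,7.36,20.0,-45.5,""\n'
)
rows = list(csv.DictReader(sample))
missing = [c for c in ID_COLS + CENTER_COLS if c not in rows[0]]
```

An empty `missing` list means all identifier and center columns are present.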
- `temp_updt/data/databases_info.json`: New JSON file (generated by script A)
- `temp_updt/data/NEW_DB.csv`: New database in CSV format (generated by script A)
- `data/UCC_cat_B.csv`: Current version of this file
- `data/databases_info.json`: Current JSON database file
- `data/globulars.csv`: Globular clusters data file
- `UCC_cat_B.csv`: Updated version of this file (the old one is archived)
If new DBs were added then these files are moved from temporary folders to their final destination:
- `data/databases_info.json`: Updated UCC database JSON file
- `data/NEW_DB.csv`: New database in CSV format
The C_process_member_files.py script generates the UCC_cat_C.csv file and updates
the parquet file with all the identified members. It also generates the required files
for uploading to the Zenodo repository (these files need to be manually uploaded to a
new Zenodo release).
The UCC_cat_C.csv file contains columns that represent the following information:
- fnames: names of each entry (must match those in the B file)
- plots_used: (y/n) tells the E script if an entry plot needs to be generated/updated
- process: (y/n) manual flag that indicates whether to (re)process this entry
- N_clust: Fixed number of cluster members (has precedence over N_clust_max)
- N_clust_max: Maximum number of cluster members
- N_box: Multiplier value for the box size used to search for members
- frame_limit: A string in the format "x_111.1" where "x" is one of the characters:
- 'b' (bottom), 't' (top), 'l' (left), 'r' (right) for (GLON, GLAT) coordinates
- 'pmb' (bottom), 'pmt' (top), 'pml' (left), 'pmr' (right) for (pmra, pmde)
- 'plxl' (left), 'plxr' (right) for parallax
and the numbers are the limiting values for each. A single limit can be provided (e.g., "x_111.1") or several separated by a ',' (e.g., "x_111.1,y_222.2,...").
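A `frame_limit` string of this form could be parsed as in the following sketch (the function is illustrative, not the actual implementation; the valid keys are those listed above):

```python
# Valid frame_limit keys, per the description above
VALID_KEYS = {"b", "t", "l", "r", "pmb", "pmt", "pml", "pmr", "plxl", "plxr"}

def parse_frame_limit(s):
    """Parse e.g. "b_-2.5,plxr_1.8" into {"b": -2.5, "plxr": 1.8}."""
    limits = {}
    for item in s.split(","):
        # Split at the first '_' only, so negative values survive intact
        key, _, value = item.partition("_")
        if key not in VALID_KEYS:
            raise ValueError(f"unknown frame_limit key: {key!r}")
        limits[key] = float(value)
    return limits
```

Splitting at the first underscore keeps negative limits (e.g. a GLAT bottom limit of `-2.5`) intact.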
The rest of the columns are generated using information taken from the estimated members for each entry. This script performs the following main operations:
Compares two versions of the UCC catalog: a source catalog (UCC_cat_B.csv) and the
current catalog (UCC_cat_C.csv). It identifies three sets of open clusters:
- New clusters to be added (B entry not in C --> Add to C)
- Old clusters to be removed (C entry not in B --> Remove from C)
- Existing clusters that have been manually marked for reprocessing (C entry with 'process=y' --> Reprocess in C)
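With `fnames` as the key, the three-way comparison reduces to simple set operations (the data below is illustrative):

```python
# Illustrative stand-ins for the fnames in UCC_cat_B.csv and UCC_cat_C.csv
b_fnames = {"melotte22", "ngc2516", "newcluster1"}
c_rows = {  # fnames -> manual 'process' flag in UCC_cat_C.csv
    "melotte22": "n",
    "ngc2516": "y",
    "oldcluster1": "n",
}

to_add = b_fnames - set(c_rows)      # B entry not in C -> add to C
to_remove = set(c_rows) - b_fnames   # C entry not in B -> remove from C
to_reprocess = {f for f, flag in c_rows.items()
                if flag == "y" and f in b_fnames}  # marked process=y
```

Only the `to_add` and `to_reprocess` sets go through the (expensive) membership analysis.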
For each new or re-processed cluster, the script performs a detailed analysis:
- Uses the `fastMP` method to identify likely member stars
- Saves the list of member stars for each cluster into an individual `.parquet` file
- Calculates various parameters for the cluster based on its members
After processing all the necessary clusters, the script updates the main UCC data:
- Combines all the individual member .parquet files into a single, updated master members file
- Updates the `UCC_cat_C.csv` file with the newly derived data, adding new clusters and removing old ones
- Finds Shared Members: analyzes the entire member list to find stars that belong to more than one cluster and records this information
- Calculates the UTI
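The shared-members search amounts to inverting the star-to-cluster mapping; a sketch with illustrative rows standing in for the `UCC_members.parquet` data:

```python
from collections import defaultdict

# Illustrative (star_id, cluster) rows; the real data comes from the
# combined members parquet file
members = [
    ("Gaia1", "melotte22"),
    ("Gaia2", "melotte22"),
    ("Gaia2", "ngc1647"),
    ("Gaia3", "ngc1647"),
]

# Invert the mapping: star -> set of clusters it was assigned to
star_to_clusters = defaultdict(set)
for star_id, cluster in members:
    star_to_clusters[star_id].add(cluster)

# Keep only stars assigned to more than one cluster
shared = {s: sorted(c) for s, c in star_to_clusters.items() if len(c) > 1}
```

Each entry of `shared` records one star together with every cluster that claims it.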
- `data/df_UCC_B_updt.csv`: Current UCC database (produced by the B script)
- `data/df_UCC_C_updt.csv`: Current version of this file
- `data/manual_centers.csv`: Manual centers for selected OCs
- `databases/globulars.csv`: Globular clusters data file
- `data/databases_info.json`: Current UCC database JSON file
- Gaia data files for a given release
- `zenodo/README.txt`: Zenodo README file
- `zenodo/UCC_members.parquet`: File with the estimated members for all the clusters (not tracked because it is too large)
- `data/df_UCC_C_updt.csv`: Updated version (previous version is archived)
- `zenodo/README.txt`: Updated version
- `zenodo/UCC_cat.csv`: Updated (and simplified) version of the UCC catalog
- `zenodo/UCC_members.parquet`: Updated version
The last three files are uploaded to Zenodo to generate a new release.
Updating the site requires running the D_update_UCC_site.py script.
This script applies the required changes to update the ucc.ar site. It processes the UCC catalogue and searches for modifications that need to be applied to update the site.
- Generate/update per-cluster `.webp` files (stored in the `plots/` folders)
- If plots were generated/updated, update the `plot_used` column in the `data/df_UCC_C.csv` file
- Generate/update per-cluster `.md` files (stored in `ucc/_clusters/`)
- Update the split members files (`ucc/assets/members/*.csv.gz`)
- Update the CSV clusters file `ucc/assets/clusters.csv.gz` and its associated JSON file `clusters-manifest.json`
- Update the main UCC site files including tables and images
- Move all files to their final destination
- Check that the number of files is correct
- `data/df_UCC_B.csv`: Used to access certain columns like Names and DB
- `data/df_UCC_C.csv`: Primary source of data for most of the website content
- `data/zenodo/UCC_members.parquet`: Used for generating cluster plots
- `data/databases_info.json`: Latest UCC database JSON file
- `data/databases/cmmts/*.csv`: CSV files with comments
- `ucc/assets/clusters.csv.gz`: Compressed CSV file with the latest UCC data
- `ucc/_pages/XXXXX.md` (DATABASES, ARTICLES, TABLES)
- `ucc/_tables/XXXXX_table.md` (individual table pages)
- `ucc/_tables/dbs/{db_name}.csv` (one file for each original database in the UCC)
- `ucc/_clusters/{cluster_name}.md` (individual cluster pages)
- `UCC/plots/plots_X/*/*.webp`: Updated or generated (CMD and Aladin plots)
- `data/df_UCC_C.csv`: Updated `plot_used` column if modified
- `ucc/_clusters/*.md`: Updated or generated
- `ucc/assets/members/*.csv.gz`: Updated files
- `ucc/assets/clusters.csv.gz`: Updated
- `ucc/assets/clusters-manifest.json`: Updated
- `ucc/images/*.webp`: Updated
- `ucc/_pages/*.md`: Updated
- `ucc/_tables/*_table.md`: Updated
The Jekyll theme used by the site is a modified Reverie theme.
Before updating the live site, generate a local site build and check the results carefully. To build a local copy of the site we use Jekyll (see the Jekyll docs).
If this is a new installation, update the gems with:
$ bundle update --all
To build a local version of the site, position a terminal in the /ucc folder
(not the /updt_ucc folder) and run:
$ bundle exec jekyll serve --incremental
This will generate a full version of the site locally, which can take a while. For a
faster build, avoid processing the files in the `_clusters` and `_tables` folders (for
example, by using a different include with fewer/different selected folders).
The script `test_build.sh` can also be used to check the local build. By default it
will select 10 random clusters from the `_clusters/` folder and generate the
site:
$ ./test_build.sh
To select a specific cluster (e.g., melotte55) instead of random clusters, run:
$ ./test_build.sh 0 melotte55
You can also select the number of random clusters to be generated and/or exclude some
clusters by their name. To generate N cluster pages while excluding clusters with
names starting with 'cwnu', 'cwwdl', 'ckcwdm', 'hsc' or 'theia' you can run:
$ ./test_build.sh N -cwnu -cwwdl -ckcwdm -hsc -theia
Check the local version in both Chrome and Firefox.
- Create a 'New version' in the Zenodo repository:
  a. Make sure that the version number in the `zenodo/README.txt` file matches that in the `_pages/CHANGELOG.md` file
  b. Upload the three files stored in the `zenodo/` folder: `README.txt`, `UCC_cat.csv`, `UCC_members.parquet`
  c. Get a new DOI from Zenodo
  d. Add a publication date with the format YYYY-MM-DD
  e. Use the same version number from the README (format: YYMMDD) in the release
  f. Publish this new release and copy its own URL (not the general repository URL)
- Update the `_pages/CHANGELOG.md` file; use the Zenodo URL for this release
- Push changes (if any) to each of the `plots/plots_*` repositories. To do this, run:

  $ for dir in ../plots/plots_*/; do (cd "$dir" && [ -d .git ] && git acp "updt plots"); done

- IMPORTANT: Make sure that the `_config.yml` file includes all the folders
- Push any remaining changes to the `ucc` repository
- Deploy the site using the GitHub workflow "Deploy Jekyll site to Pages"
- Test the live site in both Chrome and Firefox
Libraries and services used:
- pako (https://github.com/nodeca/pako): loadCSV.js
- d3 (https://github.com/d3/d3): map_search.js
- d3-geo-projection (https://github.com/d3/d3-geo-projection): map_search.js
- Plotly (https://github.com/plotly/plotly.py): posts.html
- flatgithub.com: ARTICLES.md