UCC management

The UCC lives in several repositories within the UCC23 organization and in its associated Zenodo repository.

The updt_ucc repository, this one, contains the scripts and data files required to update the UCC (mostly) automatically. It also contains the main files that make up the UCC: UCC_cat_B.csv, UCC_cat_C.csv, and the UCC_members.parquet file with all the identified members (stored in the zenodo/ folder).
The ucc repository contains, for each entry in the database, a corresponding .md file in the _clusters/ folder. This repository also contains all the files required to build the public site.
The plots_X repositories contain several plots per OC in various folders: one with the four diagrams, one for Aladin, and plots for the HUNT23 and/or CANTAT20 members when available. The plots are loaded by the public site from these repositories.

The UCC update process is managed through five main scripts that perform the required tasks in sequence, as described in the following sections.

1. Adding a new DB

Given a new database (DB) to be added to the UCC, the A_get_new_DB.py script manages the updating of the JSON file that contains the information for each database in the UCC, as well as downloading the Vizier database if available and/or requested:

Load the current JSON database file.
Check if the URL is already listed in the current database.
Fetch publication authors and year from NASA/ADS.
Generate a new database name based on extracted metadata.
Handle temporary database files and check for existing data.
Fetch Vizier data or allow manual input for Vizier IDs.
Match new database columns with current JSON structure.
Update the JSON file and save the database as CSV.

Once the JSON file with the entry for the new DB and the generated CSV file with the DB data are stored in the temporary folder, carefully check both files before moving on, as both might need manual intervention.

Input

ADS_bibcode: NASA/ADS bibcode for the new DB
data/databases_info.json: Current JSON file with info about the databases included in the UCC

Output

temp_updt/data/databases_info.json: Updated JSON file with new DB entry
temp_updt/data/NEW_DB.csv: New database in CSV format

Important: The column with the names of the new entries can contain several names; these must be separated using a ','.
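As a minimal sketch of how a cell with several names could be read, the snippet below splits the value on ',' and trims surrounding whitespace. The function name split_names and the placeholder names are hypothetical; they are not taken from the repository's scripts.

```python
def split_names(cell: str) -> list[str]:
    """Split a names cell on ',' and drop whitespace and empty items."""
    return [name.strip() for name in cell.split(",") if name.strip()]


# Placeholder names, used only for illustration
print(split_names("Name A, Name B ,Name C"))
```

Trimming whitespace keeps entries such as "Name A, Name B" and "Name A,Name B" equivalent.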

Remember to carefully check the keys for the new DB in the JSON file. They are auto-detected by the script and might need to be corrected and/or filled in.

The output files will be loaded and updated by the B script; they must not be moved manually.

JSON file format

The format of the databases_info.json file is mostly self-explanatory with the exception of the parameters section. This can have different formats depending on the database structure. There are three possible formats:

Default

The default formatting corresponds to a single column with a given formatting/units for each parameter:

"pars": {
  "par_general": {
    "par_format": "par_db_col"
  },
  ...
}

For some databases the "par_db_col" can contain multiple values separated by a ',' or ';'. Eg: KRONBERGER2006, ROSER2016, NIZOVKINA2025

Duplicate values, different units

The "par_general" parameter is present in more than one column with different units (eg, obtained by different methods). Eg: JAEHNIG2021, CARRASCO2025

"pars": {
  "par_general": {
    "par_format_0": "par_db_col_0",
    "par_format_1": "par_db_col_1"
  },
  ...
}

Duplicate units

There is more than one column with the same "par_format" units for the same parameter. Eg: CHEN2003, PISKUNOV2008, SANTOS2021, HUNT2024, ZHANG2024, HU2025

"pars": {
  "par_general": {
    "par_format": ["par_db_col_0", "par_db_col_1"]
  },
  ...
}
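The three "pars" layouts above can be read with a single loop. The following is a minimal sketch, not the code used by the UCC scripts: the function name flatten_pars and the example keys (par_a, fmt, col_a, ...) are hypothetical placeholders.

```python
def flatten_pars(pars: dict) -> list[tuple[str, str, str]]:
    """Return (par_general, par_format, db_column) triplets for any layout."""
    triplets = []
    for par_general, formats in pars.items():
        for par_format, cols in formats.items():
            # "Duplicate units" stores a list of columns; the other two
            # layouts store a single string
            if isinstance(cols, str):
                cols = [cols]
            for col in cols:
                triplets.append((par_general, par_format, col))
    return triplets


example = {
    "par_a": {"fmt": "col_a"},                        # default
    "par_b": {"fmt_0": "col_b0", "fmt_1": "col_b1"},  # different units
    "par_c": {"fmt": ["col_c0", "col_c1"]},           # duplicate units
}
for row in flatten_pars(example):
    print(row)
```

Reducing every layout to the same triplet form means that downstream code does not need to know which of the three formats a given database uses.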

2. Updating the UCC: first step

The B_update_UCC script handles two possible cases:

Adding a new database to the UCC
Rebuilding the entire UCC

The case is decided by the script according to whether it finds a temporary databases_info.json file with new DBs to add. The final UCC_cat_B.csv file contains the following main columns:

"fnames","DB","DB_i","Names"

which are the main identifiers for each entry in the UCC. Along with these columns, the file also contains the following columns which are used only as centers to estimate membership by the next script:

"RA_ICRS","DE_ICRS","GLON","GLAT","Plx","pmRA","pmDE"

and finally the fund_pars column which contains the fundamental parameters for each entry extracted from each DB where it is present.
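The choice between the two cases described at the start of this section could be sketched as follows. This is an assumption-laden illustration: the function name select_mode and the returned labels are hypothetical, and the sketch only checks that the temporary file exists, whereas the actual script may also inspect its contents.

```python
import tempfile
from pathlib import Path


def select_mode(temp_json: Path) -> str:
    """Return 'add_new_db' if the temporary JSON file exists, else 'rebuild'."""
    return "add_new_db" if temp_json.is_file() else "rebuild"


# Demonstrate both cases inside a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    temp_json = Path(tmp) / "databases_info.json"
    print(select_mode(temp_json))  # file absent
    temp_json.write_text("{}")
    print(select_mode(temp_json))  # file present
```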

Input

temp_updt/data/databases_info.json: New JSON file (generated by script A)
temp_updt/data/NEW_DB.csv: New database in CSV format (generated by script A)
data/UCC_cat_B.csv: Current version of this file
data/databases_info.json: Current JSON database file
data/globulars.csv: Globular clusters data file

Output

UCC_cat_B.csv: Updated version of this file (old one is archived)

If new DBs were added then these files are moved from temporary folders to their final destination:

data/databases_info.json: Updated UCC database JSON file
data/NEW_DB.csv: New database in CSV format

3. Updating the UCC: second step

The C_process_member_files.py script generates the UCC_cat_C.csv file and updates the parquet file with all the identified members. It also generates the required files for uploading to the Zenodo repository (these files need to be manually uploaded to a new Zenodo release).

The UCC_cat_C.csv file contains columns that represent the following information:

- fnames: names of each entry (must match those in the B file)
- plots_used: (y/n) tells the E script if an entry plot needs to be generated/updated
- process: (y/n) manual flag that indicates whether to (re)process this entry
- N_clust: Fixed number of cluster members (has precedence over N_clust_max)
- N_clust_max: Maximum number of cluster members
- N_box: Multiplier value for the box size used to search for members
- frame_limit: A string in the format "x_111.1" where "x" is one of the characters:

  - 'b' (bottom), 't' (top), 'l' (left), 'r' (right) for (GLON, GLAT) coordinates
  - 'pmb' (bottom), 'pmt' (top), 'pml' (left), 'pmr' (right) for (pmra, pmde)
  - 'plxl' (left), 'plxr' (right) for parallax

and the numbers are the limiting values for each. A single limit can be provided (eg,
"x_111.1") or several separated by a ',' (eg, "x_111.1,y_222.2,...").
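A minimal sketch of how a frame_limit string could be parsed is shown below. The function name parse_frame_limit and the example values are hypothetical; the sketch assumes a non-empty string and raises an error for keys that are not in the list above.

```python
VALID_KEYS = {"b", "t", "l", "r", "pmb", "pmt", "pml", "pmr", "plxl", "plxr"}


def parse_frame_limit(value: str) -> list[tuple[str, float]]:
    """Parse a string such as 'b_1.5,pmr_2.0' into (key, limit) pairs."""
    limits = []
    for item in value.split(","):
        # Split on the first '_' only, so negative limits are kept intact
        key, _, number = item.strip().partition("_")
        if key not in VALID_KEYS:
            raise ValueError(f"Unknown frame limit key: {key!r}")
        limits.append((key, float(number)))
    return limits


print(parse_frame_limit("b_-5.2,pmr_1.5,plxl_0.3"))
```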

The rest of the columns are generated using information taken from the estimated members for each entry. This script performs the following main operations:

3.1. Change detection

Compares two versions of the UCC catalog: a source catalog (UCC_cat_B.csv) and the current catalog (UCC_cat_C.csv). It identifies three sets of open clusters:

New clusters to be added (B entry not in C --> Add to C)
Old clusters to be removed (C entry not in B --> Remove from C)
Existing clusters that have been manually marked for reprocessing (C entry with 'process=y' --> Reprocess in C)
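The three sets above can be expressed with set operations. The following is a minimal sketch with hypothetical names (detect_changes, cluster_a, ...). It assumes that entries are matched by their fnames value and that only entries present in both files are considered for reprocessing.

```python
def detect_changes(b_fnames, c_process):
    """Compare B entries with C entries.

    b_fnames: iterable of fnames in the B file
    c_process: dict mapping fnames in the C file to their 'process' flag
    """
    b_set, c_set = set(b_fnames), set(c_process)
    to_add = sorted(b_set - c_set)      # B entry not in C
    to_remove = sorted(c_set - b_set)   # C entry not in B
    to_reprocess = sorted(f for f in b_set & c_set if c_process[f] == "y")
    return to_add, to_remove, to_reprocess


b_fnames = ["cluster_a", "cluster_b", "cluster_c"]
c_process = {"cluster_b": "n", "cluster_c": "y", "cluster_d": "n"}
print(detect_changes(b_fnames, c_process))
```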

3.2 Cluster processing

For each new or re-processed cluster, the script performs a detailed analysis:

Uses the fastMP method to identify likely member stars
Saves the list of member stars for each cluster into an individual .parquet file
Calculates various parameters for the cluster based on its members

3.3 Data file updating

After processing all the necessary clusters, the script updates the main UCC data:

Combines all the individual member .parquet files into a single, updated master members file
Updates the UCC_cat_C.csv file with the newly derived data, adding new clusters and removing old ones

3.4 Post-processing

Finds Shared Members: analyzes the entire member list to find stars that belong to more than one cluster and records this information
Calculates the UTI
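The search for shared members can be illustrated with a small grouping step. This is a sketch under stated assumptions, not the script's implementation: the function name find_shared_members is hypothetical, and the members are represented as plain (star_id, cluster) pairs instead of rows of the parquet file.

```python
from collections import defaultdict


def find_shared_members(members):
    """Map each star that appears in more than one cluster to those clusters.

    members: iterable of (star_id, cluster_fname) pairs
    """
    clusters_per_star = defaultdict(set)
    for star_id, fname in members:
        clusters_per_star[star_id].add(fname)
    return {
        star: sorted(clusters)
        for star, clusters in clusters_per_star.items()
        if len(clusters) > 1
    }


members = [(1, "a"), (2, "a"), (2, "b"), (3, "b"), (3, "c"), (3, "a")]
print(find_shared_members(members))
```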

Input

data/df_UCC_B_updt.csv: Current UCC database (produced by the B script)
data/df_UCC_C_updt.csv: Current version of this file
data/manual_centers.csv: Manual centers for selected OCs
databases/globulars.csv: Globular clusters data file
data/databases_info.json: Current UCC database JSON file
Gaia data files: Gaia data files for a given release
zenodo/README.txt: Zenodo README file
zenodo/UCC_members.parquet: File with the estimated members for all the clusters (FILE NOT TRACKED BECAUSE IT IS TOO LARGE)

Output

data/df_UCC_C_updt.csv: Updated version (previous version is archived)
zenodo/README.txt: Updated version
zenodo/UCC_cat.csv: Updated (and simplified) version of the UCC catalog
zenodo/UCC_members.parquet: Updated version

The last three files are uploaded to Zenodo to generate a new release.

4. Updating the site

Updating the site requires running the D_update_UCC_site.py script.

This script applies the required changes to update the ucc.ar site. It processes the UCC catalogue and searches for modifications that need to be applied to update the site.

Generate/update per cluster .webp files (stored in the plots/ folders)
If plots were generated/updated, update the plot_used column in the data/df_UCC_C.csv file
Generate/update per cluster .md (stored in ucc/_clusters/) files
Update the split members files (ucc/assets/members/*.csv.gz)
Update the CSV clusters file ucc/assets/clusters.csv.gz and its associated JSON file clusters-manifest.json
Update the main UCC site files including tables and images
Move all files to their final destination
Check that the number of files is correct

Input

data/df_UCC_B.csv: Used to access certain columns like Names and DB
data/df_UCC_C.csv: Primary source of data for most of the website content
data/zenodo/UCC_members.parquet: Used for generating cluster plots
data/databases_info.json: Latest UCC database JSON file
data/databases/cmmts/*.csv: CSV files with comments
ucc/assets/clusters.csv.gz: Compressed CSV file with the latest UCC data
ucc/_pages/XXXXX.md (DATABASES, ARTICLES, TABLES)
ucc/_tables/XXXXX_table.md (individual table pages)
ucc/_tables/dbs/{db_name}.csv (one file for each original database in the UCC)
ucc/_clusters/{cluster_name}.md (individual cluster pages)

Output

UCC/plots/plots_X/*/*.webp: Updated or generated (CMD and Aladin plots)
data/df_UCC_C.csv: Updated plot_used column if modified
ucc/_clusters/*.md: Updated or generated
ucc/assets/members/*.csv.gz: Updated files
ucc/assets/clusters.csv.gz: Updated
ucc/assets/clusters-manifest.json: Updated
ucc/images/*.webp: Updated
ucc/_pages/*.md: Updated
ucc/_tables/*_table.md: Updated

5. Building the site

The Jekyll theme used by the site is a modified Reverie theme.

5.1 Local build

Before updating the live site, generate a local site build and check the results carefully. To build a local copy of the site we use Jekyll, see Jekyll docs.

If this is a new installation, update the gems with:

$ bundle update --all

To build a local version of the site, position a terminal in the /ucc folder (not the /updt_ucc folder) and run:

$ bundle exec jekyll serve --incremental

This will generate a full version of the site locally, which can take a while. For a faster build, avoid processing the files in the _clusters and _tables folders (for example, using a different include with fewer/different selected folders).

The script test_build.sh can also be used to check the local build. By default, it selects 10 random clusters from the _clusters/ folder and generates the site:

$ ./test_build.sh

To select a specific cluster (eg, melotte55) instead of random clusters, run:

$ ./test_build.sh 0 melotte55

You can also select the number of random clusters to be generated and/or exclude some clusters by their name. To generate N cluster pages while excluding clusters with names starting with 'cwnu', 'cwwdl', 'ckcwdm', 'hsc' or 'theia' you can run:

$ ./test_build.sh N -cwnu -cwwdl -ckcwdm -hsc -theia

Check the local version in both Chrome and Firefox.

5.2 Live build

Create a 'New version' in the Zenodo repository

-a Make sure that the version number in the zenodo/README.txt file matches that in the _pages/CHANGELOG.md file
-b Upload the three files stored in the zenodo/ folder: README.txt, UCC_cat.csv, UCC_members.parquet
-c Get a new DOI from Zenodo
-d Add a Publication date with the format: YYYY-MM-DD
-e Use the same version number from the README (format: YYMMDD) in the release

Publish this new release and copy its own URL (not the general repository URL)

Update the _pages/CHANGELOG.md file, use the Zenodo URL for this release
Push changes (if any) to each of the plots/plots_* repositories. To do this, run:

$ for dir in ../plots/plots_*/; do (cd "$dir" && [ -d .git ] && git acp "updt plots"); done

IMPORTANT: Make sure that the _config.yml file includes all the folders
Push any remaining changes to the ucc repository
Deploy the site using the GitHub workflow Deploy Jekyll site to Pages
Test live site both in Chrome and Firefox
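The two date formats required in the Zenodo step above (YYMMDD for the version number and YYYY-MM-DD for the publication date) can be derived from a single date. The helper name release_strings is hypothetical and the date is only an example.

```python
from datetime import date


def release_strings(day: date) -> tuple[str, str]:
    """Return (version as 'YYMMDD', publication date as 'YYYY-MM-DD')."""
    return day.strftime("%y%m%d"), day.isoformat()


print(release_strings(date(2025, 1, 31)))  # ('250131', '2025-01-31')
```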

Libraries and services used:

pako (https://github.com/nodeca/pako): loadCSV.js
d3 (https://github.com/d3/d3): map_search.js
d3-geo-projection (https://github.com/d3/d3-geo-projection): map_search.js
Plotly (https://github.com/plotly/plotly.py): posts.html
flatgithub.com: ARTICLES.md
