
Reproducibility (DKFZ internal only)

This document guides you through reproducing the main results of our sepsis ICU paper (Science Advances).

Setup

Start by installing the repository according to the README.

This work makes use of many different datasets. Your .env file should contain at least the following variables:

export DKFZ_USERID=seidlits
export PATH_E130_Projekte=/mnt/E130-Projekte
export PATH_HTC_RESULTS=~/htc/results

# Datasets
export PATH_Tivita_sepsis_ICU=~/htc/2022_10_24_Tivita_sepsis_ICU
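
Before starting, it can be useful to verify that the required variables are set and point to existing locations. The following is a minimal sketch; the variable names are taken from the .env example above, so adapt the list if your setup differs.

```python
# Check that the required environment variables are set and point to existing
# paths. Variable names taken from the .env example above (an assumption);
# adjust the list to your own setup.
import os

REQUIRED_VARS = [
    "PATH_E130_Projekte",
    "PATH_HTC_RESULTS",
    "PATH_Tivita_sepsis_ICU",
]

def check_env(required=REQUIRED_VARS):
    """Return the variables which are unset or point to non-existing paths."""
    missing = []
    for name in required:
        value = os.environ.get(name)
        if value is None or not os.path.exists(os.path.expanduser(value)):
            missing.append(name)
    return missing
```

Running `check_env()` should return an empty list once your .env file is sourced correctly.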

If you are already using this repository, it is recommended to clone it into a new folder and use a fresh conda environment. Existing results folders should not be available during the reproduction.

You also need access to the cluster, i.e. ssh $DKFZ_USERID@$WORKER_NODE should work (cf. our cluster documentation for more details).

Please use a screen environment for all of the following commands since they may take a while to complete.

Sanity check

The figures for this paper are created via the notebooks in paper/ScienceAdvances2025. Run the paper/ScienceAdvances2025/HSIvsTPIvsRGB.ipynb notebook now; it should fail with an error about missing paths, confirming that no existing results are available.

Data specification files

The paper contains two main downstream tasks (sepsis diagnosis and mortality prediction) and employs a nested cross-validation scheme. To represent these experiments, we need a lot of data specification files.

(!) Note: scikit-learn version 1.8.0 changed the behavior of StratifiedGroupKFold, so the script run_sepsis_icu_datasets.py no longer yields the same specs. Use the already produced ones instead, located at htc_projects/sepsis_icu/data.
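
Since the specs are only reproducible with a pre-1.8.0 StratifiedGroupKFold, a small version guard can catch this early. This helper is a sketch based on the note above; the version string is compared on major and minor components only.

```python
# Guard against regenerating data specs with an incompatible scikit-learn
# release: StratifiedGroupKFold changed its split behavior in 1.8.0 (per the
# note above), so the original specs require an earlier version.
def specs_reproducible(sklearn_version: str) -> bool:
    """True if the given scikit-learn version predates the 1.8.0 split change."""
    major, minor = (int(part) for part in sklearn_version.split(".")[:2])
    return (major, minor) < (1, 8)

# Example usage (requires scikit-learn to be installed):
# import sklearn
# assert specs_reproducible(sklearn.__version__), "Use the pre-generated specs instead"
```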

Metadata feature importance

We performed an experiment in which clinical data is successively combined with the HSI data, in order from the most important to the least important feature. The feature importance was determined using Recursive Feature Elimination (RFE) with a Random Forest classifier trained on two different sets of clinical data: (1) clinical data that is available at bedside within 1 hour (e.g., vital parameters, monitoring data, BGA) and (2) clinical data that is only available within approximately 10 hours (e.g., adding lab results). To repeat this experiment, recompute the rankings of the clinical data by feature importance with the following command:

htc feature_importance_rankings
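
The ranking procedure described above can be illustrated with scikit-learn's RFE and a Random Forest. This is only a sketch: the feature names and data below are synthetic placeholders, and the actual `htc feature_importance_rankings` implementation may differ in its details.

```python
# Sketch of Recursive Feature Elimination with a Random Forest classifier,
# as described above. Feature names and data are synthetic placeholders,
# not the actual clinical variables.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
feature_names = ["heart_rate", "map", "spo2", "lactate", "crp"]  # placeholders
X = rng.normal(size=(200, len(feature_names)))
# Synthetic labels driven mainly by the first and fourth feature
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Eliminate one feature per iteration until a single feature remains; the
# ranking_ attribute then orders features from most (1) to least important.
rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0), n_features_to_select=1)
rfe.fit(X, y)
ranking = sorted(zip(rfe.ranking_, feature_names))
print(ranking)
```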

Training

We can now train our classification networks using HSI data and the combinations of HSI and clinical data. All runs must be assigned the same timestamp so that they can be identified later on. Run the following command to start all 2790 runs:

htc multiple_sepsis_runs --timestamp "<YOUR_TIMESTAMP>"
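
The timestamp format appears to be `%Y-%m-%d_%H-%M-%S`, inferred from the example `2025-03-07_13-00-00` shown in the Inference section; verify against your own run directories. A one-liner to generate one:

```python
# Generate a timestamp in the run-directory format (format inferred from the
# example "2025-03-07_13-00-00" used later in this document; an assumption).
from datetime import datetime

def make_timestamp(now=None):
    return (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")

print(make_timestamp())
```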

Performing all runs locally takes approximately 2 days. If you performed the computations on the cluster, copy the results from the cluster once all runs are complete:

htc move_results

Inference

The trained networks are stored in $PATH_HTC_RESULTS/training/image and $PATH_HTC_RESULTS/training/median_pixel and all run directories will start with the same timestamp (e.g., 2025-03-07_13-00-00). Use your new timestamp to run inference on the test datasets for all trained networks:
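
To double-check that all runs share your timestamp before starting inference, the run directories can be collected as sketched below. The `training/image` and `training/median_pixel` paths are taken from the text; the exact directory layout is an assumption.

```python
# Collect all run directories belonging to one timestamp across the two
# training folders mentioned above. Paths taken from the text; the exact
# directory layout is an assumption.
from pathlib import Path

def find_runs(timestamp: str, results_dir: str) -> list[Path]:
    """Return all run directories whose name starts with the given timestamp."""
    runs = []
    for model in ("image", "median_pixel"):
        base = Path(results_dir) / "training" / model
        if base.exists():
            runs.extend(sorted(base.glob(f"{timestamp}*")))
    return runs
```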

htc table_generation --notebook ""
htc multiple_test_tables --timestamp "<YOUR_TIMESTAMP>"

Furthermore, the aggregated result tables for the visualizations of the clinical data adding experiments need to be precomputed and stored since, due to the large number of models involved, they are too heavy to compute within a notebook:

HTC_MODEL_TIMESTAMP="<YOUR_TIMESTAMP>" htc feature_adding_tables

Variables

All variables occurring in the paper can be generated by running paper/ScienceAdvances2025/run_generate_variables.py.

Figures

You now have all the ingredients to create the final figures. Run the following commands to produce the figures that utilize the trained networks:

HTC_MODEL_TIMESTAMP="<YOUR_TIMESTAMP>" jupyter nbconvert --to html --execute --stdout paper/ScienceAdvances2025/HSIvsTPIvsRGB.ipynb > /dev/null
HTC_MODEL_TIMESTAMP="<YOUR_TIMESTAMP>" jupyter nbconvert --to html --execute --stdout paper/ScienceAdvances2025/FeatureAdding.ipynb > /dev/null
HTC_MODEL_TIMESTAMP="<YOUR_TIMESTAMP>" jupyter nbconvert --to html --execute --stdout paper/ScienceAdvances2025/HSIvsClinicalScores.ipynb > /dev/null

You will find the resulting figures in $PATH_HTC_RESULTS/paper. You can run the other notebooks in the same way.
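
The three nbconvert calls above can also be wrapped in a small loop. This sketch mirrors those commands; it assumes `jupyter` is on your PATH and that the notebooks exist at the listed locations.

```python
# Execute the paper notebooks headlessly with the model timestamp set,
# mirroring the jupyter nbconvert commands above. Assumes jupyter is on
# PATH and the notebook paths (taken from the text) exist.
import os
import subprocess

NOTEBOOKS = [
    "paper/ScienceAdvances2025/HSIvsTPIvsRGB.ipynb",
    "paper/ScienceAdvances2025/FeatureAdding.ipynb",
    "paper/ScienceAdvances2025/HSIvsClinicalScores.ipynb",
]

def build_command(notebook: str) -> list[str]:
    """Mirror the jupyter nbconvert invocation used above."""
    return ["jupyter", "nbconvert", "--to", "html", "--execute", "--stdout", notebook]

def run_notebooks(timestamp: str, notebooks=NOTEBOOKS) -> None:
    """Run each notebook with HTC_MODEL_TIMESTAMP set, discarding stdout."""
    env = {**os.environ, "HTC_MODEL_TIMESTAMP": timestamp}
    for nb in notebooks:
        subprocess.run(build_command(nb), env=env, stdout=subprocess.DEVNULL, check=True)
```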