Delfino is a simulation framework that uses Generative Pre-trained Transformers (GPTs) to model individual-level disease progression. It is designed to evaluate the long-term impact of clinical interventions (specifically GLP-1 receptor agonists) on obesity-related multi-morbidity.
The methodology of Delfino is an extension of the Delphi model contained in:
Shmatko, A. et al. (2024). Predicting future disease trajectories using transformers. Nature Medicine.
Read the full paper here
Key Contributors to Delfino :
- Dr. Daniel Laydon (Lead Developer)
- Prof. Timothy Hallett
- Dr. Shevanthi Nayagam
- Prof. Alex Bottle
- Sequential States: Delfino treats a patient's life as a sequence of tokens. This allows the model to retain long-term dependencies in a patient's medical history.
- Tokenization: Clinical events, demographic markers (Sex, Age), and risk factors (BMI, Smoking) are mapped to a discrete vocabulary.
- Inference: The model predicts the "next likely clinical event" based on the entire preceding history, enabling the simulation of complex co-morbidity patterns.
- Training Data: Architecture is designed for training on longitudinal cohorts like the UK Biobank and Whole Systems Integrated Care (WSIC).
- Clinical Coding: Model outputs are mapped to the ICD-10 system (Chapters A00–Q99). This ensures that synthetic trajectories are expressed in standard clinical nomenclature (e.g., C-codes for oncology, I-codes for cardiovascular).
- Explainable AI (XAI): Uses SHAP (SHapley Additive exPlanations) values to calculate the contribution of historical tokens to specific future risk predictions, providing transparency for clinical validation.
The engine calculates two primary health metrics:
-
DALYs (Disability-Adjusted Life Years):
- YLD: Accrued annually based on IHME/GBD disability weights mapped to ICD-10 tokens.
- YLL: Calculated upon a "Death" token relative to actuarial life expectancy.
-
QALYs (Quality-Adjusted Life Years):
-
Additive: Calculates quality decrements (
$1 - utility$ ). -
Multiplicative: Calculates composite utility (
$U_{total} = \prod U_n$ ), reflecting standard HTA methodology for multi-morbid states.
-
Additive: Calculates quality decrements (
- Language: Python/PyTorch.
- Hardware: Optimized for NVIDIA GPU architectures (CUDA).
- Parallelization: Uses a multi-process orchestrator to manage parallel worker streams, bypassing the Python GIL to achieve high throughput (measured in patients/second).
-
Efficiency: Disease parameters (weights, costs, utilities) are vectorized into NumPy arrays for
$O(1)$ lookup during the simulation loop.
- Preprocessing:
create_dummy_disease_data.pymaps labels to clinical weights. - Simulation:
delfino.pyruns stochastic inference for Baseline and Intervention groups. - Orchestration:
run_experiment.pyhandles data slicing and subprocess management. - Analysis:
compare_results.pyandplot_results.pygenerate incidence curves and cost-effectiveness metrics.