This repository contains code and documentation to reproduce the pipeline described in our paper for predicting progression from MCI to Alzheimer's disease using blood-based multi-omics data.
Because the original ADNI datasets are sensitive and cannot be redistributed, we include a toy dataset (~/toy_complete_data.csv) that matches the expected schema used by the provided code.
| Modality | Script |
|---|---|
| Lipids | ~/feature_selection_lipid.py |
| Gene + CpG | ~/feature_selection_gene_cpg.py |
| SNPs | ~/gwas_pipeline.sh |
No features were selected for the bile acid modality.
The finalized features are listed in ~/ML_models_feature_list.csv.
Binary classification:
| Class | Meaning |
|---|---|
| 0 | Stable MCI (sMCI) |
| 1 | Progressive MCI (pMCI) |
The toy dataset encodes the same classes with different values:
- sMCI = 1
- pMCI = 2
All scripts internally map {1,2} → {0,1}.
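
The remapping above can be sketched as follows. This is a minimal illustration, not the code used in the repository's scripts; `LABEL_MAP` and `remap_labels` are hypothetical names introduced here.

```python
# Hypothetical helper: the repository's scripts may implement this differently.
# Toy coding (sMCI = 1, pMCI = 2) -> model coding (sMCI = 0, pMCI = 1).
LABEL_MAP = {1: 0, 2: 1}


def remap_labels(labels):
    """Map toy-dataset labels {1, 2} to the binary classes {0, 1}."""
    unknown = set(labels) - set(LABEL_MAP)
    if unknown:
        # Fail loudly instead of silently passing through unexpected codes.
        raise ValueError(f"Unexpected label values: {sorted(unknown)}")
    return [LABEL_MAP[value] for value in labels]


print(remap_labels([1, 2, 2, 1]))  # [0, 1, 1, 0]
```

Rejecting unknown values is a design choice in this sketch: it guards against accidentally feeding already-remapped labels through the mapping a second time.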
Each modality is represented by a feature suffix:
| Modality | Suffix |
|---|---|
| SNPs | _snp |
| DNA methylation | _dm |
| Gene expression | _gene |
| Lipids | _lip |
| Bile acids | _bile |
| Demographics | no suffix |
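
As a rough sketch of how the suffix convention can be used, the snippet below groups column names by modality. The function name, the modality keys, and the example column names are assumptions made for illustration; only the suffixes come from the table above.

```python
# Suffixes are taken from the table above; the dictionary keys are illustrative.
MODALITY_SUFFIXES = {
    "snp": "_snp",
    "dna_methylation": "_dm",
    "gene_expression": "_gene",
    "lipids": "_lip",
    "bile_acids": "_bile",
}


def group_columns_by_modality(columns):
    """Assign each column name to a modality based on its suffix.

    Columns without a known suffix are treated as demographics, so the
    label column should be removed before calling this function.
    """
    groups = {name: [] for name in MODALITY_SUFFIXES}
    groups["demographics"] = []
    for column in columns:
        for name, suffix in MODALITY_SUFFIXES.items():
            if column.endswith(suffix):
                groups[name].append(column)
                break
        else:
            # No suffix matched: fall back to the demographics group.
            groups["demographics"].append(column)
    return groups


# Hypothetical column names, used only to demonstrate the grouping.
example_columns = ["rs429358_snp", "cg00000029_dm", "APOE_gene", "PC_36_2_lip", "age"]
print(group_columns_by_modality(example_columns))
```

With a pandas DataFrame, the same function could be applied to `df.columns` to obtain per-modality column lists for late integration.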
Script:
~/Early_Late_integration.py
Early integration: all omics features are concatenated into a single matrix and one model is trained on it.
Late integration: each modality produces a hard 0/1 prediction, and the final decision is the majority vote across modalities.
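
The majority vote can be illustrated with the short sketch below. It is not taken from `Early_Late_integration.py`; the function name and input layout are assumptions. The description above does not say how ties are resolved when an even number of modalities vote, so this sketch assigns ties to class 0 (sMCI) purely as an assumption.

```python
def majority_vote(predictions_by_modality):
    """Combine hard 0/1 predictions from several modalities by majority vote.

    predictions_by_modality maps a modality name to a list of 0/1
    predictions, one per sample, in the same sample order.
    """
    per_modality = list(predictions_by_modality.values())
    if not per_modality:
        raise ValueError("At least one modality is required")
    n_samples = len(per_modality[0])
    if any(len(preds) != n_samples for preds in per_modality):
        raise ValueError("All modalities must predict the same samples")

    n_modalities = len(per_modality)
    final = []
    for votes in zip(*per_modality):
        positive = sum(votes)
        # Strict majority for class 1; ties fall back to class 0 (assumption).
        final.append(1 if 2 * positive > n_modalities else 0)
    return final


votes = {
    "snp": [1, 0, 1],
    "lipids": [1, 0, 0],
    "gene_expression": [0, 1, 1],
}
print(majority_vote(votes))  # [1, 0, 1]
```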
- L1 Logistic Regression
- SVM
- Random Forest
- XGBoost
- Neural Network
All are evaluated in both early and late integration modes.
Script:
~/run_shap_lime_10x.py
SHAP and LIME are used to explain both early and late integration models.
Ten independent runs are performed, and only biomarkers that appear in all ten runs are retained.
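
The retention rule amounts to a set intersection across runs, sketched below. How each run selects its biomarkers (for example, a top-k cut on SHAP or LIME importance) is not specified here, so the sketch starts from per-run lists of feature names. The function name and the example feature names are hypothetical.

```python
def stable_biomarkers(runs):
    """Return the features that were selected in every run.

    runs is a list with one entry per run; each entry is an iterable of
    the feature names selected in that run.
    """
    if not runs:
        return []
    common = set(runs[0])
    for selected in runs[1:]:
        # Keep only features that also appear in this run.
        common &= set(selected)
    return sorted(common)


# Three hypothetical runs instead of ten, to keep the example short.
example_runs = [
    ["APOE_gene", "age", "PC_36_2_lip"],
    ["age", "APOE_gene"],
    ["APOE_gene", "age", "cg00000029_dm"],
]
print(stable_biomarkers(example_runs))  # ['APOE_gene', 'age']
```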
The toy dataset (~/toy_complete_data.csv) matches the ADNI schema used in the paper.
Replacing it with the real ADNI matrices reproduces the published results.
Accessing ADNI Resources:
There are two separate application processes depending on the type of ADNI resources you are requesting:
- Data (e.g., imaging, genomic, clinical data, etc.)
- Biospecimen samples (e.g., blood plasma, DNA, brain tissue, etc.)
Note: All ADNI data and biospecimens are de-identified. To learn more about how ADNI data are collected and processed, please review the official documentation: https://adni.loni.usc.edu/help-faqs/adni-documentation/
To request access to ADNI data:
- Review and agree to the ADNI Data Use Agreement (DUA).
- Submit your application through the ADNI Data portal: https://adni.loni.usc.edu/data-samples/adni-data/#AccessData
To request access to ADNI biospecimens:
- Review the relevant biospecimen documentation.
- Submit your application through the ADNI Samples portal: https://adni.loni.usc.edu/data-samples/adni-samples/#ApplyForAccessToSamples