Multi-omics prediction of pMCI vs sMCI (Reproducibility Repo)

This repository contains code and documentation to reproduce the pipeline described in our paper for predicting progression from MCI to Alzheimer's disease using blood-based multi-omics data.

Because the original ADNI datasets are sensitive and cannot be redistributed, we include a toy dataset (~/toy_complete_data.csv) that matches the expected schema used by the provided code.

1. Feature Selection

Modality	Script
Lipids	`~/feature_selection_lipid.py`
Gene + CpG	`~/feature_selection_gene_cpg.py`
SNPs	`~/gwas_pipeline.sh`

No features were selected for the bile acid modality.

The final feature set used by the ML models is listed in ~/ML_models_feature_list.csv.

2. Prediction task

Binary classification:

Class	Meaning
0	Stable MCI (sMCI)
1	Progressive MCI (pMCI)

Toy dataset uses:

sMCI = 1
pMCI = 2

All scripts internally map {1,2} → {0,1}.
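A minimal sketch of that remapping, assuming a pandas DataFrame and a label column named `DX` (the column name and the helper `encode_labels` are hypothetical; check the actual header in ~/toy_complete_data.csv). It fails loudly on unexpected codes instead of silently producing NaN labels:

```python
import pandas as pd

# Hypothetical label column name; the real column in toy_complete_data.csv may differ.
LABEL_COL = "DX"

# Toy/ADNI coding (1 = sMCI, 2 = pMCI) -> model coding (0 = sMCI, 1 = pMCI)
LABEL_MAP = {1: 0, 2: 1}


def encode_labels(df: pd.DataFrame, label_col: str = LABEL_COL) -> pd.Series:
    """Map the {1, 2} diagnosis codes to {0, 1}; raise on any other code."""
    y = df[label_col].map(LABEL_MAP)
    if y.isna().any():
        bad = sorted(df.loc[y.isna(), label_col].unique().tolist())
        raise ValueError(f"Unexpected label codes in {label_col!r}: {bad}")
    return y.astype(int)


if __name__ == "__main__":
    toy = pd.DataFrame({"DX": [1, 2, 2, 1]})
    print(encode_labels(toy).tolist())  # [0, 1, 1, 0]
```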

3. Modalities used

Each modality is represented by a feature suffix:

Modality	Suffix
SNPs	`_snp`
DNA methylation	`_dm`
Gene expression	`_gene`
Lipids	`_lip`
Bile acids	`_bile`
Demographics	no suffix
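Because modality membership is encoded only in the column suffix, columns can be grouped per modality with a simple suffix match. The sketch below is a hypothetical helper (`split_by_modality` is not from the repo); any column without one of the suffixes above falls into a demographics bucket, so label and ID columns should be passed via `exclude`:

```python
# Suffixes from the table above; unsuffixed columns are treated as demographics.
MODALITY_SUFFIXES = {
    "snp": "_snp",
    "dm": "_dm",
    "gene": "_gene",
    "lip": "_lip",
    "bile": "_bile",
}


def split_by_modality(df, exclude=()):
    """Return {modality: [column names]} plus a 'demographics' bucket."""
    groups = {name: [] for name in MODALITY_SUFFIXES}
    groups["demographics"] = []
    for col in df.columns:
        if col in exclude:
            continue  # e.g. the label column or a subject ID
        for name, suffix in MODALITY_SUFFIXES.items():
            if str(col).endswith(suffix):
                groups[name].append(col)
                break
        else:
            # No omics suffix matched.
            groups["demographics"].append(col)
    return groups
```

For example, with columns `AGE`, `rs429358_snp` and a label column `DX`, calling `split_by_modality(df, exclude=("DX",))` puts `AGE` under demographics and `rs429358_snp` under `snp` (column names are illustrative only).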

4. Early vs Late Integration (core of the paper)

Script:

~/Early_Late_integration.py

4.1 Early Integration (EI)

All omics features are concatenated into a single matrix.

4.2 Late Integration (LI)

Each modality produces a hard 0/1 prediction and the final decision is the majority vote across modalities.
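The two strategies can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the code in ~/Early_Late_integration.py: the function names are hypothetical, `modality_cols` is assumed to map each omics modality to its column names, and the tie rule in `majority_vote` (ties go to 0, sMCI) is an assumption. With five omics modalities a tie cannot occur.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


def majority_vote(votes):
    """votes: (n_modalities, n_samples) array of hard 0/1 predictions."""
    votes = np.asarray(votes)
    # Strict majority of 1-votes -> 1; ties (even modality count) fall to 0 here.
    return (2 * votes.sum(axis=0) > votes.shape[0]).astype(int)


def early_integration_fit_predict(model, X_train, y_train, X_test, modality_cols):
    """EI: concatenate all omics columns into one matrix and fit a single model."""
    cols = [c for group in modality_cols.values() for c in group]
    fitted = clone(model).fit(X_train[cols], y_train)
    return fitted.predict(X_test[cols])


def late_integration_fit_predict(model, X_train, y_train, X_test, modality_cols):
    """LI: fit one model per modality, then majority-vote the hard predictions."""
    votes = []
    for cols in modality_cols.values():
        fitted = clone(model).fit(X_train[cols], y_train)
        votes.append(fitted.predict(X_test[cols]))
    return majority_vote(np.vstack(votes))


if __name__ == "__main__":
    # Synthetic, clearly separable data purely for illustration.
    rng = np.random.default_rng(0)
    y = np.array([0, 1] * 50)
    X = pd.DataFrame({f"f{i}_{m}": y + rng.normal(0, 0.3, size=100)
                      for m in ("snp", "dm", "lip") for i in range(2)})
    groups = {m: [c for c in X.columns if c.endswith("_" + m)] for m in ("snp", "dm", "lip")}
    ei = early_integration_fit_predict(LogisticRegression(), X, y, X, groups)
    li = late_integration_fit_predict(LogisticRegression(), X, y, X, groups)
    print((ei == y).mean(), (li == y).mean())
```

Cloning the estimator for every fit keeps the per-modality models independent, which is what makes the late-integration vote a combination of separate classifiers rather than one shared model.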

5. Models evaluated

L1 Logistic Regression
SVM
Random Forest
XGBoost
Neural Network

All are evaluated in both early and late integration modes.
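One way the five model families could be instantiated with scikit-learn is sketched below. The hyperparameters are placeholders, not the paper's settings; using `MLPClassifier` for "Neural Network" is an assumption; and XGBoost is only added if the `xgboost` package is installed. `build_models` is a hypothetical helper name.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def build_models(random_state=42):
    """Return {name: unfitted estimator}. All hyperparameters are placeholders."""
    models = {
        # The L1 penalty needs a solver that supports it, such as liblinear or saga.
        "l1_logreg": LogisticRegression(penalty="l1", solver="liblinear",
                                        C=1.0, random_state=random_state),
        "svm": SVC(kernel="rbf", random_state=random_state),
        "random_forest": RandomForestClassifier(n_estimators=500,
                                                random_state=random_state),
        # Stand-in for "Neural Network"; the paper's architecture may differ.
        "neural_net": MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000,
                                    random_state=random_state),
    }
    try:
        from xgboost import XGBClassifier
        models["xgboost"] = XGBClassifier(n_estimators=300, eval_metric="logloss",
                                          random_state=random_state)
    except ImportError:
        pass  # xgboost is optional in this sketch
    return models
```

Each returned estimator can be passed as `model` to either integration mode, so the same dictionary covers both the early and the late integration runs.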

6. Interpretability

Script:

~/run_shap_lime_10x.py

SHAP and LIME are used to explain both early and late integration models.
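SHAP yields one attribution per sample and feature. A common way to turn these into a global biomarker ranking is the mean absolute SHAP value per feature; whether ~/run_shap_lime_10x.py aggregates exactly this way is not stated here, so treat the sketch as an assumption. It expects a 2-D array of shape (n_samples, n_features) for the positive class (some SHAP explainers return per-class outputs that need slicing first) and does not cover LIME. The helper name is hypothetical.

```python
import numpy as np


def rank_features_by_mean_abs_shap(shap_values, feature_names, top_k=None):
    """Global importance = mean |SHAP| per feature across samples, sorted descending."""
    sv = np.asarray(shap_values, dtype=float)
    if sv.ndim != 2 or sv.shape[1] != len(feature_names):
        raise ValueError("shap_values must be (n_samples, n_features) matching feature_names")
    importance = np.abs(sv).mean(axis=0)
    # Stable sort keeps the input order for features with equal importance.
    order = np.argsort(-importance, kind="stable")
    ranked = [(feature_names[i], float(importance[i])) for i in order]
    return ranked[:top_k] if top_k is not None else ranked
```

For late integration, the same function would be applied separately to each modality-specific model, since each has its own feature set.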

7. Stability analysis

Ten independent runs are performed, and only biomarkers that appear in all ten runs are retained.
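The retention rule is a set intersection across runs. A minimal sketch, assuming each run produces a collection of biomarker names (for example, the top-ranked features from one run); the helper name `stable_biomarkers` is hypothetical:

```python
from functools import reduce


def stable_biomarkers(runs):
    """Keep only biomarkers present in every run; returns a sorted list."""
    run_sets = [set(r) for r in runs]
    if not run_sets:
        return []
    return sorted(reduce(set.intersection, run_sets))
```

With ten runs, passing the ten per-run biomarker lists returns only the names shared by all of them.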

8. Toy data

The toy dataset (~/toy_complete_data.csv) matches the ADNI schema used in the paper.

Replacing it with the real ADNI matrices reproduces the published results.
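Before swapping in the real matrices, a quick schema comparison against the toy file can catch missing columns or a different label coding. This is a hypothetical pre-flight check, not part of the repository: the label column name `DX` is an assumption, and the check only covers the suffixes and the {1, 2} coding described in this README.

```python
import pandas as pd

SUFFIXES = ("_snp", "_dm", "_gene", "_lip", "_bile")


def check_schema(real: pd.DataFrame, toy: pd.DataFrame, label_col: str = "DX"):
    """Return a list of problems; an empty list means `real` looks toy-compatible."""
    problems = []
    missing = [c for c in toy.columns if c not in real.columns]
    if missing:
        problems.append(f"{len(missing)} toy columns missing from real data, e.g. {missing[:5]}")
    for suffix in SUFFIXES:
        if not any(str(c).endswith(suffix) for c in real.columns):
            problems.append(f"no columns with suffix {suffix}")
    if label_col not in real.columns:
        problems.append(f"label column {label_col!r} not found")
    elif not set(real[label_col].dropna().unique()) <= {1, 2}:
        problems.append(f"{label_col!r} contains codes other than 1 (sMCI) / 2 (pMCI)")
    return problems
```

Usage, with an illustrative file name for the real matrix: `check_schema(pd.read_csv("adni_matrix.csv"), pd.read_csv("toy_complete_data.csv"))`.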

9. ADNI data access

Accessing ADNI Resources:

There are two separate application processes depending on the type of ADNI resources you are requesting:

Data (e.g., imaging, genomic, clinical data, etc.)
Biospecimen samples (e.g., blood plasma, DNA, brain tissue, etc.)

Note: All ADNI data and biospecimens are de-identified. To learn more about how ADNI data are collected and processed, please review the official documentation: https://adni.loni.usc.edu/help-faqs/adni-documentation/

To request access to ADNI data:

Review and agree to the ADNI Data Use Agreement (DUA).
Submit your application through the ADNI Data portal: https://adni.loni.usc.edu/data-samples/adni-data/#AccessData

To request access to ADNI biospecimens:

Review the relevant biospecimen documentation.
Submit your application through the ADNI Samples portal: https://adni.loni.usc.edu/data-samples/adni-samples/#ApplyForAccessToSamples

