bozdaglab/AD_sMCI_vs_pMCI

Multi-omics prediction of pMCI vs sMCI (Reproducibility Repo)

This repository contains code and documentation to reproduce the pipeline described in our paper for predicting progression from MCI to Alzheimer's disease using blood-based multi-omics data.

Because the original ADNI datasets are sensitive and cannot be redistributed, we include a toy dataset (~/toy_complete_data.csv) that matches the expected schema used by the provided code.


1. Feature Selection

Modality      Script
Lipids        ~/feature_selection_lipid.py
Gene + CpG    ~/feature_selection_gene_cpg.py
SNPs          ~/gwas_pipeline.sh

No features were selected for the bile acid modality.

All finalized features are listed in ~/ML_models_feature_list.csv


2. Prediction task

Binary classification:

Class    Meaning
0        Stable MCI (sMCI)
1        Progressive MCI (pMCI)

The toy dataset encodes the classes differently:

  • sMCI = 1
  • pMCI = 2

All scripts internally map {1,2} → {0,1}.
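The remapping can be illustrated with a minimal sketch. The helper name `remap_labels` and the choice to raise on unknown codes are assumptions made for illustration; the repository scripts may implement the mapping differently.

```python
# Mapping taken from the text above: toy labels {1, 2} -> model labels {0, 1}.
LABEL_MAP = {1: 0, 2: 1}  # 1 = sMCI -> 0, 2 = pMCI -> 1


def remap_labels(raw_labels):
    """Map toy-dataset labels to 0/1 (hypothetical helper, not from the repo).

    Raising on unknown codes is an assumption; it avoids silently
    training on mislabeled rows.
    """
    unexpected = sorted(set(raw_labels) - set(LABEL_MAP))
    if unexpected:
        raise ValueError(f"Unexpected label codes: {unexpected}")
    return [LABEL_MAP[label] for label in raw_labels]


print(remap_labels([1, 2, 2, 1]))  # [0, 1, 1, 0]
```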


3. Modalities used

Each modality is represented by a feature suffix:

Modality           Suffix
SNPs               _snp
DNA methylation    _dm
Gene expression    _gene
Lipids             _lip
Bile acids         _bile
Demographics       no suffix
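As a rough sketch of how the suffix convention can be used, the function below groups column names by modality. The function name and the example column names are hypothetical; only the suffixes come from the table above. Note that any column without a known suffix (including an ID or label column, if present) falls into the demographics group under this scheme, so such columns would need to be removed first.

```python
# Suffixes are from the table above; everything else is illustrative.
SUFFIX_TO_MODALITY = {
    "_snp": "SNPs",
    "_dm": "DNA methylation",
    "_gene": "Gene expression",
    "_lip": "Lipids",
    "_bile": "Bile acids",
}


def group_columns_by_modality(columns):
    """Return {modality: [column names]} based on each column's suffix."""
    groups = {name: [] for name in SUFFIX_TO_MODALITY.values()}
    groups["Demographics"] = []
    for col in columns:
        for suffix, modality in SUFFIX_TO_MODALITY.items():
            if col.endswith(suffix):
                groups[modality].append(col)
                break
        else:
            # No known suffix: treated as a demographic feature.
            groups["Demographics"].append(col)
    return groups


# Hypothetical column names, for illustration only.
example = ["AGE", "rs123_snp", "cg001_dm", "GENE1_gene", "LIPID1_lip"]
for modality, cols in group_columns_by_modality(example).items():
    print(modality, cols)
```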

4. Early vs Late Integration (core of the paper)

Script:

~/Early_Late_integration.py

4.1 Early Integration (EI)

All omics features are concatenated into a single matrix.
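A minimal sketch of the concatenation step, in plain Python for clarity. The function name `early_integration` and the toy values are assumptions; the actual script presumably works on data frames or arrays rather than nested lists.

```python
def early_integration(blocks):
    """Concatenate per-modality feature rows, sample by sample.

    `blocks` maps a modality name to a list of per-sample feature lists.
    All modalities are assumed to list the same samples in the same order.
    """
    modalities = list(blocks)
    sample_counts = {len(blocks[m]) for m in modalities}
    if len(sample_counts) != 1:
        raise ValueError("All modalities must cover the same samples")
    n_samples = sample_counts.pop()
    return [
        [value for m in modalities for value in blocks[m][i]]
        for i in range(n_samples)
    ]


# Two samples, two hypothetical modalities.
blocks = {
    "snp": [[0, 1], [2, 0]],
    "lip": [[0.5], [0.7]],
}
print(early_integration(blocks))  # [[0, 1, 0.5], [2, 0, 0.7]]
```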

4.2 Late Integration (LI)

Each modality produces a hard 0/1 prediction and the final decision is the majority vote across modalities.
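The voting rule can be sketched as follows. The function name and the `tie_value` parameter are hypothetical: the text does not say how ties are resolved when an even number of modalities vote, so the tie-breaking behavior here is an assumption, not the repository's documented rule.

```python
def majority_vote(per_modality_preds, tie_value=1):
    """Combine hard 0/1 predictions from several modalities.

    `per_modality_preds` maps a modality name to a list of 0/1 predictions,
    one per sample. `tie_value` is returned when the vote is split evenly
    (an assumption; the source does not specify tie handling).
    """
    modalities = list(per_modality_preds)
    n_samples = len(per_modality_preds[modalities[0]])
    final = []
    for i in range(n_samples):
        votes = [per_modality_preds[m][i] for m in modalities]
        ones = sum(votes)
        zeros = len(votes) - ones
        if ones > zeros:
            final.append(1)
        elif zeros > ones:
            final.append(0)
        else:
            final.append(tie_value)
    return final


# Hypothetical predictions for four samples from four modalities.
preds = {
    "snp": [1, 0, 1, 0],
    "dm": [1, 0, 0, 1],
    "gene": [0, 0, 1, 1],
    "lip": [1, 1, 0, 0],
}
print(majority_vote(preds))               # [1, 0, 1, 1]
print(majority_vote(preds, tie_value=0))  # [1, 0, 0, 0]
```

With an odd number of voting modalities, ties cannot occur and `tie_value` has no effect.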


5. Models evaluated

  • L1 Logistic Regression
  • SVM
  • Random Forest
  • XGBoost
  • Neural Network

All are evaluated in both early and late integration modes.


6. Interpretability

Script:

~/run_shap_lime_10x.py

SHAP and LIME are used to explain both early and late integration models.


7. Stability analysis

Ten independent runs are performed, and only biomarkers that appear in all ten runs are retained.
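The retention rule amounts to a set intersection across runs. The sketch below assumes each run yields a list of selected feature names; how a run decides which biomarkers it reports (for example, a ranking cutoff) is not stated here, and the function and feature names are hypothetical.

```python
def stable_biomarkers(runs):
    """Return the features present in every run, sorted by name.

    `runs` is a list with one collection of feature names per run.
    """
    if not runs:
        return []
    common = set(runs[0])
    for selected in runs[1:]:
        common &= set(selected)
    return sorted(common)


# Three hypothetical runs shown for brevity; the pipeline uses ten.
runs = [
    ["GENE1_gene", "LIPID1_lip", "rs123_snp"],
    ["LIPID1_lip", "GENE1_gene", "cg001_dm"],
    ["GENE1_gene", "LIPID1_lip"],
]
print(stable_biomarkers(runs))  # ['GENE1_gene', 'LIPID1_lip']
```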


8. Toy data

The toy dataset:

~/toy_complete_data.csv

matches the ADNI schema used in the paper.

Replacing it with the real ADNI matrices reproduces the published results.


9. ADNI data access

Accessing ADNI Resources:

There are two separate application processes depending on the type of ADNI resources you are requesting:

  • Data (e.g., imaging, genomic, clinical data, etc.)
  • Biospecimen samples (e.g., blood plasma, DNA, brain tissue, etc.)

Note: All ADNI data and biospecimens are de-identified. To learn more about how ADNI data are collected and processed, please review the official documentation: https://adni.loni.usc.edu/help-faqs/adni-documentation/


To request access to ADNI data:


To request access to ADNI biospecimens:

About

This repository discusses the analysis and pre-processing details of stable MCI (sMCI) vs. progressive MCI (pMCI) classification.
