Liquid chromatography–mass spectrometry (LC‑MS) enables metabolite identification using molecular mass, mass entropy, and retention time (RT). However, RT varies significantly across LC–MS setups, and structural isomers often share near-identical masses, making metabolite annotation difficult and error‑prone.
Recent deep‑learning models, such as Graphormer‑RT, can predict retention times for specific LC–MS configurations, offering a path to improved annotation workflows (repository).
This repository provides a full pipeline for applying modern deep learning to LC–MS Hydrophilic Interaction Liquid Chromatography (HILIC) and Reversed-Phase (RP) data to improve annotation.
- Background
- Quick Links
- Features
- Manuals (HowTo)
- Warnings
- Common Issues
- References
- Contact
- Contribute
- License
- Graphormer-RT For Deep Learning:
- Zenodo For Model Weights:
  - Graphormer-RT or "OG" Weights: https://zenodo.org/records/15021743
  - Our or "DM" Weights: https://zenodo.org/records/18867980
- Docker: https://hub.docker.com/r/dnhem/proj_deepmetab
- RepoRT: https://github.com/michaelwitting/RepoRT
| Feature Category | Description |
|---|---|
| LC–MS Preprocessing | Tools to clean, format, and structure LC–MS datasets for prediction |
| Data Loaders | Scripts to load, featurize, and register data |
| Workflow Setup | Setup directory, container, etc. for RP/HILIC/Both RT prediction and annotation |
| Model Training | Train RP Graphormer‑RT from scratch (Not Recommended) |
| Model Finetuning | Finetune Graphormer‑RT models (HILIC transfer learning and HILIC finetuning) |
| RT Prediction | Generate RT predictions and integrate results back into LC–MS feature tables |
| Annotation Scoring Framework | Score candidate molecules to resolve annotation ambiguities |
| Quality Control | Automatically flag potentially mis‑annotated mass feature IDs |
| Stereoisomer Flagging | Identify and label stereoisomer mass feature IDs |
| LazyPredict ✨ | Run entire automated workflow |
Choose the manual based on your preferences:
| Environment | Workflow Type | Documentation | Status |
|---|---|---|---|
| HPC / Cloud | Step-by-Step | Manual Guide | ✅ |
| HPC / Cloud | Automated | LazyPredict w/ Nextflow | ✅ |
| Local on Windows | Building Docker Image* | Docker Guide Installation | ✅ |
*Note: Building the Docker image is not required for our workflow, since users can pull it from Docker Hub as listed under Quick Links. The Dockerfile is provided only to show the configuration required to construct the image.
Because non-HPC resources are limited, we do not provide support for running this tool outside SLURM-managed systems or without Apptainer (Singularity). If you wish to run this workflow in such environments (typically non-cloud or non-HPC), please refer to our Script Adaptation Guidelines.
DO NOT MOVE `.sif` FILES
The following directories are bind‑mounted and depend on fixed paths and filenames. Take extreme caution when editing them.
- `workspace/Graphormer-RT/checkpoints_RP/`
- `graphormer_checkpoints_RP/`
- `workspace/Graphormer-RT/checkpoints_HILIC/`
- `graphormer_checkpoints_HILIC/`
- `my_data/HILIC_ft/`
- `workspace/Graphormer-RT/my_data/HILIC_ft/`
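
Before submitting a job, it can help to confirm that these directories still exist where the bind mounts expect them. The following is a minimal sketch and not part of this repository: the names `BIND_DIRS`, `missing_bind_dirs`, and `root` are hypothetical, and it assumes the paths listed above are relative to the directory you launch the workflow from. Some of them may exist only inside the container, so treat a "missing" result as a prompt to look, not as proof of a problem.

```python
from pathlib import Path

# Directories named in the warning above. Assumption (not confirmed by the
# repository): they are relative to the directory the workflow is launched from.
BIND_DIRS = [
    "workspace/Graphormer-RT/checkpoints_RP",
    "graphormer_checkpoints_RP",
    "workspace/Graphormer-RT/checkpoints_HILIC",
    "graphormer_checkpoints_HILIC",
    "my_data/HILIC_ft",
    "workspace/Graphormer-RT/my_data/HILIC_ft",
]


def missing_bind_dirs(root="."):
    """Return the listed directories that do not exist under `root`."""
    base = Path(root)
    return [d for d in BIND_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    # Report every expected directory that is absent from the current directory.
    for d in missing_bind_dirs():
        print(f"missing: {d}")
```

Run it from the project root; no output means every listed directory was found.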
- Fairseq Errors: Please ensure that you are using the provided Docker image. The original Graphormer was built on a Snapshot Release of fairseq, which is not available via conda/mamba. If the error persists, ensure that your installed container version matches the most recent one available on Docker Hub.
- Python Package Issues: Please first ensure that you are using the provided Docker materials.
- Obviously Incorrect Predictions: This workflow does not verify that HILIC data is used with the HILIC model (or RP data with the RP model). As a result, it is easy to accidentally input HILIC data into the RP model, which produces extremely inaccurate predictions.
- Apptainer/SLURM Issues: Please run the commands `apptainer --version` and `scancel` to ensure that these tools are available. If they are not, refer to this guide.
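
Two of the issues above (mismatched data and model, and missing Apptainer/SLURM tools) can be caught with a small preflight check before launching anything. The sketch below is illustrative only and is not provided by this repository: `check_mode`, `missing_tools`, and `VALID_MODES` are hypothetical names, and the chromatography mode must be declared by you, since the code cannot infer it from the data.

```python
import shutil

# The two chromatography modes this workflow distinguishes.
VALID_MODES = {"RP", "HILIC"}


def check_mode(data_mode, model_mode):
    """Raise ValueError if the declared data mode differs from the model mode."""
    data_mode, model_mode = data_mode.upper(), model_mode.upper()
    for mode in (data_mode, model_mode):
        if mode not in VALID_MODES:
            raise ValueError(
                f"unknown mode {mode!r}; expected one of {sorted(VALID_MODES)}"
            )
    if data_mode != model_mode:
        raise ValueError(f"{data_mode} data passed to {model_mode} model")


def missing_tools(tools=("apptainer", "scancel")):
    """Return the names of required executables that are not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    check_mode("HILIC", "HILIC")  # passes silently when the modes agree
    for tool in missing_tools():
        print(f"not found on PATH: {tool}")
```

Calling `check_mode("HILIC", "RP")` raises a `ValueError`, which stops the run before any predictions are produced.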
If your issue is not shown above, please raise an issue.
Please access our references here
Special thanks to the Graphormer‑RT development team, our collaborators at PNNL, and to the BGMP instructional team for their mentorship. This work benefited from access to the University of Oregon HPC cluster, Talapas.
If you have questions, feedback, or ideas, feel free to reach out to any of us:
- kcoulter [at] uoregon [dot] edu
- dnhem [at] uoregon [dot] edu
- ewi [at] uoregon [dot] edu
If you find this project helpful or interesting, please consider starring the repository ⭐
While active development has concluded, PRs are welcome! Feel free to contribute by opening a pull request with fixes, improvements, or new features.
This project is licensed under the MIT License — see the LICENSE file for more details.