Guney Tombak, Ertunc Erdil, Ender Konukoglu
Biomedical Image Computing Group, ETH Zurich
VoxCor is a training-free fit–transform method that produces reusable volumetric feature representations from frozen 2D ViT foundation models (DINOv2, DINOv3, MedSAM2, SAM3). A single offline fitting phase—using closed-form weighted partial least squares (WPLS) on a small set of paired volumes—yields modality-specific projection matrices that can be applied to new volumes by ViT inference and linear projection alone, without re-running registration. Voxel correspondences can then be queried by nearest-neighbor search.
This repository contains the public release code for our paper. The arXiv preprint will be linked here once available. Some paths and configuration files are designed to reproduce the paper experiments and may require adapting dataset locations to your local setup.
Fit phase (run once on a small paired training set):
- Triplanar frozen ViT inference (sagittal, coronal, and axial slices).
- Per-axis joint-modality PCA compresses each axis to $k$ channels.
- The three axis features are concatenated into a $3k$-channel voxel volume.
- A correspondence-aware WPLS projection is fitted by SVD of the weighted cross-covariance, producing modality-specific matrices (see the sketch below).
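A minimal NumPy sketch of this WPLS fitting step, assuming paired voxel features and correspondence weights have already been extracted (all names and shapes here are illustrative, not the repository's API):

```python
import numpy as np

def fit_wpls(feats_a, feats_b, weights, k_proj):
    """Correspondence-aware WPLS via SVD of the weighted cross-covariance.

    feats_a, feats_b: (n_voxels, d) features at corresponding voxels
    weights:          (n_voxels,) correspondence confidence weights
    Returns one (d, k_proj) projection matrix per modality.
    """
    w = weights / weights.sum()
    # Center each modality around its weighted mean
    a = feats_a - (w[:, None] * feats_a).sum(axis=0)
    b = feats_b - (w[:, None] * feats_b).sum(axis=0)
    # Weighted cross-covariance between the two modalities: A^T diag(w) B
    cross_cov = a.T @ (w[:, None] * b)
    u, _, vt = np.linalg.svd(cross_cov, full_matrices=False)
    return u[:, :k_proj], vt[:k_proj].T
```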
Transform phase (applied to any new volume, no registration required):
- Triplanar ViT inference
- Stored PCA projection
- Stored WPLS projection
- $k_{proj}$-channel feature volume (see the sketch below)
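Since the fitted projections are plain matrices, the transform phase reduces to triplanar ViT inference followed by two linear maps; a minimal NumPy sketch (shapes and names are illustrative):

```python
import numpy as np

def transform_volume(axis_feats, pca_mats, wpls_mat):
    """Apply stored PCA and WPLS projections to triplanar ViT features.

    axis_feats: three (X, Y, Z, d_vit) feature volumes, one per axis
    pca_mats:   three stored per-axis PCA matrices, each (d_vit, k)
    wpls_mat:   stored WPLS projection for this modality, (3k, k_proj)
    """
    # Compress each axis to k channels with its stored PCA projection
    compressed = [f @ p for f, p in zip(axis_feats, pca_mats)]
    # Concatenate the three axes into a 3k-channel voxel volume
    voxel_feats = np.concatenate(compressed, axis=-1)  # (X, Y, Z, 3k)
    # Project to the final k_proj-channel feature volume
    return voxel_feats @ wpls_mat                      # (X, Y, Z, k_proj)
```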
PCA3D (`cat_proj: pca3d`) is a correspondence-free triplanar PCA control that replaces WPLS, allowing ablation of the correspondence-aware fitting step.
BandSlice is used as a lightweight six-parameter scale–translation global initializer, and Globally-Initialized ConvexAdam (GICA) denotes BandSlice followed by ConvexAdam elastic refinement. These components are used during fitting and registration evaluation to handle field-of-view misalignment between volume pairs.
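For intuition, a six-parameter scale–translation transform can be written as a homogeneous 4×4 matrix; the sketch below only illustrates the parameterization and is not the actual BandSlice implementation:

```python
import numpy as np

def scale_translation_matrix(scales, translations):
    """Build a 4x4 homogeneous matrix from 3 scales and 3 translations,
    the six global parameters of a scale-translation initializer."""
    mat = np.eye(4)
    mat[0, 0], mat[1, 1], mat[2, 2] = scales   # per-axis scaling
    mat[:3, 3] = translations                  # per-axis translation
    return mat
```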
The code was tested with Python 3.11.11.
```bash
git clone https://github.com/guneytombak/VoxCor.git
cd VoxCor
pip install -r requirements.txt
```

The pinned package versions correspond to the environment used for the paper experiments. Depending on your CUDA version, you may need to install PyTorch, torchvision, and xFormers separately, following their official installation instructions.
VoxCor's ViT wrappers, located in `src/model/vit/`, load backbone code from cloned model repositories.
Create a `models/` directory at the repository root and clone the relevant repositories there:
```bash
mkdir models
# DINOv2
git clone https://github.com/facebookresearch/dinov2.git models/dinov2
# DINOv3
git clone https://github.com/facebookresearch/dinov3.git models/dinov3
# MedSAM2
git clone https://github.com/bowang-lab/MedSAM2.git models/medsam2
# SAM3
git clone https://github.com/facebookresearch/sam3 models/sam3
```

Pre-trained weights are loaded automatically by each wrapper on first use, or can be placed in the corresponding `models/<name>/` directory as documented by the upstream repositories.
Pre-fitted VoxCor projection weights are provided as a separate release asset, `weights.tar.gz`, on the GitHub Releases page.
Download link: TODO
These weights can be used directly for transform-time feature extraction, provided that the corresponding encoder repository and pretrained backbone weights are installed correctly as described above.
After downloading weights.tar.gz from the release page, place it in the repository root and extract:
```bash
tar -xzf weights.tar.gz
```

This should create a `weights/` directory containing the fitted projection files. The weights are organized by dataset and encoder and can be loaded with:

```python
from src.extraction.vit.vit3d import ViT3D

voxcor = ViT3D.load_pt("weights/<dataset>/<fit>/<encoder>_<fit>/vit3d_model_<specs>.pt")
features = voxcor.transform(batch)
```

The loaded weights include the fitted PCA and WPLS/PCA3D projection matrices, but they do not include the external ViT backbone repositories themselves. Therefore, the corresponding encoder code and pretrained backbone weights must still be available under `models/`. At transform time, no registration, masks, or fitting-time correspondences are required.
| Dataset | Task | Source |
|---|---|---|
| AbdomenMRCT | Intra-subject MR–CT registration | Learn2Reg 2022 |
| HCP T2w–T1w | Inter-subject brain T2w–T1w registration | Human Connectome Project |
Datasets are not redistributed with this repository. Please obtain them from the original sources and update the dataset paths in the corresponding YAML configuration files.
All scripts are YAML-driven and designed for fault-tolerant, SLURM-ready execution.
Each configuration file is a single YAML mapping. Two reserved keys configure the run as a whole:
- `__output_dir__`: Base directory under which each experiment's outputs are written.
- `__otherwise__`: A mapping of default values inherited by every experiment.

Every other key is an experiment definition that overrides these defaults.
Note: The sentinel string `"__none__"` is converted to Python `None` upon loading.
Dataset-Fit
Shared projection fitted on the training split:
```bash
python scripts/fitting/dsfit_vit3d.py config/path/to/config.yaml
```

Dataset-Fit (Leave-2-Out Cross-Validation)
Per-fold models for L2OCV, used specifically for Abdomen MR–CT:
```bash
python scripts/fitting/foldfit_vit3d.py config/path/to/config.yaml
```

Pair-Fit
Projection fitted on each test pair individually, used for pair-specific adaptation:
```bash
python scripts/fitting/pairfit_vit3d.py config/path/to/config.yaml
```

Registration Evaluation

Evaluates deformable registration using ConvexAdam (CA) or Globally-Initialized ConvexAdam (GICA). The script performs an automatic hyperparameter search (HPS) phase followed by test evaluation, saving atomic checkpoints along the way (see the sketch below).
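"Atomic" here means a checkpoint file is never left partially written if a job is killed; a common pattern (illustrative, not necessarily the repository's exact code) is to write to a temporary file and rename:

```python
import os
import tempfile

import torch

def save_checkpoint_atomically(state, path):
    """Write a checkpoint so that interruption never truncates `path`."""
    fd, tmp_path = tempfile.mkstemp(
        dir=os.path.dirname(os.path.abspath(path)), suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            torch.save(state, f)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.remove(tmp_path)
        raise
```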
```bash
python scripts/registration/registration_evaluation.py config/path/to/config.yaml
```

To generate parameter sweeps for registration, use the scripts in `scripts/reg_param_search/`, for example:
- `avoid_parameters_sweep.py`
- `generate_convex_adam_parameter_sweep.py`
Segmentation Evaluation

Evaluates label transfer by nearest-neighbor matching in the feature space.
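Conceptually, each target voxel receives the label of its nearest source voxel in feature space; a minimal sketch assuming flattened feature arrays (names and shapes are illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

def nn_label_transfer(src_feats, src_labels, tgt_feats):
    """Transfer labels by nearest-neighbor search in feature space.

    src_feats:  (n_src, k_proj) source voxel features
    src_labels: (n_src,) labels of the source voxels
    tgt_feats:  (n_tgt, k_proj) target voxel features
    """
    tree = cKDTree(src_feats)              # index the source features
    _, idx = tree.query(tgt_feats, k=1)    # nearest source voxel per target
    return src_labels[idx]                 # (n_tgt,) transferred labels
```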
```bash
# Standard evaluation
python scripts/segmentation/seg_quad_vit3d.py config/path/to/config.yaml

# L2OCV evaluation (iterates over folds with separate checkpoints)
python scripts/segmentation/seg_quad_perm_vit3d.py config/path/to/config.yaml
```

Landmark Evaluation

Evaluates geometric precision using segmentation centers of mass (SCM) as synthetic landmarks (see the sketch after the commands below).
```bash
# Standard evaluation
python scripts/landmarking/lmscm_quad_vit3d.py config/path/to/config.yaml

# L2OCV evaluation
python scripts/landmarking/lmscm_quad_perm_vit3d.py config/path/to/config.yaml
```
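Such synthetic landmarks can be derived as per-label centers of mass of a segmentation; a minimal SciPy sketch (illustrative, not the repository's implementation):

```python
import numpy as np
from scipy import ndimage

def segmentation_centers_of_mass(label_volume):
    """One (x, y, z) center-of-mass landmark per foreground label."""
    labels = np.unique(label_volume)
    labels = labels[labels != 0]  # skip the background label
    return np.array(ndimage.center_of_mass(label_volume > 0,
                                           label_volume, labels))
```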
| Key | Backbone | Variant used |
|---|---|---|
| `dinov2` | DINOv2 | ViT-L/14 |
| `dinov3` | DINOv3 | ViT-L/14 |
| `medsam2i` | MedSAM2 image encoder | MedSAM2_latest.pt |
| `sam3i` | SAM3 image encoder | sam3.pt |
The encoder is set via the `model` field in the experiment configuration YAML.
If you use VoxCor, please cite our paper:
TODO

Important: If your pipeline uses Globally-Initialized ConvexAdam (GICA), ConvexAdam, or the MIND descriptor implementations provided in this repository, please also cite the respective original works.
```bibtex
@article{siebert2024convexadam,
  title     = {ConvexAdam: Self-Configuring Dual-Optimisation-Based 3D Multitask Medical Image Registration},
  author    = {Siebert, Hanna and Gro{\ss}br{\"o}hmer, Christoph and Hansen, Lasse and Heinrich, Mattias P.},
  journal   = {IEEE Transactions on Medical Imaging},
  year      = {2024},
  publisher = {IEEE}
}

@inproceedings{heinrich2013ssc,
  title     = {Towards Realtime Multimodal Fusion for Image-Guided Interventions Using Self-Similarities},
  author    = {Heinrich, Mattias P. and Jenkinson, Mark and Papie{\.z}, Bart{\l}omiej W. and Brady, Michael and Schnabel, Julia A.},
  booktitle = {Medical Image Computing and Computer-Assisted Intervention -- MICCAI 2013},
  series    = {Lecture Notes in Computer Science},
  volume    = {8151},
  pages     = {187--194},
  year      = {2013},
  publisher = {Springer},
  doi       = {10.1007/978-3-642-40811-3_24}
}
```

This project is licensed under the MIT License. See LICENSE for details.
External model repositories, including DINOv2, DINOv3, MedSAM2, and SAM3, are subject to their own licenses.
