SegJointGene is a self-training framework for spatial cell-type segmentation that integrates Computational Information Discarding (CID) to constrain iterative label propagation. The method combines a segmentation network with attribution-guided label updates to progressively refine pixel-wise class and instance labels.
This codebase is tested with:
Python 3.11.8
Typical requirements include:
- torch
- numpy
- argparse
- cv2
- captum
- skimage
- scipy
- tifffile
- pandas
- matplotlib
Install the dependencies with conda:
conda create -n SegJointGene python=3.11 -y
conda activate SegJointGene
conda install pytorch pytorch-cuda=12.1 captum numpy scipy pandas matplotlib opencv scikit-image tifffile -c pytorch -c nvidia -c conda-forge
SegJointGene-CID follows an iterative self-training paradigm with attribution-based constraints:
- Segmentation Network: A UNet-style network predicts pixel-wise cell-type labels for each image patch.
- Self-Training with Dynamic Labels: Instead of using fixed ground-truth labels, the dataset maintains dynamic labels that are updated after each iteration.
- CID Attribution: For selected cell types and genes, CID computes pixel-wise attribution by optimizing an input noise mask while freezing network weights. Pixels that tolerate larger noise are considered less informative.
- Attribution-Constrained Label Update: During label propagation, two conditions suppress spurious label expansion and stabilize self-training:
  - Predictions must satisfy spatial consistency and confidence constraints.
  - The dominant attribution class must match the propagated class.
- Iterative Refinement: The process repeats over epochs, progressively improving segmentation performance.
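The attribution-constrained label update can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the function name `propagate_labels`, the `conf_thresh` value, the convention that class 0 means "unlabeled", and the 4-neighbour rule used for spatial consistency are all assumptions made for illustration.

```python
import numpy as np

def propagate_labels(dynamic, fixed, probs, attr_class, conf_thresh=0.9):
    """Return an updated copy of the (H, W) dynamic label map.

    dynamic:    (H, W) int, current labels, 0 = unlabeled (assumed convention)
    fixed:      (H, W) int, core labels that are never overwritten
    probs:      (K, H, W) float, per-class probabilities from the network
    attr_class: (H, W) int, class with the largest attribution at each pixel
    """
    pred = probs.argmax(axis=0)
    conf = probs.max(axis=0)

    # Spatial consistency (assumed rule): a pixel may take class k only if
    # at least one of its 4-neighbours already carries class k.
    padded = np.pad(dynamic, 1, constant_values=0)
    neighbours = np.stack([
        padded[:-2, 1:-1], padded[2:, 1:-1],   # up, down
        padded[1:-1, :-2], padded[1:-1, 2:],   # left, right
    ])
    touches_same = (neighbours == pred[None]).any(axis=0)

    accept = (
        (dynamic == 0) & (pred != 0)
        & (conf >= conf_thresh)      # confidence constraint
        & touches_same               # spatial consistency constraint
        & (attr_class == pred)       # attribution must agree with prediction
    )
    updated = np.where(accept, pred, dynamic)
    # Core pixels always keep their fixed label.
    return np.where(fixed != 0, fixed, updated)

# Toy 1 x 4 patch: one seed pixel of class 1, confident class-1 predictions
# everywhere, but attribution disagrees at the third pixel.
fixed = np.array([[1, 0, 0, 0]])
probs = np.array([[[0.01, 0.05, 0.05, 0.05]],
                  [[0.99, 0.95, 0.95, 0.95]]])
attr_class = np.array([[1, 1, 0, 1]])

step1 = propagate_labels(fixed.copy(), fixed, probs, attr_class)
step2 = propagate_labels(step1, fixed, probs, attr_class)
```

In this toy case the label grows by one pixel in the first call and then stops, because the next pixel's dominant attribution class does not match the predicted class.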
The framework uses a patch-based dataset, where each sample is stored as a .npz file.
The dataset class (ImagePatchDataset) supports:
- Immutable fixed labels (used as a core mask)
- Mutable dynamic labels (updated during self-training)
- Persistent label caching across epochs
Dynamic labels are automatically loaded and saved during training.
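The split between fixed and dynamic labels can be sketched as follows. This is a simplified stand-in, not the actual `ImagePatchDataset`: the class name `DynamicLabelStore`, its methods, the `.npy` cache format, and the convention that 0 means "unlabeled" are assumptions for illustration only.

```python
import os
import tempfile
import numpy as np

class DynamicLabelStore:
    """Hypothetical helper holding one patch's fixed and dynamic labels."""

    def __init__(self, fixed_labels, cache_path):
        self.fixed = fixed_labels.copy()
        self.fixed.setflags(write=False)      # the core mask is read-only
        self.cache_path = cache_path          # must end in ".npy" for np.save
        if os.path.exists(cache_path):
            self.dynamic = np.load(cache_path)    # resume from an earlier epoch
        else:
            self.dynamic = self.fixed.copy()      # first epoch: start from the core

    def update(self, new_labels):
        # Pixels inside the core mask always keep their fixed label.
        self.dynamic = np.where(self.fixed != 0, self.fixed, new_labels)

    def save(self):
        np.save(self.cache_path, self.dynamic)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_dynamic.npy")
    fixed = np.array([[1, 0], [0, 0]])
    store = DynamicLabelStore(fixed, path)
    store.update(np.array([[2, 2], [0, 0]]))   # tries to overwrite the core pixel
    store.save()
    restored = DynamicLabelStore(fixed, path).dynamic.tolist()
    core_is_read_only = not store.fixed.flags.writeable
```

Here the attempted overwrite of the core pixel is ignored, and a second store created from the same cache path picks up the saved dynamic labels.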
Each patch file must contain the following keys:
| Key | Shape / Type | Description |
|---|---|---|
| image | (C, H, W) float32 | Input image (e.g. gene expression channels) |
| label | (H, W) int | Initial class label map |
| instance | (H, W) int | Initial instance label map |
| spots | (H, W) float32 | Spot density or auxiliary spatial signal |
| dapi | (H, W) float32 | DAPI or reference channel |
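A quick way to check that a patch file matches the key and shape requirements listed above is sketched below. The helper names `validate_patch` and `make_example_patch` are hypothetical and not part of the codebase; the example writes to an in-memory buffer rather than a real file.

```python
import io
import numpy as np

# Expected number of dimensions for each required key.
REQUIRED_NDIM = {"image": 3, "label": 2, "instance": 2, "spots": 2, "dapi": 2}

def validate_patch(file):
    """Load a patch archive and return its (C, H, W) image shape, or raise ValueError."""
    with np.load(file) as patch:
        missing = sorted(set(REQUIRED_NDIM) - set(patch.files))
        if missing:
            raise ValueError(f"missing keys: {missing}")
        arrays = {key: patch[key] for key in REQUIRED_NDIM}
    for key, ndim in REQUIRED_NDIM.items():
        if arrays[key].ndim != ndim:
            raise ValueError(f"{key} must have {ndim} dimensions")
    height_width = arrays["image"].shape[1:]
    for key in ("label", "instance", "spots", "dapi"):
        if arrays[key].shape != height_width:
            raise ValueError(f"{key} shape {arrays[key].shape} != {height_width}")
    return arrays["image"].shape

def make_example_patch(channels=4, size=8):
    """Build an all-zero patch archive in memory with the required keys."""
    buffer = io.BytesIO()
    np.savez(
        buffer,
        image=np.zeros((channels, size, size), np.float32),
        label=np.zeros((size, size), np.int32),
        instance=np.zeros((size, size), np.int32),
        spots=np.zeros((size, size), np.float32),
        dapi=np.zeros((size, size), np.float32),
    )
    buffer.seek(0)
    return buffer

example_shape = validate_patch(make_example_patch())
```

The same function accepts a path on disk, since `np.load` takes either a filename or a file-like object.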
File naming must follow:
p_<row>_<col>.npz
where <row> and <col> indicate the spatial grid position of the patch.
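As a small illustration of the naming scheme, the sketch below parses grid positions from file names and sorts patches in row-major order. It assumes the pattern is `p_<row>_<col>.npz` with non-negative integer indices; `parse_patch_name` is a hypothetical helper, not a function from the repository.

```python
import re

# Assumed pattern: "p_<row>_<col>.npz" with non-negative integer indices.
PATCH_NAME = re.compile(r"^p_(\d+)_(\d+)\.npz$")

def parse_patch_name(filename):
    """Return (row, col) parsed from a patch file name, or raise ValueError."""
    match = PATCH_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected patch file name: {filename!r}")
    return int(match.group(1)), int(match.group(2))

names = ["p_10_2.npz", "p_2_10.npz", "p_2_3.npz"]
# Sort numerically by (row, col) instead of lexicographically by string.
ordered = sorted(names, key=parse_patch_name)
```

Sorting on the parsed integers avoids the usual pitfall where `p_10_2.npz` would come before `p_2_3.npz` in a plain string sort.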
One section of the mouse hippocampus dataset is provided in data/CA1_raw/3_1_left. It comes from the paper "Probabilistic cell typing enables fine mapping of closely related cell types in situ".
The paper provides a link to download the full dataset: mouse hippocampus CA1 region.
To preprocess this dataset and start training, run:
python main.py --datasets_name=CA1 --step_name=preprocess_CA1 --CA1_sub_path=3_1_left
The main entry point is main.py.
To run the CID-based self-training pipeline:
python main.py --datasets_name=CA1 --step_name=SegJointGene --attr_method=CID --net_sub_suffix=SegJointGene_CID
This will:
- Initialize the segmentation network
- Load patch-based data from data/CA1/
- Run iterative self-training with CID attribution
- Automatically manage dynamic label caching and checkpoints
| Argument | Description |
|---|---|
| --patch_size | Patch resolution |
| --attr_epoch | Epoch to start CID attribution |
| --attr_n_gene | Number of target genes for attribution |
| --attr_n_celltype | Number of target cell types for attribution |
| --CID_n_steps | Optimization steps for CID |
| --CID_chunk_size | Number of cell types processed per CID chunk |
| --if_load_ckpt | Resume from a saved checkpoint |
All arguments are defined in main.py.
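To give some intuition for what the CID optimization steps refine, here is a toy noise-tolerance optimization on a frozen linear scorer. It is not the repository's CID code and involves no neural network: the function `noise_tolerance`, the objective, and the parameters `lam` and `lr` are assumptions chosen so that the result can be checked by hand. Only `n_steps` plays a role loosely comparable to `--CID_n_steps`.

```python
import numpy as np

def noise_tolerance(weights, n_steps=2000, lam=0.1, lr=0.1):
    """Learn a per-pixel noise scale sigma for a frozen scorer y = w . x.

    Objective (toy, assumed): E[(w . (sigma * eps))^2] - lam * sum(log sigma)
    with eps ~ N(0, I). The expectation has the closed form
    sum(w^2 * sigma^2) - lam * sum(log sigma), so no sampling is needed here.
    The first term penalizes changes in the output; the second rewards noise.
    """
    w = np.asarray(weights, dtype=float)   # frozen model weights
    log_sigma = np.zeros_like(w)           # the only optimized quantity
    for _ in range(n_steps):
        # Gradient of the objective with respect to log_sigma.
        grad = 2.0 * w**2 * np.exp(2.0 * log_sigma) - lam
        log_sigma -= lr * grad
    return np.exp(log_sigma)

w = np.array([2.0, 1.0, 0.5, 0.1])    # per-pixel influence on the output
sigma = noise_tolerance(w)
attribution = -np.log(sigma)           # less tolerated noise = more informative
```

Pixels with large weights end up with small noise scales and therefore high attribution, which mirrors the statement that pixels tolerating larger noise are considered less informative. With a real segmentation network the expectation has no closed form, so the noise would have to be sampled and the gradient obtained by backpropagation through the frozen network.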
During execution, the framework will automatically:
- Update dynamic labels in memory
- Save label caches at regular intervals
- Save model checkpoints
SegJointGene provides a minimal yet expressive framework for attribution-guided self-training in spatial segmentation, combining:
- Patch-based segmentation
- Dynamic label propagation
- Training guided by Computational Information Discarding (CID)