GitHub - daifengwanglab/segjointgene

Overview

SegJointGene is a self-training framework for spatial cell-type segmentation that integrates Computational Information Discarding (CID) to constrain iterative label propagation. The method combines a segmentation network with attribution-guided label updates to progressively refine pixel-wise class and instance labels.

1. Environment & Installation

Python Version

This codebase is tested with:

Python 3.11.8

Dependencies

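Typical requirements, listed by import name, include the following (`cv2` is provided by the opencv package, `skimage` by scikit-image, and `argparse` is part of the Python standard library):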

torch
numpy
argparse
cv2
captum
skimage
scipy
tifffile
pandas
matplotlib

You can install the dependencies with conda:

conda create -n SegJointGene python=3.11 -y
conda activate SegJointGene
conda install pytorch pytorch-cuda=12.1 captum numpy scipy pandas matplotlib opencv scikit-image tifffile -c pytorch -c nvidia -c conda-forge
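
After installation, a quick import check can confirm that the environment is usable. The sketch below is not part of the repository: `check_imports` is a hypothetical helper, and the module list is simply copied from the dependency list above.

```python
import importlib.util

# Import names taken from the dependency list in this README.
REQUIRED_MODULES = [
    "torch", "numpy", "argparse", "cv2", "captum",
    "skimage", "scipy", "tifffile", "pandas", "matplotlib",
]


def check_imports(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_imports().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Any module reported as MISSING should be installed into the SegJointGene environment before running main.py.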

2. Algorithm Overview

SegJointGene-CID follows an iterative self-training paradigm with attribution-based constraints:

Segmentation Network: A UNet-style network predicts pixel-wise cell-type labels for each image patch.
Self-Training with Dynamic Labels: Instead of using fixed ground-truth labels, the dataset maintains dynamic labels that are updated after each iteration.
CID Attribution: For selected cell types and genes, CID computes pixel-wise attribution by optimizing an input noise mask while freezing network weights. Pixels that tolerate larger noise are considered less informative.
Attribution-Constrained Label Update: During label propagation, both of the following must hold:
- Predictions must satisfy spatial consistency and confidence constraints.
- The dominant attribution class must match the propagated class.
This suppresses spurious label expansion and stabilizes self-training.
Iterative Refinement: The process repeats over epochs, progressively improving segmentation performance.
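
The attribution-constrained label update can be illustrated with a small NumPy sketch. This is not the repository's implementation: the function name `update_dynamic_labels`, the 4-neighbourhood test used for spatial consistency, the `conf_threshold` value, and the convention that 0 means "unlabeled" are all assumptions made for illustration.

```python
import numpy as np


def update_dynamic_labels(dynamic, fixed, pred, conf, attr_class,
                          conf_threshold=0.9, background=0):
    """Propagate predicted labels into unlabeled pixels under three constraints.

    All arrays have shape (H, W). `conf_threshold` and `background` are
    illustrative assumptions, not values from the repository.
    """
    # Spatial consistency (assumed form): the pixel must touch, in its
    # 4-neighbourhood, a pixel that already carries the predicted class.
    padded = np.pad(dynamic, 1, constant_values=background)
    neighbours = np.stack([
        padded[:-2, 1:-1], padded[2:, 1:-1],   # up, down
        padded[1:-1, :-2], padded[1:-1, 2:],   # left, right
    ])
    touches_same_class = (neighbours == pred).any(axis=0)

    candidate = (
        (dynamic == background)        # only grow into unlabeled pixels
        & (fixed == background)        # never overwrite the fixed core mask
        & (pred != background)
        & (conf >= conf_threshold)     # confidence constraint
        & touches_same_class           # spatial consistency constraint
        & (attr_class == pred)         # attribution must agree with prediction
    )
    updated = dynamic.copy()
    updated[candidate] = pred[candidate]
    return updated
```

Here `pred` and `conf` would come from the segmentation network's output, and `attr_class` from taking the class with the largest attribution at each pixel. A pixel whose attribution disagrees with its prediction is left unlabeled, which is how the constraint limits label expansion.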

3. Dataset Interface

The framework uses a patch-based dataset, where each sample is stored as a .npz file.

Dataset Class

The dataset class (ImagePatchDataset) supports:

Immutable fixed labels (used as a core mask)
Mutable dynamic labels (updated during self-training)
Persistent label caching across epochs

Dynamic labels are automatically loaded and saved during training.
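
The split between fixed and dynamic labels can be sketched as follows. `DynamicLabelStore` is a hypothetical helper written for this README, not the actual ImagePatchDataset; the `.npy` cache format and the convention that 0 means "unlabeled" are assumptions.

```python
from pathlib import Path

import numpy as np


class DynamicLabelStore:
    """Hypothetical helper: immutable fixed labels plus a mutable, cached copy."""

    def __init__(self, fixed_label, cache_path):
        self.fixed = np.array(fixed_label)       # private copy of the core mask
        self.fixed.setflags(write=False)         # guard against accidental edits
        self.cache_path = Path(cache_path)       # expected to end in ".npy"
        if self.cache_path.exists():
            # Resume from labels saved in an earlier epoch or run.
            self.dynamic = np.load(self.cache_path)
        else:
            self.dynamic = self.fixed.copy()

    def update(self, new_label):
        """Replace the dynamic labels, but always keep the fixed core pixels."""
        core = self.fixed != 0                   # assumes 0 means "unlabeled"
        self.dynamic = np.where(core, self.fixed, new_label)

    def save(self):
        """Persist the current dynamic labels so a later epoch can reload them."""
        np.save(self.cache_path, self.dynamic)
```

Keeping the fixed labels read-only means a faulty update cannot erase the core mask, while the cache file lets training resume with the most recent dynamic labels.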

Required `.npz` File Format

Each patch file must contain the following keys:

| Key | Shape / Type | Description |
| --- | --- | --- |
| `image` | `(C, H, W)` float32 | Input image (e.g. gene expression channels) |
| `label` | `(H, W)` int | Initial class label map |
| `instance` | `(H, W)` int | Initial instance label map |
| `spots` | `(H, W)` float32 | Spot density or auxiliary spatial signal |
| `dapi` | `(H, W)` float32 | DAPI or reference channel |

File naming must follow:

p_<row>_<col>.npz

where <row> and <col> indicate the spatial grid position of the patch.
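
A patch file can be checked against this format before training. The sketch below is an illustration rather than repository code: `check_patch` is a hypothetical helper, and the naming pattern `p_<row>_<col>.npz` with integer row and column indices is an assumption.

```python
import re
from pathlib import Path

import numpy as np

REQUIRED_KEYS = ("image", "label", "instance", "spots", "dapi")
# Assumed naming pattern: p_<row>_<col>.npz with integer grid indices.
PATCH_NAME = re.compile(r"^p_(\d+)_(\d+)\.npz$")


def check_patch(path):
    """Validate one patch file and return its (row, col) grid position."""
    path = Path(path)
    match = PATCH_NAME.match(path.name)
    if match is None:
        raise ValueError(f"unexpected patch file name: {path.name}")

    with np.load(path) as data:
        missing = [key for key in REQUIRED_KEYS if key not in data.files]
        if missing:
            raise KeyError(f"{path.name} is missing keys: {missing}")
        image = data["image"]
        if image.ndim != 3:
            raise ValueError("image must have shape (C, H, W)")
        height_width = image.shape[1:]
        # Every other array must match the spatial size of the image.
        for key in REQUIRED_KEYS[1:]:
            if data[key].shape != height_width:
                raise ValueError(f"{key} must have shape {height_width}")

    return int(match.group(1)), int(match.group(2))
```

Running this over every file in the patch directory catches missing keys or mismatched shapes early, before they surface as errors inside the training loop.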

4. Preprocess Demo Dataset

One section from the mouse hippocampus dataset is provided in data/CA1_raw/3_1_left. It comes from the paper "Probabilistic cell typing enables fine mapping of closely related cell types in situ", which provides a link to download the full dataset (mouse hippocampus CA1 region).

To preprocess this section so that it is ready for training, run:

python main.py --datasets_name=CA1 --step_name=preprocess_CA1 --CA1_sub_path=3_1_left

5. Running SegJointGene

Basic Command

The main entry point is main.py. To run the CID-based self-training pipeline:

python main.py --datasets_name=CA1 --step_name=SegJointGene --attr_method=CID --net_sub_suffix=SegJointGene_CID

This will:

Initialize the segmentation network
Load patch-based data from data/CA1/
Run iterative self-training with CID attribution
Automatically manage dynamic label caching and checkpoints

Commonly Used Arguments

| Argument | Description |
| --- | --- |
| `--patch_size` | Patch resolution |
| `--attr_epoch` | Epoch to start CID attribution |
| `--attr_n_gene` | Number of target genes for attribution |
| `--attr_n_celltype` | Number of target cell types for attribution |
| `--CID_n_steps` | Optimization steps for CID |
| `--CID_chunk_size` | Number of cell types processed per CID chunk |
| `--if_load_ckpt` | Resume from a saved checkpoint |

All arguments are defined in main.py.
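
When launching several runs from a script, the command line can be assembled programmatically. The sketch below is a convenience wrapper, not part of the repository: `build_command` is a hypothetical helper, and the `CID_n_steps` value is an arbitrary illustration rather than a recommended or default setting.

```python
import shlex
import sys


def build_command(step_name, datasets_name="CA1", **options):
    """Build the argument list for one main.py step."""
    command = [sys.executable, "main.py",
               f"--datasets_name={datasets_name}",
               f"--step_name={step_name}"]
    # Extra keyword arguments become --name=value flags, in the order given.
    for name, value in options.items():
        command.append(f"--{name}={value}")
    return command


train = build_command(
    "SegJointGene",
    attr_method="CID",
    net_sub_suffix="SegJointGene_CID",
    CID_n_steps=50,       # hypothetical value, chosen only for illustration
)
print(shlex.join(train[1:]))
```

The resulting list can be passed to `subprocess.run(train, check=True)` from the repository root. Check main.py for the accepted type and default of each argument before overriding it.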

6. Output

During execution, the framework will automatically:

Update dynamic labels in memory
Save label caches at regular intervals
Save model checkpoints

7. Summary

SegJointGene provides a minimal yet expressive framework for attribution-guided self-training in spatial segmentation, combining:

Patch-based segmentation
Dynamic label propagation
Training guided by Computational Information Discarding (CID)
