| 1 | +--- |
| 2 | +title: "Omni-ST: Instruction-Driven Any-to-Any Multimodal Modeling for Spatial Transcriptomics" |
| 3 | +date: 2026-01-29 |
| 4 | +lastmod: 2026-01-29 |
| 5 | + |
| 6 | +authors: |
| 7 | + - "Xi Li" |
| 8 | + |
| 9 | +tags: |
| 10 | + - osre26 |
| 11 | + - spatial-transcriptomics |
| 12 | + - multimodal |
| 13 | + - instruction-tuning |
| 14 | + - computational-pathology |
| 15 | + |
| 16 | +summary: "A unified instruction-driven multimodal framework that enables any-to-any translation across images, gene expression, spatial graphs, and text in spatial transcriptomics." |
| 17 | +--- |
| 18 | + |
| 19 | +## Project description |
| 20 | + |
| 21 | +Spatial transcriptomics (ST) integrates spatially resolved gene expression with tissue morphology, enabling the study of cellular organization, tissue architecture, and disease microenvironments. Modern ST datasets are inherently multimodal, combining histology images (H&E or immunofluorescence), gene expression vectors, spatial graphs, cell annotations, and free-text pathology descriptions. |
| 22 | + |
| 23 | +However, most existing ST methods are task-specific and modality-siloed: separate models are trained for image-to-gene prediction, spatial domain identification, cell type classification, or text-based interpretation. This fragmentation limits cross-task generalization and scalability. |
| 24 | + |
| 25 | + |
| 26 | + |
| 27 | + |
| 28 | + |
| 29 | +**Omni-ST** proposes a single **instruction-driven any-to-any multimodal backbone** that treats each spatial transcriptomics modality as a “language” and formulates all tasks as: |
| 30 | + |
| 31 | +**Instruction + Input Modality → Output Modality** |
| 32 | + |
| 33 | +Natural language is elevated from auxiliary metadata to a **unifying interface** that specifies task intent, target modality, and biological context. This paradigm enables flexible, interpretable, and extensible spatial reasoning within a single model. |
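As a rough illustration of the formulation above, a single training or inference example could be represented as a record holding the instruction, the input, and the requested output modality. This is only a sketch under stated assumptions: the names `Modality`, `TaskExample`, and `describe`, the file name, and the instruction strings are hypothetical and are not part of any existing Omni-ST code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Modality(Enum):
    """The four modalities named in the project summary (hypothetical enum)."""
    IMAGE = "image"
    GENE_EXPRESSION = "gene_expression"
    SPATIAL_GRAPH = "spatial_graph"
    TEXT = "text"


@dataclass(frozen=True)
class TaskExample:
    """One 'Instruction + Input Modality -> Output Modality' record."""
    instruction: str            # natural-language task description
    input_modality: Modality    # modality of input_data
    input_data: Any             # raw payload (path, vector, graph, string, ...)
    output_modality: Modality   # modality the model is asked to produce


def describe(example: TaskExample) -> str:
    """Render a record in the 'input -> output: instruction' form."""
    return (
        f"{example.input_modality.value} -> "
        f"{example.output_modality.value}: {example.instruction}"
    )


# Two illustrative records; payloads are placeholders, not real data.
examples = [
    TaskExample(
        instruction="Predict gene expression for this H&E patch.",
        input_modality=Modality.IMAGE,
        input_data="patch_0001.png",  # hypothetical file name
        output_modality=Modality.GENE_EXPRESSION,
    ),
    TaskExample(
        # Assumption: the cell type label is returned as text.
        instruction="Name the most likely cell type for this spot.",
        input_modality=Modality.GENE_EXPRESSION,
        input_data=[0.0, 1.2, 3.4],
        output_modality=Modality.TEXT,
    ),
]

for ex in examples:
    print(describe(ex))
```

Under this representation, adding a task would amount to writing a new instruction with a chosen pair of modalities, with no change to the record type itself.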
| 34 | + |
| 35 | +--- |
| 36 | + |
| 37 | +### Project Idea: Instruction-Driven Any-to-Any Modeling for Spatial Transcriptomics |
| 38 | + |
| 39 | +- **Topics:** spatial transcriptomics, multimodal learning, instruction tuning, computational pathology |
| 40 | +- **Skills:** PyTorch, deep learning, Transformers, multimodal representation learning |
| 41 | +- **Difficulty:** Hard |
| 42 | +- **Size:** 350 hours |
| 43 | + |
| 44 | +**Mentor:** |
| 45 | +- **Xi Li** — <mailto:xil43@uci.edu> |
| 46 | + |
| 47 | +**Essential information:** |
| 48 | +- Design a unified multimodal backbone with lightweight modality adapters for histology images, gene expression vectors, spatial graphs, and text. |
| 49 | +- Use natural language instructions to condition model behavior, enabling any-to-any translation without task-specific heads. |
| 50 | +- Support core tasks including image → gene expression prediction, gene expression → cell type / spatial domain identification, region → text-based biological explanation, and text-based spatial retrieval. |
| 51 | +- Evaluate the model across multiple spatial transcriptomics tasks within a single framework, emphasizing generalization and interpretability. |
| 52 | +- Develop visualization and interpretation tools such as spatial maps and language-grounded explanations. |
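The adapter-plus-shared-backbone idea in the list above can be sketched in PyTorch as follows. This is a toy illustration, not the project's design: the class name `OmniSTSketch`, the modality keys, all feature sizes, and the choice of a small Transformer encoder are assumptions made for the example. It also assumes the instruction arrives already embedded as a token sequence, and it uses one output projection per modality (not per task).

```python
import torch
from torch import nn


class OmniSTSketch(nn.Module):
    """Toy shared backbone with per-modality adapters (hypothetical design)."""

    def __init__(self, dims, d_model=64):
        super().__init__()
        # Lightweight input adapters: one linear map per modality
        # into the shared model width.
        self.adapters = nn.ModuleDict(
            {name: nn.Linear(dim, d_model) for name, dim in dims.items()}
        )
        # Output projections back to each modality's feature size.
        self.decoders = nn.ModuleDict(
            {name: nn.Linear(d_model, dim) for name, dim in dims.items()}
        )
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=128, batch_first=True
        )
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, instruction, x, source, target):
        # instruction: (batch, n_instr, dims["text"]) pre-embedded tokens
        # x:           (batch, n_tokens, dims[source])
        instr = self.adapters["text"](instruction)
        inp = self.adapters[source](x)
        # Instruction tokens are prepended so they condition every input token.
        hidden = self.backbone(torch.cat([instr, inp], dim=1))
        # Drop the instruction positions, then project to the target modality.
        return self.decoders[target](hidden[:, instr.shape[1]:])


# Illustrative feature sizes only.
dims = {"image": 32, "gene": 50, "graph": 16, "text": 24}
model = OmniSTSketch(dims).eval()

instruction = torch.randn(2, 5, dims["text"])   # 5 instruction tokens
patches = torch.randn(2, 10, dims["image"])     # 10 image tokens per sample

with torch.no_grad():
    pred = model(instruction, patches, source="image", target="gene")
print(pred.shape)  # torch.Size([2, 10, 50])
```

The same module handles any source and target pair by changing the two string arguments, which is one simple way to read "any-to-any" here; a real system would likely need modality-appropriate encoders and decoders rather than single linear layers.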
| 53 | + |
| 54 | +**Expected deliverables:** |
| 55 | +- An open-source PyTorch implementation of the Omni-ST framework. |
| 56 | +- Unified multitask benchmarks for spatial transcriptomics. |
| 57 | +- Visualization and interpretation tools for spatial predictions. |
| 58 | +- Documentation and tutorials demonstrating how to add new tasks via instructions. |