Official inference release for:
MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments
ICCV 2025
- Multi-view consistent generation from depth-only trajectories.
- Zero-shot inference pipeline built on SDXL + ControlNet.
- Includes ready-to-run sample data in the expected input format.
- Lightweight release: sample data is included in this repository (already filtered by final_idx.npy).
- Full dataset download: Google Drive
code_release/
├── README.md
├── requirements.txt
├── assets/
│ └── media/
│ ├── teaser.gif
│ └── teasercrop.mov
├── scripts/
│ └── run_inference.sh
└── mosaic/
├── data/
│ └── ep*/sp*/
│ ├── depth_raw/
│ ├── position/
│ ├── rotation/
│ ├── gpt_prompt/
│ └── final_idx.npy
└── src/
├── iccv_ours_weight8.py
├── iccv_ours_weight8_pixel.py
├── euler_scheduler.py
├── run_scene_inference.sh # main single-scene inference entrypoint
├── utils.py
└── loss/
conda create -n mosaic python=3.10 -y
conda activate mosaic
pip install --upgrade pip
pip install -r requirements.txt
huggingface-cli login
From the repository root:
bash scripts/run_inference.sh \
--prompt "in van gogh style" \
--data-dir ../data/ep7/sp4 \
--script iccv_ours_weight8.py \
--output-root ../outputs \
    --gpu-id 0
Generated images are saved to:
mosaic/outputs/<script_name>/epX/spY/<prompt>/output_*.png
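Given that layout, gathering the images from one run could look like the following sketch (the helper name and argument names are illustrative, not part of the release):

```python
# Hypothetical helper that collects generated images for one run, following
# the mosaic/outputs/<script_name>/epX/spY/<prompt>/output_*.png layout.
from pathlib import Path


def collect_outputs(output_root, script_name, ep, sp, prompt):
    """Return the generated PNGs for one (script, episode, subpath, prompt) run, sorted by name."""
    run_dir = Path(output_root) / script_name / ep / sp / prompt
    return sorted(run_dir.glob("output_*.png"))
```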
- scripts/run_inference.sh: root-level launcher. It enters mosaic/src/ and calls the main entrypoint.
- mosaic/src/run_scene_inference.sh: main inference runner for one scene folder (epX/spY).
Each scene folder should follow:
<scene_root>/
├── depth_raw/depth_raw_*.npy
├── position/position_*.npy
├── rotation/rotation_*.npy
├── gpt_prompt/gpt_prompt_*.txt
└── final_idx.npy
run_scene_inference.sh validates required subfolders/files before launching inference.
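A Python equivalent of those checks might look like the sketch below (the function name is hypothetical; the required entries are taken from the scene layout above):

```python
# Sketch of the pre-flight validation run_scene_inference.sh performs on a
# scene folder before launching inference (hypothetical Python equivalent).
from pathlib import Path

REQUIRED_DIRS = ["depth_raw", "position", "rotation", "gpt_prompt"]
REQUIRED_FILES = ["final_idx.npy"]


def missing_scene_entries(scene_root):
    """Return the required subfolders/files absent from a scene folder."""
    root = Path(scene_root)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing
```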
Note: the sample data released in this repository has been pre-filtered by final_idx.npy (non-keyframe entries are removed); the full, unfiltered dataset is available from the Google Drive link above.
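If you download the full dataset, you may need to apply the keyframe filter yourself. A minimal sketch, assuming final_idx.npy holds integer positions into each sorted per-modality file list (an assumption of mine, not confirmed by the release):

```python
# Hypothetical keyframe filtering for the full (unfiltered) dataset.
# Assumption: final_idx.npy stores integer positions into each sorted
# per-modality file list, e.g. loaded with numpy:
#   final_idx = np.load(scene_root / "final_idx.npy")


def select_keyframes(sorted_files, final_idx):
    """Keep only the entries of a sorted file list selected by final_idx."""
    return [sorted_files[int(i)] for i in final_idx]
```

The same indices would be applied to depth_raw/, position/, rotation/, and gpt_prompt/ so the four modalities stay aligned.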
@inproceedings{liu2025mosaic,
title={Mosaic: Generating consistent, privacy-preserving scenes from multiple depth views in multi-room environments},
author={Liu, Zhixuan and Zhu, Haokun and Chen, Rui and Francis, Jonathan and Hwang, Soonmin and Zhang, Ji and Oh, Jean},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={27456--27465},
year={2025}
}