
mvrcii/vesuvius_first_title_prize


(Figure: title banner)

Vesuvius Challenge - First Title Prize 🏆

Winner of the First Title Prize on Scroll 5 (PHerc. 172), by Micha Nowak and Marcel Roth

This repository contains the inference pipeline, training scripts, and custom architecture used to identify the first title in the carbonized Herculaneum scrolls.

⚡ Technical Highlights (Why this works)

To solve the "impossible" problem of detecting ink in 3D noise, we couldn't rely on standard model architectures. We engineered three key innovations:

1. Architecture Innovation: MiniUNETR

We adapted UNETR (a Vision Transformer-based architecture for medical image segmentation) into a lightweight variant, MiniUNETR.

  • Why: Standard 3D UNets were too heavy for rapid iteration on our available hardware (RTX 4090).
  • Impact: Reduced training time to 1 hour from scratch, allowing us to iterate 10x faster than competitors using massive compute. This embodies our philosophy of efficient, high-agency research.
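To make the idea concrete, here is a minimal sketch of what a scaled-down UNETR-style model looks like: a strided 3D convolution as patch embedding, a small transformer encoder, and a lightweight decoder back to image resolution. All layer sizes, names, and hyperparameters below are illustrative assumptions, not the actual MiniUNETR implementation from this repository (positional embeddings are also omitted for brevity).

```python
# Illustrative sketch only: a scaled-down UNETR-style model for 3D ink
# detection. Every hyperparameter here is an assumption, not the real
# MiniUNETR configuration.
import torch
import torch.nn as nn

class MiniUNETRSketch(nn.Module):
    def __init__(self, in_ch=1, embed_dim=96, depth=4, patch=(4, 16, 16)):
        super().__init__()
        # ViT-style patch embedding via a strided 3D convolution
        self.patch_embed = nn.Conv3d(in_ch, embed_dim,
                                     kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=4, dim_feedforward=embed_dim * 2,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Lightweight decoder: upsample back to input resolution, then
        # collapse to a single ink-probability channel
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(embed_dim, 32, kernel_size=patch, stride=patch),
            nn.Conv3d(32, 1, kernel_size=1))

    def forward(self, x):                        # x: (B, C, D, H, W)
        tokens = self.patch_embed(x)             # (B, E, D', H', W')
        b, e, d, h, w = tokens.shape
        seq = tokens.flatten(2).transpose(1, 2)  # (B, D'*H'*W', E)
        seq = self.encoder(seq)
        vol = seq.transpose(1, 2).reshape(b, e, d, h, w)
        logits = self.decoder(vol)               # (B, 1, D, H, W)
        return logits.mean(dim=2)                # average depth -> (B, 1, H, W)

model = MiniUNETRSketch()
out = model(torch.randn(1, 1, 16, 64, 64))
print(out.shape)  # torch.Size([1, 1, 64, 64])
```

Keeping the embedding dimension and transformer depth small is what makes this class of model cheap enough to retrain from scratch in about an hour on a single consumer GPU.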

2. The "Ignore Mask" Data Engine

We solved the problem of noisy labels by implementing a Masked Loss Strategy.

  • Process: Instead of forcing the model to learn from uncertain annotations, we generated "ignore masks" for ambiguous regions.
  • Result: This prevented the model from learning false positives/negatives, effectively "denoising" the dataset through iterative training cycles.

3. 3D Volumetric Focus

Unlike 2D approaches, we processed the scroll segments as 3D chunks (variable depth), preserving the volumetric context of the ink within the carbon fibers.
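The chunking idea can be sketched as follows. The shapes and the helper function are assumptions for illustration, not the repository's actual data loader: stack per-layer slices into a volume, select the input layers, and cut fixed-size 3D patches that preserve depth context.

```python
# Sketch of the 3D-chunk idea (assumed shapes, not the actual loader):
# stack per-layer slices into a volume and cut it into fixed-size 3D
# patches so each training sample keeps its full depth context.
import numpy as np

rng = np.random.default_rng(0)
layers = rng.random((21, 256, 256), dtype=np.float32)  # 21 surface layers

# Select the top 16 layers (indices 5..20) as model input
volume = layers[5:21]                 # (16, 256, 256)

def chunk_volume(vol, size=64, stride=64):
    """Cut a (D, H, W) volume into non-overlapping (D, size, size) chunks."""
    d, h, w = vol.shape
    chunks = [vol[:, y:y + size, x:x + size]
              for y in range(0, h - size + 1, stride)
              for x in range(0, w - size + 1, stride)]
    return np.stack(chunks)

chunks = chunk_volume(volume)
print(chunks.shape)  # (16, 16, 64, 64): 16 chunks, each 16 layers deep
```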


Method & Approach

Our approach to extracting the title from Scroll 5 began with informed hypotheses about its likely location. The domain's inherent data limitations led us to focus heavily on improving data quality rather than just scaling parameters.

We utilized an iterative process of:

  1. Manual Annotation: Expertly labeling clear ink signals.
  2. Strategic Masking: Applying our "Ignore Mask" to low-quality data regions.
  3. Rapid Retraining: Using MiniUNETR to validate hypotheses in near real-time.

Quickstart: Inference

Prerequisites

  • Hardware: at least 24 GB VRAM (tested on an RTX 4090); 96 GB RAM recommended.
  • Software: Python >= 3.8, Conda.
  1. Clone the repository.
  2. Download the checkpoint and place it in checkpoints/scroll5/warm-planet-193-unetr-sf-b3-250417-171532.
  3. From the repository root, run the following command. It sets up the conda environment with the correct Python version and installs the required dependencies.
python init_env.py
  4. Activate the conda environment:
conda activate scroll5-title
  5. Run the following script to download the required layers, preprocess them, and run inference on the title chunk. Note that the resulting image is flipped horizontally (and must be flipped back to match our submission).
./infer_title.sh
  6. The results directory where the predictions are saved is printed to the console. It contains two subdirectories: visualizations and npy_files. To reproduce the exact image we submitted, run the scripts/overlay_viewer.py UI and select the resulting npy_files directory. Then enable horizontal flip, select average, and set boost to 3 (make sure invert colors is unchecked).
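For reference, the postprocessing applied in the viewer (averaging the per-prediction .npy arrays, flipping horizontally, applying a brightness boost) amounts to something like the sketch below. This is a hedged approximation; the authoritative logic lives in scripts/overlay_viewer.py.

```python
# Hedged sketch of the viewer postprocessing: average the .npy
# predictions, flip horizontally, then apply a brightness boost.
# The real overlay_viewer.py logic may differ in detail.
import numpy as np

# Stand-ins for the arrays loaded from the npy_files directory
preds = [np.random.default_rng(i).random((128, 128)) for i in range(3)]

avg = np.mean(preds, axis=0)          # "average" mode
flipped = np.fliplr(avg)              # horizontal flip to match submission
boosted = np.clip(flipped * 3, 0, 1)  # "boost = 3", clipped to [0, 1]
```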

Quickstart - Training

  1. Clone the repository.
  2. From the repository root, run the following command. It sets up the conda environment with the correct Python version, installs torch and all required packages, and finally installs our phoenix package.
python init_env.py
  3. Activate the conda environment:
conda activate scroll5-title
  4. Run the following commands one after another. They download, chunk, and preprocess the required fragment, then create the training dataset and start training the model on the fragment chunks specified by the config.
python scripts/download_fragments.py --fragment 03192025
python scripts/fragment_splitter.py 03192025 --scroll-id 5 -ch 1,3,4,9,11,13,15,19,20,21,24,25,26,27,28,29 --contrasted
python scripts/create_dataset.py configs/ft_no_title.py --ide_is_closed
python scripts/train.py configs/ft_no_title.py

Important notes:

  • download_fragments.py is currently hardcoded for Sean's auto-segmentation fragments. The script skips files that already exist.
  • fragment_splitter.py only creates the chunks specified on the command line. It skips chunk files that already exist, and if interrupted while processing, it resumes where it left off.
  • create_dataset.py is heavily parallelized and consumes close to 100% of all CPU cores. We recommend executing it from a console rather than from within an IDE, as it may crash the IDE.
  • We trained the submitted model for a total of 14 epochs.

Supplementary Info: The "Ignore Mask"

Training data

We used the following two VC3D auto-segmentations as training data: 02110815 and 03192025.

Our model was trained on auto-segmentations with 21 layers in total, using the top 16 (indices 5 to 20) as input; it therefore probably won't transfer to traditional 65-layer segmentations, though we haven't tested this ablation. Note that all input data for inference must be contrasted with the logic implemented in our fragment_splitter.py.

Our fragment_splitter.py script splits your large layer files into 10,000-pixel-wide chunks and applies contrast. We applied contrast to all of the fragment files because we found it made very subtle ink crackles easier to spot visually during annotation, and it seemed to improve the performance of our segmentation model.

For this reason, our model also requires the contrasted layer files for inference.
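The exact contrast logic is defined in fragment_splitter.py and is not reproduced here; a common approach of this kind is a percentile-based stretch, sketched below purely as an assumption about what "applying contrast" could look like.

```python
# Illustrative percentile contrast stretch. This is an assumption, not
# the authoritative logic, which lives in fragment_splitter.py.
import numpy as np

def contrast_stretch(layer, lo_pct=2, hi_pct=98):
    """Stretch intensities so the lo/hi percentiles map to 0 and 1."""
    lo, hi = np.percentile(layer, [lo_pct, hi_pct])
    return np.clip((layer - lo) / (hi - lo + 1e-8), 0.0, 1.0)

layer = np.random.default_rng(0).random((64, 64)).astype(np.float32)
out = contrast_stretch(layer)
```

Stretching like this expands the narrow intensity band in which faint ink crackles live, which is why it helps both human annotators and the model.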

(Figure: contrast comparison of layer files)

Masking

The labels were iteratively refined and cleaned over several rounds of training and inference, starting with 02110815 and continuing with 03192025. Our final labels per chunk can be found in their respective subdirectories within data, and each consists of two files: an ink label (label.png) and an ignore mask (ignore.png).

We use the ignore mask to mark areas where the model's predictions were uncertain in the previous run. Rather than marking an uncertain area as ink or no-ink, we simply cover it with the ignore mask, removing all covered pixels from the loss calculation — thus avoiding the propagation of false labels.

The image below shows an example of this ignore mask. Instead of filling in partial letters by hand (with ink labels that were not present in the prediction), we cover incomplete letters with the ignore mask (red) — effectively adding no label to those pixels and allowing the model to figure it out on its own.
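The masking mechanism amounts to excluding ignored pixels from the loss. A minimal sketch, using arrays as stand-ins for the real label.png / ignore.png files (the actual training loss may differ, and the convention that 1 in the ignore mask means "skip" is an assumption):

```python
# Minimal sketch of the masked-loss idea: pixels covered by the ignore
# mask contribute nothing to the loss, so uncertain labels never
# propagate false positives or negatives into training.
import numpy as np

def masked_bce(pred, label, ignore, eps=1e-7):
    """Binary cross-entropy averaged over non-ignored pixels only."""
    pred = np.clip(pred, eps, 1 - eps)
    bce = -(label * np.log(pred) + (1 - label) * np.log(1 - pred))
    keep = (ignore == 0)          # assumption: 1 in ignore.png means "skip"
    return bce[keep].mean() if keep.any() else 0.0

pred   = np.array([[0.9, 0.1], [0.5, 0.2]])
label  = np.array([[1.0, 0.0], [1.0, 0.0]])
ignore = np.array([[0,   0  ], [1,   0  ]])  # bottom-left pixel is uncertain

loss = masked_bce(pred, label, ignore)
```

However wrong the prediction at an ignored pixel is, the loss is unchanged, which is exactly what lets incomplete letters be left for the model to "figure out on its own".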

(Figure: example ignore mask, shown in red over incomplete letters)
