This tutorial will guide you through creating an experiment using Marin's executor framework. We'll build a simple experiment that:
- Generates a sequence of numbers
- Computes basic statistics on those numbers
In our First Experiment, we trained a tiny model on TinyStories. That tutorial used the executor framework to run a sequence of steps, but didn't explain how the framework actually works.
In this tutorial, you will learn:
- How to define steps in Marin
- How to connect steps together
- How to run an experiment
- How to inspect the output of an experiment
Before starting this tutorial, make sure you have:
- Completed the installation.
A Marin experiment consists of one or more ExecutorSteps that can be chained together. Each step:
- Has a unique name and description
- Takes a configuration object
- Processes data and produces output
- Can depend on outputs from previous steps
Let's start by importing the necessary modules:
```python
import json
import logging
import os
from dataclasses import dataclass

import fsspec

from marin.execution.executor import (
    ExecutorStep,
    executor_main,
    output_path_of,
    this_output_path,
)
```

Key imports:

- `dataclass`: for creating configuration classes
- `fsspec`: for file system operations (local or cloud)
- `marin.execution.executor`: core components for building experiments
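Because `fsspec.open` accepts local paths and cloud URLs interchangeably (e.g. `s3://` or `gs://`, given the corresponding filesystem packages), the same step code runs regardless of where outputs are stored. Here is a minimal sketch of the write/read pattern the steps below use, exercised against a local temporary directory:

```python
import json
import os
import tempfile

import fsspec


def roundtrip(path, data):
    """Write `data` as JSON to `path` and read it back. fsspec makes this
    work the same for local paths and (with extra packages) cloud URLs."""
    with fsspec.open(path, "w") as f:
        json.dump(data, f)
    with fsspec.open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    print(roundtrip(os.path.join(tmp, "numbers.json"), [1, 2, 3]))  # [1, 2, 3]
```
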
First, we'll create a step that generates numbers from 0 to n-1:
```python
@dataclass(frozen=True)
class GenerateDataConfig:
    n: int  # Number of data points to generate
    output_path: str  # Where to write the numbers


def generate_data(config: GenerateDataConfig):
    """Generate numbers from 0 to `n - 1` and write them to `output_path`."""
    numbers = list(range(config.n))

    # Write to file
    numbers_path = os.path.join(config.output_path, "numbers.json")
    with fsspec.open(numbers_path, "w") as f:
        json.dump(numbers, f)
```

Next, we'll create a second step that reads the generated numbers and computes statistics:
```python
@dataclass(frozen=True)
class ComputeStatsConfig:
    input_path: str  # Path to the file with numbers
    output_path: str  # Where to write the stats


def compute_stats(config: ComputeStatsConfig):
    """Compute statistics on the input numbers and write results."""
    # Read from file
    numbers_path = os.path.join(config.input_path, "numbers.json")
    with fsspec.open(numbers_path) as f:
        numbers = json.load(f)

    # Compute statistics
    stats = {
        "sum": sum(numbers),
        "min": min(numbers),
        "max": max(numbers),
    }

    # Write results (fsspec, so this works for local and cloud paths alike)
    stats_path = os.path.join(config.output_path, "stats.json")
    with fsspec.open(stats_path, "w") as f:
        json.dump(stats, f)
```

Now we'll create the experiment pipeline by connecting our steps:
```python
n = 100  # Number of data points to generate

# Step 1: Generate data
data = ExecutorStep(
    name="hello_world/data",
    description=f"Generate data from 0 to {n}-1.",
    fn=generate_data,
    config=GenerateDataConfig(
        n=n,
        output_path=this_output_path(),
    ),
)

# Step 2: Compute statistics
stats = ExecutorStep(
    name="hello_world/stats",
    description="Compute stats of the generated data.",
    fn=compute_stats,
    config=ComputeStatsConfig(
        input_path=output_path_of(data),  # Use output from previous step
        output_path=this_output_path(),
    ),
)

# Run the experiment
if __name__ == "__main__":
    executor_main(
        steps=[data, stats],
        description="Simple experiment to compute stats of some numbers.",
    )
```

To run this experiment:
```bash
python experiments/tutorials/hello_world.py --prefix local_store
```

This command will create several output files:

- `local_store/experiments/hello_world-7063e5.json`: a record of all the steps in this experiment
- `local_store/hello_world/data-d50b06`: the output of step 1 (`numbers.json` with the generated data)
- `local_store/hello_world/stats-b5daf3`: the output of step 2 (`stats.json` with the computed statistics)
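Since `n = 100`, the stats step writes the statistics of `range(100)`. A quick arithmetic check of what `stats.json` should contain:

```python
# What compute_stats produces for n = 100: the sum of 0..99 is 99*100/2 = 4950.
numbers = list(range(100))
expected = {"sum": sum(numbers), "min": min(numbers), "max": max(numbers)}
print(expected)  # {'sum': 4950, 'min': 0, 'max': 99}
```
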
!!! note
    If you run the same command again, the executor will detect that both steps have already run and skip them, returning immediately. This saves computation time when rerunning experiments.
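The step functions themselves are plain Python, so you can also exercise the pipeline logic without the executor at all. The sketch below repeats both functions from above in condensed form (plain arguments instead of the config dataclasses) and chains them through one temporary directory, mimicking how the executor threads `output_path_of(data)` into the stats step:

```python
import json
import os
import tempfile

import fsspec


def generate_data(n, output_path):
    # Same logic as the generate_data step above, minus the config dataclass.
    with fsspec.open(os.path.join(output_path, "numbers.json"), "w") as f:
        json.dump(list(range(n)), f)


def compute_stats(input_path, output_path):
    # Same logic as the compute_stats step above.
    with fsspec.open(os.path.join(input_path, "numbers.json")) as f:
        numbers = json.load(f)
    stats = {"sum": sum(numbers), "min": min(numbers), "max": max(numbers)}
    with fsspec.open(os.path.join(output_path, "stats.json"), "w") as f:
        json.dump(stats, f)


with tempfile.TemporaryDirectory() as tmp:
    generate_data(100, tmp)   # step 1 writes numbers.json
    compute_stats(tmp, tmp)   # step 2 reads step 1's output
    with fsspec.open(os.path.join(tmp, "stats.json")) as f:
        print(json.load(f))  # {'sum': 4950, 'min': 0, 'max': 99}
```

What the executor adds on top of this hand-wired version is a managed output directory per step, dependency tracking between steps, and the caching behavior described in the note above.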
The complete code for this tutorial is available at `experiments/tutorials/hello_world.py`.
- Train a tiny language model using Marin.
- Learn more about the executor framework: how to manage Python libraries, how to run big parallel jobs via Fray, how versioning works, etc.