This guide explains how to run experiments and generate figures for the paper comparing different POMDP solvers on project end-time estimation problems.
The experiment pipeline consists of two main stages:
- Data Collection: Run experiments and save raw results
- Analysis: Generate plots and tables from saved results
First, generate the experiments. The data in reference_problems/std_dev_3 was generated with the following commands.

Note that the -l 2 -u 52 "large" problems were generated over multiple runs so that each run fit into the memory of the machines (a 512 GB server). You could reproduce them on smaller machines, but you might need to split the runs into even smaller chunks.
## Generate Data
```shell
# Small
julia --project bin/cli.jl evaluate -r 42 --solvers QMDP -n 1000 -l 2 -u 12 --no-plot

# Medium
julia --project bin/cli.jl evaluate -r 42 --solvers QMDP -n 1000 -l 2 -u 26 --no-plot

# Large
julia --project bin/cli.jl evaluate -r 42 --solvers QMDP -n 250 -l 2 -u 52 --no-plot
julia --project bin/cli.jl evaluate -r 43 --solvers QMDP -n 250 -l 2 -u 52 --no-plot
julia --project bin/cli.jl evaluate -r 44 --solvers QMDP -n 250 -l 2 -u 52 --no-plot
julia --project bin/cli.jl evaluate -r 45 --solvers QMDP -n 250 -l 2 -u 52 --no-plot
```
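The per-seed chunking for the large problems can also be scripted. This sketch only prints one `evaluate` command per seed (remove the `echo` to actually run them):

```shell
# Print one evaluate command per seed so the large (-u 52) problem set is
# split into four memory-sized chunks; drop the echo to execute them.
for seed in 42 43 44 45; do
  echo julia --project bin/cli.jl evaluate -r "$seed" --solvers QMDP -n 250 -l 2 -u 52 --no-plot
done
```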
## Merge Data

Merge the raw results into smaller output files that are suitable for committing:
```shell
julia --project scripts/merge_simulation_data.jl ./results/std_dev_3_qmdp_l_2_u_12_n_1000_s42/evaluation_results.json -o qmdp_base_l_2_u_12_n_1000.json
julia --project scripts/merge_simulation_data.jl ./results/std_dev_3_qmdp_l_2_u_26_n_1000_s42/evaluation_results.json -o qmdp_base_l_2_u_26_n_1000.json
julia --project scripts/merge_simulation_data.jl -o qmdp_base_l_2_u_52_n_1000 ./results/std_dev_3_qmdp_l_2_u_52_n_250_s42/evaluation_results.json ./results/std_dev_3_qmdp_l_2_u_52_n_250_s43/evaluation_results.json ./results/std_dev_3_qmdp_l_2_u_52_n_250_s44/evaluation_results.json ./results/std_dev_3_qmdp_l_2_u_52_n_250_s45/evaluation_results.json
```

## Run Experiments

```shell
# Run all experiments (this will take some time)
julia --project scripts/run_paper_experiments.jl
```
```shell
# Analyze results (pass in the experiment directory displayed above)
julia --project scripts/analyze_paper_results.jl PATH/TO/EXPERIMENT/DIRECTORY/
```
The main experiment script compares 5 solvers across 3 problem sizes:
Solvers:
- OBSERVEDTIME: Simple baseline that announces the observed time
- MOSTLIKELY: Announces based on the most likely belief state
- QMDP: QMDP approximation algorithm
- CXX_SARSOP: Point-based belief-space POMDP solver (C++ implementation)
- MOMDP_SARSOP: SARSOP solver using a MOMDP formulation
Problem Sizes:
- Small: 2-12 timesteps
- Medium: 2-26 timesteps
- Large: 2-52 timesteps
To run experiments with custom settings:

```julia
using POMDPPlanning

# Load problem configurations
problem_configs = POMDPPlanning.load_problem_configs("reference_problems")

# Run experiments
experiment_dir, results = POMDPPlanning.run_paper_experiments(
    problem_configs,
    ["OBSERVEDTIME", "MOSTLIKELY", "QMDP", "CXX_SARSOP", "MOMDP_SARSOP"],
    "output_directory",
    num_simulations = 100,    # Simulations per solver/problem
    num_detailed_plots = 15,  # Number of belief evolution plots to save
    policy_timeout = 300,     # Timeout for SARSOP solvers (seconds)
    seed = 42,                # Random seed for reproducibility
    verbose = true
)
```

To analyze previously saved results:
```shell
julia scripts/analyze_paper_results.jl path/to/experiment_directory
```

Or programmatically:

```julia
include("scripts/analyze_paper_results.jl")
analyze_results("path/to/experiment_directory")
```

After running experiments and analysis, you'll find:
```
paper_results/
└── paper_experiment_YYYY-MM-DD_HH-MM-SS/
    ├── experiment_config.json       # Experiment configuration
    ├── results_small.json           # Raw results for small problems
    ├── results_medium.json          # Raw results for medium problems
    ├── results_large.json           # Raw results for large problems
    ├── all_results.json             # Combined results
    ├── belief_evolution_plots/      # Detailed belief evolution plots
    │   ├── small/
    │   ├── medium/
    │   └── large/
    └── analysis/                    # Generated plots and tables
        ├── reward_statistics.csv    # Reward statistics table
        ├── reward_table.tex         # LaTeX formatted reward table
        ├── comparison_statistics.csv # Performance metrics table
        ├── statistics_table.tex     # LaTeX formatted statistics table
        ├── reward_comparison_*.png  # Reward comparison plots
        ├── histograms/              # Reward distribution histograms
        ├── statistics_plots/        # Performance metric plots
        └── combined_plots/          # Multi-metric comparison plots
```
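Before pointing the analysis script at an experiment directory, it can help to verify that the run actually produced the raw result files listed above. A minimal sketch (the `EXP_DIR` path here is a hypothetical example following the layout above):

```shell
# Check that an experiment directory contains the expected raw result files.
# EXP_DIR is a hypothetical example path; substitute your own run's directory.
EXP_DIR="paper_results/paper_experiment_2025-01-01_12-00-00"
for f in experiment_config.json results_small.json results_medium.json results_large.json all_results.json; do
  [ -f "$EXP_DIR/$f" ] || echo "missing: $f"
done
```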
- reward_table.tex: Mean rewards with standard deviations by solver and problem size
- statistics_table.tex: Comprehensive performance metrics, including:
  - Average number of announcement changes
  - Standard deviation of announcement changes
  - Average final error
  - Percentage of incorrect final predictions
  - Average magnitude of announcement changes
  - Policy generation time
- Reward Comparisons:
  - reward_comparison_small.png, reward_comparison_medium.png, reward_comparison_large.png: Bar charts with error bars
  - reward_comparison_combined.png: Grouped bar chart across all problem sizes
- Reward Distributions:
  - histograms/hist_*_*.png: Individual histograms for each solver/problem combination
  - histograms/hist_*_combined.png: Overlaid histograms by problem size
  - histograms/hist_*_all_sizes.png: Overlaid histograms by solver
- Performance Metrics:
  - statistics_plots/avg_announcement_changes.png: Announcement changes over problem sizes
  - statistics_plots/avg_final_error.png: Final error over problem sizes
  - statistics_plots/incorrect_predictions.png: Error rates over problem sizes
- Combined Analysis:
  - combined_plots/key_metrics_comparison.png: 2x2 subplot of key metrics
- Belief Evolution:
  - belief_evolution_plots/*/: 2D heatmaps showing belief evolution with announced times overlaid
To increase the solver timeout, edit POLICY_TIMEOUT in scripts/run_paper_experiments.jl:

```julia
POLICY_TIMEOUT = 600  # 10 minutes instead of 5
```

To run more simulations, edit NUM_SIMULATIONS in scripts/run_paper_experiments.jl:

```julia
NUM_SIMULATIONS = 200  # More simulations for smoother statistics
```

To compute additional metrics from the raw simulation data, modify generate_statistics_table() in scripts/analyze_paper_results.jl.
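The POLICY_TIMEOUT and NUM_SIMULATIONS constants can also be changed from the shell with `sed`. This sketch edits a throwaway demo file so it is safe to try; point the same invocation at scripts/run_paper_experiments.jl to change the real settings (the `.bak` suffix keeps a backup):

```shell
# Demo on a throwaway copy; aim sed at scripts/run_paper_experiments.jl
# to edit the real constants. A .bak backup of the original is kept.
printf 'POLICY_TIMEOUT = 300\nNUM_SIMULATIONS = 100\n' > demo_settings.jl
sed -i.bak \
  -e 's/^POLICY_TIMEOUT = .*/POLICY_TIMEOUT = 600/' \
  -e 's/^NUM_SIMULATIONS = .*/NUM_SIMULATIONS = 200/' \
  demo_settings.jl
cat demo_settings.jl
```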
All experiments use fixed random seeds from previous experiment runs for reproducibility.