Show & Tell: DreamPlanner — MPC in imagination space (Phase 13.2) #372

web3guru888 · 2026-04-13T02:03:26Z

web3guru888
Apr 13, 2026
Maintainer

DreamPlanner architecture: MPC in imagination space

Phase 13.2 introduces DreamPlanner — a model-predictive control component that evaluates candidate action sequences in imagination using the WorldModel from Phase 13.1 (#368), then selects the best action to execute in the real environment.

Component map

```
CognitiveCycle._model_based_step(obs)
        │
        ▼
DreamPlanner.plan(obs, world_model)
        │
        ├─── RANDOM_SHOOTING ─── dream_rollout() × N ──► select best
        │
        ├─── CEM ─────────────── dream_rollout() × N × K iterations ──► elite re-fit ──► best
        │
        ├─── BEAM_SEARCH ──────── WorldModel.predict() × beam_width × n_actions × horizon ──► best beam
        │
        └─── GREEDY ──────────── dream_rollout() × 1 ──► first action
                │
                ▼
          PlanResult.best_action ──► CognitiveCycle executes
```
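The dispatch in the map can be sketched as a strategy registry. This is a minimal, synchronous sketch, assuming each strategy is a plain callable that returns `(action, predicted_reward)`; `PlanStrategy`, the `planners` registry and every `PlanResult` field other than `best_action` are hypothetical names, not the project's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, Tuple


class PlanStrategy(Enum):
    # Strategy names taken from the component map; values are illustrative.
    RANDOM_SHOOTING = "random_shooting"
    CEM = "cem"
    BEAM_SEARCH = "beam_search"
    GREEDY = "greedy"


@dataclass
class PlanResult:
    best_action: Any          # what CognitiveCycle executes
    predicted_reward: float   # hypothetical field: imagined return of the chosen plan
    strategy: PlanStrategy    # hypothetical field: which planner produced it


# Hypothetical planner signature: (obs, world_model) -> (action, predicted_reward)
Planner = Callable[[Any, Any], Tuple[Any, float]]


def plan(obs: Any, world_model: Any, strategy: PlanStrategy,
         planners: Dict[PlanStrategy, Planner]) -> PlanResult:
    """Look up the planner for `strategy`, run it, and wrap its output."""
    action, reward = planners[strategy](obs, world_model)
    return PlanResult(best_action=action, predicted_reward=reward, strategy=strategy)
```

Keeping planners behind one signature lets the cycle switch strategies (for example to `GREEDY` for ablations) without touching the calling code.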

Strategy comparison

| Strategy | WM calls | Quality | Latency | Best for |
|---|---|---|---|---|
| `RANDOM_SHOOTING` | N rollouts | Good | Low | Fast envs, noisy reward |
| `CEM` | N × K rollouts | Best | Highest | Smooth landscapes, offline planning |
| `BEAM_SEARCH` | up to beam_width × n_actions × horizon predictions | Moderate | Medium | Deterministic, discrete actions |
| `GREEDY` | 1 rollout | Baseline | Lowest | Ablation / debug |
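The "WM calls" column mixes units (full rollouts vs single `predict()` steps). Assuming, for illustration only, that one `dream_rollout()` of horizon H costs H single-step predictions, the budgets can be put on one scale. `predict_steps` and its parameter names are hypothetical.

```python
def predict_steps(strategy: str, horizon: int, n_candidates: int = 0,
                  cem_iters: int = 0, beam_width: int = 0, n_actions: int = 0) -> int:
    """Single-step WM predictions per plan() call, assuming a rollout costs `horizon` steps."""
    if strategy == "RANDOM_SHOOTING":
        return n_candidates * horizon
    if strategy == "CEM":
        # Upper bound: the entropy early exit can stop before all K iterations.
        return n_candidates * cem_iters * horizon
    if strategy == "GREEDY":
        return horizon
    if strategy == "BEAM_SEARCH":
        frontier, total = 1, 0
        for _ in range(horizon):
            total += frontier * n_actions                     # expand every action of every beam
            frontier = min(beam_width, frontier * n_actions)  # prune back to beam_width
        return total
    raise ValueError(f"unknown strategy: {strategy}")
```

With N = 64, K = 5, H = 10, beam_width = 8 and 4 actions, this gives 640, 3,200, 276 and 10 predictions for random shooting, CEM, beam search and greedy respectively.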

RANDOM_SHOOTING inner loop

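```python
candidates = []
for _ in range(n_candidates):
    if time_budget_exceeded():
        break  # latency budget exhausted: plan with what we have
    seq = random_action_sequence(H)
    # NOTE: as written, only seq[0] is passed to the rollout
    rollout = await wm.dream_rollout(ModelInput(obs=obs, action=seq[0]), horizon=H)
    if rollout.terminal_surprise > surprise_abort_threshold:
        continue  # world model uncertain: skip this candidate
    total_r = discounted_return(rollout)
    candidates.append(ActionCandidate(seq, total_r, ...))
# Every candidate may have been skipped or the budget may have expired (NO_CANDIDATES)
best = max(candidates, key=lambda c: c.predicted_reward) if candidates else None
```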
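A self-contained, synchronous version of the same loop is useful for testing the logic in isolation. It assumes a toy `step(state, action) -> (next_state, reward, surprise)` function in place of `WorldModel.dream_rollout()`, and, unlike the snippet, it imagines the whole sampled sequence. `rollout`, `random_shooting` and the toy model are hypothetical.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical toy world model: (state, action) -> (next_state, reward, surprise)
StepFn = Callable[[float, int], Tuple[float, float, float]]


@dataclass
class ActionCandidate:
    actions: List[int]
    predicted_reward: float
    terminal_surprise: float


def rollout(step: StepFn, obs: float, actions: Sequence[int], gamma: float) -> Tuple[float, float]:
    """Imagine executing `actions` from `obs`; return (discounted return, final surprise)."""
    state, ret, surprise = obs, 0.0, 0.0
    for t, a in enumerate(actions):
        state, reward, surprise = step(state, a)
        ret += (gamma ** t) * reward
    return ret, surprise


def random_shooting(step: StepFn, obs: float, n_actions: int, horizon: int,
                    n_candidates: int = 64, gamma: float = 0.99,
                    surprise_abort_threshold: float = 1.0,
                    time_budget_s: float = 0.05,
                    rng: Optional[random.Random] = None) -> Optional[ActionCandidate]:
    rng = rng or random.Random()
    deadline = time.monotonic() + time_budget_s
    candidates: List[ActionCandidate] = []
    for _ in range(n_candidates):
        if time.monotonic() > deadline:
            break  # latency budget exhausted
        seq = [rng.randrange(n_actions) for _ in range(horizon)]
        ret, surprise = rollout(step, obs, seq, gamma)
        if surprise > surprise_abort_threshold:
            continue  # world model uncertain along this sequence: skip it
        candidates.append(ActionCandidate(seq, ret, surprise))
    if not candidates:
        return None  # corresponds to the NO_CANDIDATES outcome
    return max(candidates, key=lambda c: c.predicted_reward)
```

The planner would then execute `best.actions[0]` in the real environment.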

CEM: Cross-Entropy Method

CEM maintains a distribution over action sequences and iteratively refines it:

1. Sample N sequences from the current distribution (initially uniform)
2. Evaluate each via dream_rollout()
3. Select the elite fraction (top 20%)
4. Refit the distribution to the elite samples
5. Repeat K times

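```python
import numpy as np

# Entropy-convergence early exit (inside the K-iteration loop).
# The formula treats `mean` as per-step categorical action probabilities, shape (H, n_actions).
entropy = -np.sum(mean * np.log(mean + 1e-8), axis=-1).mean()  # average per-step entropy, nats
if entropy < 0.01:
    break  # distribution has converged
```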
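The five CEM steps and the entropy exit can be combined into one runnable sketch for a discrete action space. It assumes a per-step categorical distribution (consistent with the entropy formula) and a `score(seq)` callable standing in for `dream_rollout()` plus discounted return. The `smoothing` blend of old and refitted probabilities is an added assumption, a common CEM variant that is not described in the post; `cem_plan` and `mean_entropy` are hypothetical names.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple


def mean_entropy(probs: List[List[float]]) -> float:
    """Average per-step entropy in nats, mirroring the NumPy expression above."""
    return sum(-sum(p * math.log(p + 1e-8) for p in row) for row in probs) / len(probs)


def cem_plan(score: Callable[[Sequence[int]], float], n_actions: int, horizon: int,
             n_samples: int = 64, n_iters: int = 10, elite_frac: float = 0.2,
             smoothing: float = 0.7, entropy_tol: float = 0.01,
             rng: Optional[random.Random] = None) -> Tuple[List[int], float]:
    rng = rng or random.Random()
    # Start from a uniform categorical distribution at every time step.
    probs = [[1.0 / n_actions] * n_actions for _ in range(horizon)]
    n_elite = max(1, int(n_samples * elite_frac))
    best_seq: List[int] = []
    best_score = -math.inf
    for _ in range(n_iters):
        # Sample N sequences and evaluate each (stand-in for dream_rollout()).
        seqs = [[rng.choices(range(n_actions), weights=row)[0] for row in probs]
                for _ in range(n_samples)]
        scored = sorted(((score(s), s) for s in seqs), key=lambda x: x[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_seq = scored[0]
        # Keep the elite fraction and refit per-step action frequencies.
        elites = [s for _, s in scored[:n_elite]]
        for t in range(horizon):
            counts = [0] * n_actions
            for s in elites:
                counts[s[t]] += 1
            probs[t] = [smoothing * c / n_elite + (1 - smoothing) * p
                        for c, p in zip(counts, probs[t])]
        if mean_entropy(probs) < entropy_tol:
            break  # distribution has converged
    return best_seq, best_score
```

Without smoothing, a single unanimous elite set drives the entropy to zero in one iteration; blending keeps some exploration across the K iterations.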

BEAM_SEARCH node expansion

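Unlike RANDOM_SHOOTING and CEM, which sample whole action sequences and score them with dream_rollout(), BEAM_SEARCH calls the single-step WorldModel.predict() at each node. Because it enumerates actions instead of sampling them, the search is deterministic (given a deterministic predict()) and suits small discrete action spaces: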

```
(example: 4 discrete actions, beam_width = 8)
Step 0: obs₀ → expand all 4 actions → 4 (reward, next_obs) pairs
Step 1: keep top-8 beams → expand each → prune to top-8
...
After step H−1: return best beam's action_sequence[0]
```
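The expansion and pruning steps can be sketched as follows, assuming a toy deterministic `predict(state, action) -> (next_state, reward)` in place of `WorldModel.predict()` and ranking beams by accumulated discounted reward. `beam_search` and its parameters are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical single-step model: (state, action) -> (next_state, reward)
PredictFn = Callable[[int, int], Tuple[int, float]]


def beam_search(predict: PredictFn, obs: int, n_actions: int, horizon: int,
                beam_width: int = 8, gamma: float = 0.99) -> Tuple[List[int], float]:
    # Each beam is (discounted return so far, current state, action sequence).
    beams: List[Tuple[float, int, List[int]]] = [(0.0, obs, [])]
    for t in range(horizon):
        expanded = []
        for ret, state, actions in beams:
            for a in range(n_actions):  # one predict() call per (beam, action)
                next_state, reward = predict(state, a)
                expanded.append((ret + gamma ** t * reward, next_state, actions + [a]))
        # Prune to the top `beam_width` beams by accumulated return.
        expanded.sort(key=lambda b: b[0], reverse=True)
        beams = expanded[:beam_width]
    best_return, _, best_actions = beams[0]
    return best_actions, best_return
```

In a toy model where action 0 pays 1 immediately but leads to a dead end, while action 1 pays 0 now and 10 on the next step, a beam width of 2 picks action 1 first, whereas a beam width of 1 behaves greedily and picks action 0.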

Surprise-abort guard

If the WorldModel has high prediction error in a region (high terminal_surprise), DreamPlanner discards candidates whose imagined rollouts end there:

```python
if cand.terminal_surprise > cfg.surprise_abort_threshold:
    _PLAN_SURPRISE_ABORT.inc()
    continue
```

This creates a feedback loop: WorldModel uncertainty → DreamPlanner caution → agent avoids uncharted territory until WorldModel updates.
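How the guard could map onto the three plan outcomes is sketched below. The mapping is an assumption, not taken from the post: SURPRISE_ABORT when every evaluated candidate was discarded for surprise, NO_CANDIDATES when nothing was evaluated at all. `PlanOutcome` and `filter_by_surprise` are hypothetical names, and candidates are simplified to `(predicted_reward, terminal_surprise)` pairs.

```python
from enum import Enum
from typing import Optional, Sequence, Tuple

Candidate = Tuple[float, float]  # (predicted_reward, terminal_surprise)


class PlanOutcome(Enum):
    SUCCESS = "success"
    NO_CANDIDATES = "no_candidates"
    SURPRISE_ABORT = "surprise_abort"


def filter_by_surprise(cands: Sequence[Candidate], threshold: float
                       ) -> Tuple[PlanOutcome, Optional[Candidate], int]:
    """Drop high-surprise candidates; return (outcome, best kept candidate, n discarded)."""
    kept = [c for c in cands if c[1] <= threshold]  # same test as the guard, inverted
    discarded = len(cands) - len(kept)
    if kept:
        outcome = PlanOutcome.SUCCESS
    elif discarded:
        outcome = PlanOutcome.SURPRISE_ABORT  # assumption: everything was too uncertain
    else:
        outcome = PlanOutcome.NO_CANDIDATES   # nothing was evaluated at all
    best = max(kept, key=lambda c: c[0]) if kept else None
    return outcome, best, discarded
```

The returned `discarded` count is what a per-candidate counter such as `_PLAN_SURPRISE_ABORT` would be incremented by.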

Prometheus metrics

| Metric | Type | Description |
|---|---|---|
| `asi_dream_plan_latency_ms` | Histogram | Wall-clock per `plan()` call |
| `asi_dream_plan_candidates_total` | Counter | Candidates evaluated |
| `asi_dream_plan_outcome_total` | Counter | SUCCESS / NO_CANDIDATES / SURPRISE_ABORT |
| `asi_dream_plan_predicted_reward` | Gauge | Best predicted reward (last plan) |
| `asi_dream_plan_surprise_abort_total` | Counter | Plans aborted due to high WM surprise |

PromQL examples

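```promql
# Planning success rate (sum() drops the outcome label so the division matches)
sum(rate(asi_dream_plan_outcome_total{outcome="success"}[5m]))
/ sum(rate(asi_dream_plan_outcome_total[5m]))

# 99th-percentile CEM planning latency (aggregate buckets by le across instances)
histogram_quantile(0.99, sum by (le) (rate(asi_dream_plan_latency_ms_bucket{strategy="cem"}[5m])))
```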

Open questions

Latent-space planning: Should DreamPlanner operate over encoded latent states (LatentStateEncoder, Phase 13.4) rather than raw observations? Latent MPC is more sample-efficient but requires a working encoder.
Reward model: Currently rewards come from WorldModel.predict().reward. Should there be a separate RewardModel component, or is the bundled reward sufficient?
Multi-step commitment: Should DreamPlanner commit to the full planned sequence, or re-plan at every step? Re-planning is safer but more expensive.
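To make the cost side of the multi-step commitment question concrete, here is a minimal sketch comparing re-planning at every step (receding horizon) with executing the full planned sequence before planning again. `run_episode`, `plan_fn` and `env_step` are hypothetical stand-ins, not DreamPlanner APIs.

```python
from typing import Callable, List, Tuple


def run_episode(plan_fn: Callable[[int], List[int]],
                env_step: Callable[[int, int], int],
                obs: int, n_steps: int, replan_every_step: bool) -> Tuple[int, int]:
    """Execute up to n_steps real actions; return (final obs, number of plan calls)."""
    queue: List[int] = []
    plan_calls = 0
    for _ in range(n_steps):
        if replan_every_step or not queue:
            queue = list(plan_fn(obs))  # imagine a fresh action sequence
            plan_calls += 1
            if not queue:
                break  # planner produced nothing (e.g. NO_CANDIDATES)
        obs = env_step(obs, queue.pop(0))  # execute the next planned action
    return obs, plan_calls
```

With a 5-step plan over 10 real steps, re-planning costs 10 planner calls versus 2 when committing; the committed agent, however, ignores new observations until its queue runs out.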

Discussion in #371.
