Show & Tell: DreamPlanner — MPC in imagination space (Phase 13.2) #372
web3guru888 started this conversation in Show and tell
DreamPlanner architecture: MPC in imagination space
Phase 13.2 introduces DreamPlanner, a model-predictive control component that evaluates candidate action sequences in imagination using the WorldModel from Phase 13.1 (#368), then selects the best action to execute in the real environment.

Component map
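A minimal sketch of how the pieces named above might fit together. The `Prediction` dataclass, the `ToyWorldModel`, the 1-D float state, and the exact `predict()` and `plan()` signatures are assumptions made for illustration; only the names `WorldModel`, `DreamPlanner`, `predict()`, `plan()`, and the four strategy names come from this post. GREEDY is assumed here to mean "pick the action with the best one-step predicted reward".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Protocol, Sequence


class Strategy(Enum):
    RANDOM_SHOOTING = "random_shooting"
    CEM = "cem"
    BEAM_SEARCH = "beam_search"
    GREEDY = "greedy"


@dataclass
class Prediction:
    # Hypothetical container; the post only confirms a `.reward` field.
    next_state: float
    reward: float


class WorldModel(Protocol):
    # Assumed single-step signature; the Phase 13.1 API may differ.
    def predict(self, state: float, action: float) -> Prediction: ...


class ToyWorldModel:
    """Hypothetical 1-D model: the action shifts the state, reward peaks at 0."""

    def predict(self, state: float, action: float) -> Prediction:
        next_state = state + action
        return Prediction(next_state=next_state, reward=-abs(next_state))


class DreamPlanner:
    def __init__(self, world_model: WorldModel, strategy: Strategy = Strategy.GREEDY):
        self.world_model = world_model
        self.strategy = strategy

    def plan(self, state: float, actions: Sequence[float]) -> float:
        # Only GREEDY is sketched: score each action in imagination, keep the best.
        if self.strategy is not Strategy.GREEDY:
            raise NotImplementedError(self.strategy.name)
        return max(actions, key=lambda a: self.world_model.predict(state, a).reward)


planner = DreamPlanner(ToyWorldModel())
best_action = planner.plan(state=2.0, actions=[-1.0, 0.0, 1.0])
```

The planner never touches the real environment while scoring; it only queries the model, and the caller executes the returned action.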
Strategy comparison
RANDOM_SHOOTING, CEM, BEAM_SEARCH, GREEDY

RANDOM_SHOOTING inner loop
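A rough sketch of what a random-shooting inner loop can look like: sample many action sequences, score each one with an imagined rollout, and keep the best. The `toy_step` dynamics, the free-function form of `dream_rollout`, the uniform action range, and all parameter names are assumptions for illustration, not the project's actual API.

```python
import random
from typing import Callable, List, Tuple

# Assumed model interface: (state, action) -> (next_state, reward).
Step = Callable[[float, float], Tuple[float, float]]


def toy_step(state: float, action: float) -> Tuple[float, float]:
    """Hypothetical 1-D dynamics: reward is highest when the state is 0."""
    next_state = state + action
    return next_state, -abs(next_state)


def dream_rollout(step: Step, state: float, actions: List[float]) -> float:
    """Stand-in for a multi-step imagined rollout; returns the summed reward."""
    total = 0.0
    for action in actions:
        state, reward = step(state, action)
        total += reward
    return total


def random_shooting(
    step: Step,
    state: float,
    horizon: int = 5,
    num_candidates: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    best_seq: List[float] = []
    best_return = float("-inf")
    for _ in range(num_candidates):
        # Sample one candidate sequence uniformly from the assumed action range.
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        ret = dream_rollout(step, state, seq)
        if ret > best_return:
            best_seq, best_return = seq, ret
    return best_seq, best_return


plan, predicted_return = random_shooting(toy_step, state=2.0)
first_action = plan[0]  # in MPC, typically only this action is executed
```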
CEM: Cross-Entropy Method
CEM maintains a distribution over action sequences and iteratively refines it:
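As a hedged illustration of that refinement loop, the sketch below keeps a per-timestep Gaussian over actions, samples a population, ranks it by imagined return, and refits the Gaussian to the elites. The toy rollout, the diagonal-Gaussian parameterisation, and every hyperparameter name are assumptions, not values taken from DreamPlanner.

```python
import random
import statistics
from typing import List


def toy_rollout(state: float, actions: List[float]) -> float:
    """Hypothetical imagined rollout: summed reward, highest when the state stays at 0."""
    total = 0.0
    for action in actions:
        state = state + action
        total += -abs(state)
    return total


def cem_plan(
    state: float,
    horizon: int = 4,
    population: int = 64,
    num_elites: int = 8,
    iterations: int = 5,
    seed: int = 0,
) -> List[float]:
    rng = random.Random(seed)
    mean = [0.0] * horizon  # distribution over action sequences
    std = [1.0] * horizon
    for _ in range(iterations):
        # 1. Sample candidate sequences from the current distribution.
        candidates = [
            [rng.gauss(mean[t], std[t]) for t in range(horizon)]
            for _ in range(population)
        ]
        # 2. Rank candidates by imagined return and keep the elites.
        candidates.sort(key=lambda seq: toy_rollout(state, seq), reverse=True)
        elites = candidates[:num_elites]
        # 3. Refit mean and std to the elites; the small floor avoids collapse to zero.
        for t in range(horizon):
            column = [seq[t] for seq in elites]
            mean[t] = statistics.fmean(column)
            std[t] = statistics.pstdev(column) + 1e-6
    return mean


cem_actions = cem_plan(state=2.0)
```

Returning the final mean is one common choice; returning the best sampled sequence is another.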
dream_rollout()

BEAM_SEARCH node expansion
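A minimal sketch of beam-search node expansion over a discrete action set, assuming a single-step `predict(state, action)` that returns `(next_state, reward)`. The toy dynamics, the tuple layout of a beam node, and the `depth` and `beam_width` parameters are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# A beam node: (cumulative_reward, state, actions_taken_so_far)
Node = Tuple[float, int, List[int]]


def predict(state: int, action: int) -> Tuple[int, float]:
    """Hypothetical single-step model: reward is highest when the state is 0."""
    next_state = state + action
    return next_state, float(-abs(next_state))


def beam_search(
    state: int,
    actions: Sequence[int],
    depth: int = 3,
    beam_width: int = 2,
) -> Tuple[List[int], float]:
    beam: List[Node] = [(0.0, state, [])]
    for _ in range(depth):
        expanded: List[Node] = []
        for total, node_state, seq in beam:
            # Expand the node: one single-step prediction per discrete action.
            for action in actions:
                next_state, reward = predict(node_state, action)
                expanded.append((total + reward, next_state, seq + [action]))
        # Keep only the top-scoring nodes for the next level.
        expanded.sort(key=lambda node: node[0], reverse=True)
        beam = expanded[:beam_width]
    best_total, _, best_seq = beam[0]
    return best_seq, best_total


best_seq, best_total = beam_search(state=2, actions=(-1, 0, 1))
# best_seq == [-1, -1, 0], best_total == -1.0
```

No sampling is involved, so the same inputs always yield the same plan.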
Unlike RANDOM_SHOOTING/CEM, BEAM_SEARCH calls WorldModel.predict() (single-step) at each node, making it deterministic and suitable for discrete action spaces.

Surprise-abort guard
If the WorldModel has high prediction error in a region (high terminal_surprise), the DreamPlanner discards those candidates.

This creates a feedback loop: WorldModel uncertainty → DreamPlanner caution → agent avoids uncharted territory until WorldModel updates.
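One way the guard could be expressed, as a sketch only: drop every candidate whose `terminal_surprise` exceeds a threshold, then pick the best of what remains. The `Candidate` dataclass, the `surprise_threshold` parameter, and the choice to return `None` when every candidate is discarded are assumptions; the post does not say what DreamPlanner does in that case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    # Hypothetical record of one imagined rollout.
    actions: List[float]
    predicted_reward: float
    terminal_surprise: float


def select_with_surprise_guard(
    candidates: List[Candidate],
    surprise_threshold: float,
) -> Optional[Candidate]:
    # Discard candidates that end where the model's prediction error is high.
    trusted = [c for c in candidates if c.terminal_surprise <= surprise_threshold]
    if not trusted:
        return None  # every candidate aborted; the caller chooses a fallback
    return max(trusted, key=lambda c: c.predicted_reward)


candidates = [
    Candidate(actions=[1.0, 1.0], predicted_reward=9.0, terminal_surprise=0.8),
    Candidate(actions=[0.5, 0.0], predicted_reward=4.0, terminal_surprise=0.1),
    Candidate(actions=[0.0, 0.0], predicted_reward=1.0, terminal_surprise=0.05),
]
chosen = select_with_surprise_guard(candidates, surprise_threshold=0.5)
```

In this example the highest-reward candidate is rejected because the model is not trusted where it ends, so the second candidate is chosen.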
Prometheus metrics
asi_dream_plan_latency_ms: plan() call
asi_dream_plan_candidates_total
asi_dream_plan_outcome_total
asi_dream_plan_predicted_reward
asi_dream_plan_surprise_abort_total

PromQL examples
Open questions
Latent-space planning: Should DreamPlanner operate over encoded latent states (LatentStateEncoder, Phase 13.4) rather than raw observations? Latent MPC is more sample-efficient but requires a working encoder.

Reward model: Currently rewards come from WorldModel.predict().reward. Should there be a separate RewardModel component, or is the bundled reward sufficient?

Multi-step commitment: Should DreamPlanner commit to the full planned sequence, or re-plan at every step? Re-planning is safer but more expensive.

Discussion in #371.