Q&A: Phase 11.3 ValueLearner — learning rate tuning, cold-start, comparative signals, federation feedback, and Grafana monitoring #345
Unanswered · asked by web3guru888 in Q&A · 0 replies
Q&A: Phase 11.3 ValueLearner — configuration, training, federation integration, and monitoring
Reference: issue #343 (spec) | discussion #344 (architecture Show & Tell)
Q1: How do I tune `learning_rate` and `regularisation_lambda`?

Start with the defaults (`lr=0.01`, `lambda=1e-4`) and monitor `value_learner_weight_norm`. If the norm grows beyond ~1.5 during normal operation, increase `lambda` by 10×. If the model fails to differentiate APPROVE from REJECT signals after 100+ feedback entries, increase `lr` by 2×. Quick reference:

| Symptom | Adjustment |
| --- | --- |
| `weight_norm` > 2.0 | increase `lambda` |
| `weight_norm` oscillates ±0.3/epoch | reduce `lr` by 2× |
| model never updates | check that `min_feedback_to_train` has been reached |
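The heuristics above can be sketched as a small helper. This is illustrative only — the function name, the 1.5 norm threshold from the prose, and the oscillation check are not part of the ValueLearner API:

```python
def suggest_adjustment(weight_norm_history, lr, lam):
    """Apply the Q1 tuning heuristics to recent weight-norm samples.

    Hypothetical helper: `weight_norm_history` is a list of recent
    value_learner_weight_norm samples, oldest first. Returns a
    (lr, lambda) pair with at most one heuristic applied.
    """
    latest = weight_norm_history[-1]
    # Norm growing past ~1.5: regularisation too weak -> 10x lambda.
    if latest > 1.5:
        return lr, lam * 10
    # Epoch-to-epoch swings above 0.3 suggest the step size is too big.
    deltas = [abs(b - a) for a, b in
              zip(weight_norm_history, weight_norm_history[1:])]
    if deltas and max(deltas) > 0.3:
        return lr / 2, lam
    return lr, lam
```

Run it against the last few scraped metric samples before touching the config by hand.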
Q2: How does `min_feedback_to_train` interact with `SafetyFilter`?

Before `min_feedback_to_train` samples are collected, `train()` returns the current (default) `RewardModelWeights` without updating them. During this cold-start period, `SafetyFilter` operates on its static `ConstitutionalRuleset` with no learned adjustments. After the threshold is crossed, `SafetyFilter` can optionally query `score()` to bias rule sensitivity per dimension: a low CONSTITUTIONAL weight → stricter effective threshold → more BLOCK verdicts.
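A minimal sketch of the per-dimension biasing, assuming the learned weight simply scales a rule's base threshold (the function name, clamp bounds, and scaling rule here are assumptions, not the actual `SafetyFilter` code):

```python
def effective_threshold(base_threshold: float, learned_weight: float) -> float:
    """Scale a rule's block threshold by the learned per-dimension weight.

    Hypothetical sketch: a low learned CONSTITUTIONAL weight shrinks the
    threshold, so more outputs exceed it and receive BLOCK verdicts.
    """
    # Clamp so a runaway weight can never disable a rule entirely.
    w = min(max(learned_weight, 0.1), 1.0)
    return base_threshold * w
```

The clamp is the important design choice: learned feedback may only tighten a rule, never loosen it past its static baseline.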
Q3: What happens to COMPARATIVE signals — who encodes the pairs?
COMPARATIVE pairs must be submitted as two consecutive `FeedbackEntry` records with `signal=FeedbackSignal.COMPARATIVE`. The first entry encodes the preferred option; the second encodes the rejected option. Both must share the same `dimension`:
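For example (the `FeedbackEntry`/`FeedbackSignal` definitions below are minimal stand-ins for the project's real types, and the `"HELPFULNESS"` dimension and `output_id` field are illustrative):

```python
from dataclasses import dataclass
from enum import Enum, auto

class FeedbackSignal(Enum):
    COMPARATIVE = auto()

@dataclass
class FeedbackEntry:
    signal: FeedbackSignal
    dimension: str
    output_id: str

# Preferred option first, rejected option second, same dimension.
preferred = FeedbackEntry(FeedbackSignal.COMPARATIVE, "HELPFULNESS", "resp-a")
rejected = FeedbackEntry(FeedbackSignal.COMPARATIVE, "HELPFULNESS", "resp-b")
pair = [preferred, rejected]
```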
The `train()` method processes them in order. Mismatched pairs (an odd total) are discarded with a warning.

Q4: How does `AlignmentMonitor` feed `ValueLearner` automatically?

When `AlignmentMonitor` raises an alert (score drops below threshold), `CognitiveCycle._on_alignment_alert()` converts it to an `IMPLICIT_NEGATIVE` entry. This creates an automatic feedback loop: drift → implicit negative → lower weight → stricter gating → corrective pressure.
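A hedged sketch of that conversion — `AlignmentAlert`, the dict payload shape, and the `strength` field are assumptions for illustration; the real logic lives in `CognitiveCycle._on_alignment_alert()`:

```python
from dataclasses import dataclass

@dataclass
class AlignmentAlert:
    """Stand-in for the monitor's alert: score fell below threshold."""
    dimension: str
    score: float
    threshold: float

def alert_to_feedback(alert: AlignmentAlert) -> dict:
    # An alert means the monitored score dropped below threshold; record
    # it as implicit negative feedback on the affected dimension.
    return {
        "signal": "IMPLICIT_NEGATIVE",
        "dimension": alert.dimension,
        # The deficit gives the learner a magnitude, not just a direction.
        "strength": alert.threshold - alert.score,
        "annotator_id": "alignment-monitor",
    }
```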
Q5: What does `max_history=10_000` mean in practice?

`_history` is a `collections.deque(maxlen=10_000)`. When the 10,001st entry arrives, the oldest is silently evicted. At ~200 bytes per `FeedbackEntry`, 10,000 entries ≈ 2 MB of in-memory feedback. For production, consider tuning `max_history` against your memory budget and retention needs.
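The eviction behaviour is standard `collections.deque` semantics, shown here with a small `maxlen` for readability:

```python
from collections import deque

# maxlen eviction is silent: appending to a full deque drops the oldest.
history = deque(maxlen=3)          # stand-in for max_history=10_000
for i in range(5):
    history.append(f"entry-{i}")

# Only the 3 most recent entries remain; entry-0 and entry-1 are gone.
print(list(history))
```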
Note that `train()` only uses `_pending` (entries received since the last train), not the full history; `_history` is kept for snapshot reporting and future replay capability.

Q6: Can federation peers submit feedback via `FederationGateway`?

Yes — the intended pattern is:
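A sketch of the pattern. The message schema, handler name, and `add_feedback()` call are assumptions — the thread does not show `FederationGateway`'s actual API:

```python
def handle_peer_feedback(message: dict, value_learner) -> None:
    """Hypothetical gateway handler: translate a peer's federation
    message into a local feedback entry."""
    entry = {
        "signal": message["signal"],
        "dimension": message["dimension"],
        # Use the peer's DID so per-annotator trust can be tracked later.
        "annotator_id": message["peer_did"],
    }
    value_learner.add_feedback(entry)
```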
`annotator_id` should be the peer's DID, enabling future per-annotator trust tracking.

Q7: What Grafana panels and alerts should I configure?
Recommended dashboard panels:

- Feedback ingest rate: `rate(value_learner_feedback_total[5m])`
- Model version: `value_learner_model_version`
- Training latency (p95): `histogram_quantile(0.95, value_learner_train_duration_seconds)`
- Weight norm: `value_learner_weight_norm`
- Pending feedback backlog: `value_learner_pending_feedback`

Recommended alert:
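The original alert definition did not survive extraction; a plausible example, assuming Prometheus-style alerting rules, the metric names listed above, and the weight-norm threshold from Q1 (thresholds and labels are illustrative):

```yaml
groups:
  - name: value_learner
    rules:
      - alert: ValueLearnerWeightNormHigh
        # Mirrors the Q1 symptom table: weight_norm > 2.0 needs attention.
        expr: value_learner_weight_norm > 2.0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "ValueLearner weight norm above 2.0 - review regularisation_lambda"
```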