Q&A: Phase 21.2 — AffectDetector lexicon design, negation handling & multimodal fusion #510
Unanswered · web3guru888 asked this question in Q&A
❓ AffectDetector — Lexicon Design, Negation Handling & Multimodal Fusion
Common questions about the Phase 21.2 AffectDetector design — lexicon vs ML tradeoffs, negation window sizing, intensifier handling, multimodal fusion strategies, and operational tuning.
Issue: #499 | Show & Tell: #507 | Planning: #497
Q1: Why start with a lexicon-based approach instead of a pre-trained ML model?
A: Three reasons for lexicon-first:
Determinism — Rule-based detection produces identical results for identical inputs, making tests reliable and debugging straightforward. ML models introduce non-determinism via float precision and batching.
Zero dependencies — No `torch`, `transformers`, or model weights. The lexicon is a simple TSV file (~3KB for AFINN). This keeps the package lightweight and installs fast.

Protocol boundary — The `AffectDetectorProtocol` means ML backends are a drop-in replacement. When you need BERT-level accuracy, implement a `TransformerAffectDetector` behind the same interface — all consumers (EmotionModel, MoodRegulator) work unchanged.

Tradeoff: A lexicon misses sarcasm, context-dependent sentiment, and novel slang. The `confidence` field reflects this — lexicon signals typically carry lower confidence than ML ones, and downstream consumers weight by confidence automatically.

Q2: Why a 3-token negation window? Why not sentence-wide?
A: The 3-token window is an empirical sweet spot from the sentiment analysis literature (Taboada et al., 2011): it catches common patterns like "not good", "not very good", and "never really liked" while avoiding over-negation in compound sentences.
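A minimal sketch of how such a window could work — this is an illustration, not the actual Phase 21.2 implementation; the `NEGATORS` set, function name, and score representation are all assumptions:

```python
# Hypothetical sketch of a 3-token negation window: lexicon scores of
# tokens within `window` positions after a negator are sign-flipped.
NEGATORS = {"not", "no", "never", "n't"}

def apply_negation(tokens, scores, window=3):
    """Flip the sign of per-token lexicon scores within `window` tokens of a negator."""
    out = list(scores)
    for i, tok in enumerate(tokens):
        if tok.lower() in NEGATORS:
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                out[j] = -out[j]
    return out

# "not good": "good" (+3 in AFINN) flips to -3
print(apply_negation(["not", "good"], [0, 3]))  # → [0, -3]
```

Widening `window` trades recall on long-range negation against over-negation in compound sentences, which is the tradeoff the 3-token default balances.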
Configurable: If you need a different window, subclass `LexiconAffectDetector` and override `_apply_negation()` — the method is intentionally isolated.

Q3: How do intensifiers interact with negation?
A: Intensifiers are applied after negation, so they amplify the already-flipped score:
This matches linguistic intuition — "not very good" is more negative than "not good".
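The negation-then-intensifier order can be sketched as follows — a toy illustration, not the project's actual pipeline; the lexicon entries, intensifier multipliers (e.g. 1.5 for "very"), and function name are assumptions:

```python
# Hypothetical scoring pipeline: negation flips polarity first,
# then the intensifier amplifies the already-flipped score.
LEXICON = {"good": 3.0}
INTENSIFIERS = {"very": 1.5, "really": 1.3}
NEGATORS = {"not", "never"}

def score_phrase(tokens):
    total = 0.0
    negated = False
    multiplier = 1.0
    for tok in tokens:
        t = tok.lower()
        if t in NEGATORS:
            negated = True
        elif t in INTENSIFIERS:
            multiplier *= INTENSIFIERS[t]
        elif t in LEXICON:
            score = LEXICON[t]
            if negated:          # step 1: flip polarity
                score = -score
            score *= multiplier  # step 2: amplify the flipped score
            total += score
            negated, multiplier = False, 1.0
    return total

print(score_phrase(["not", "good"]))          # → -3.0
print(score_phrase(["not", "very", "good"]))  # → -4.5
```

"not very good" scores -4.5, more negative than the -3.0 of "not good", matching the intuition above.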
Order matters: if intensifiers ran first, "not very good" would be +3 × 1.5 = +4.5 → negate → -4.5, which gives the same magnitude — but the pipeline is cleaner with negation-first, because the intensifier always amplifies the "semantic" polarity.

Q4: How does multimodal fusion handle missing modalities?
A: Gracefully — missing modalities are simply excluded from the weight normalisation:
Example: If only TEXT (w=0.50) and AUDIO (w=0.25) are present: `total_weight = 0.75`, so TEXT contributes 0.50/0.75 = 66.7% and AUDIO contributes 0.25/0.75 = 33.3%.

This means text-only detection works identically to single-modality mode — no special-casing needed.
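The renormalisation step can be sketched like this — TEXT=0.50 and AUDIO=0.25 come from the example above, but the VIDEO weight, modality names, and function signature are assumptions, not the project's actual API:

```python
# Hypothetical fusion sketch: weights are renormalised over whichever
# modalities are actually present, so missing modalities need no special case.
DEFAULT_WEIGHTS = {"TEXT": 0.50, "AUDIO": 0.25, "VIDEO": 0.25}

def fuse(scores):
    """Weighted average of per-modality scores, skipping absent/None modalities."""
    present = {m: s for m, s in scores.items() if s is not None}
    total_weight = sum(DEFAULT_WEIGHTS[m] for m in present)
    return sum(DEFAULT_WEIGHTS[m] * s for m, s in present.items()) / total_weight

# Only TEXT and AUDIO present: weights renormalise to 2/3 and 1/3.
print(round(fuse({"TEXT": 0.9, "AUDIO": 0.3}), 3))  # → 0.7
```

With a single modality the weight cancels out entirely, which is why text-only detection behaves exactly like single-modality mode.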
Q5: How should I tune the confidence threshold?
A: The default `0.3` filters out very low-coverage sentences (fewer than 15% of tokens found in the lexicon).

Tuning diagnostic: Monitor `affect_low_confidence_ratio`. If it's consistently > 0.5, your lexicon has poor coverage for the input domain — consider expanding it or switching to an ML backend.

Q6: What's the batch performance characteristic?
A: `detect_batch` chunks inputs by `config.batch_size` (default 32) and runs each chunk with `asyncio.gather`.

For the lexicon backend this is CPU-bound (no I/O waits), so `asyncio.gather` doesn't provide true parallelism — the structure exists for the ML backend case, where each `detect()` call may await GPU inference.

Performance tip: For lexicon-only workloads with >10K texts, consider wrapping with `concurrent.futures.ProcessPoolExecutor`. The Protocol doesn't prescribe the execution strategy — only the async interface.

Q7: Grafana dashboard for AffectDetector monitoring
A: Panel configuration:
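A minimal sketch of one such panel, assuming a Prometheus datasource and the `affect_low_confidence_ratio` metric from Q5 (the 0.5 threshold matches the diagnostic there); the datasource UID and styling fields are placeholders, not this project's actual dashboard:

```json
{
  "title": "AffectDetector — low-confidence ratio",
  "type": "timeseries",
  "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
  "targets": [
    {
      "expr": "affect_low_confidence_ratio",
      "legendFormat": "low-confidence ratio"
    }
  ],
  "fieldConfig": {
    "defaults": {
      "min": 0,
      "max": 1,
      "thresholds": {
        "steps": [
          { "color": "green", "value": null },
          { "color": "red", "value": 0.5 }
        ]
      }
    }
  }
}
```

The red threshold at 0.5 mirrors the Q5 guidance: a sustained ratio above it suggests expanding the lexicon or switching to an ML backend.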
Links: Issue #499 · Show & Tell #507 · EmotionModel #498 · Planning #497