Q&A: Phase 21.2 — AffectDetector lexicon design, negation handling & multimodal fusion #510
Unanswered · web3guru888 asked this question in Q&A
❓ AffectDetector — Lexicon Design, Negation Handling & Multimodal Fusion
Common questions about the Phase 21.2 AffectDetector design — lexicon vs ML tradeoffs, negation window sizing, intensifier handling, multimodal fusion strategies, and operational tuning.
Issue: #499 | Show & Tell: #507 | Planning: #497
Q1: Why start with a lexicon-based approach instead of a pre-trained ML model?
A: Three reasons for lexicon-first:
Determinism — Rule-based detection produces identical results for identical inputs, making tests reliable and debugging straightforward. ML models introduce non-determinism via float precision and batching.
Zero dependencies — No `torch`, `transformers`, or model weights. The lexicon is a simple TSV file (~3KB for AFINN). This keeps the package lightweight and installs fast.

Protocol boundary — The `AffectDetectorProtocol` means ML backends are a drop-in replacement. When you need BERT-level accuracy, implement a `TransformerAffectDetector` behind the same interface — all consumers (EmotionModel, MoodRegulator) work unchanged.

Tradeoff: A lexicon misses sarcasm, context-dependent sentiment, and novel slang. The `confidence` field reflects this — lexicon signals typically carry lower confidence than ML ones, and downstream consumers weight by confidence automatically.

Q2: Why a 3-token negation window? Why not sentence-wide?
A: The 3-token window is an empirical sweet spot from the sentiment analysis literature (Taboada et al., 2011): it catches common patterns like "not good", "not very good", and "never really liked" while avoiding over-negation in compound sentences.
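A minimal sketch of how such a window could work — this is an illustration, not the actual Phase 21.2 implementation; the `NEGATORS` set, function name, and score representation are all assumptions:

```python
# Hypothetical sketch of a 3-token negation window: lexicon scores of
# tokens within `window` positions after a negator are sign-flipped.
NEGATORS = {"not", "no", "never", "n't"}

def apply_negation(tokens, scores, window=3):
    """Flip the sign of per-token lexicon scores within `window` tokens of a negator."""
    out = list(scores)
    for i, tok in enumerate(tokens):
        if tok.lower() in NEGATORS:
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                out[j] = -out[j]
    return out

# "not good": "good" (+3 in AFINN) flips to -3
print(apply_negation(["not", "good"], [0, 3]))  # → [0, -3]
```

Widening `window` trades recall on long-range negation against over-negation in compound sentences, which is the tradeoff the 3-token default balances.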
Configurable: If you need a different window, subclass `LexiconAffectDetector` and override `_apply_negation()` — the method is intentionally isolated.

Q3: How do intensifiers interact with negation?
A: Intensifiers are applied after negation, so they amplify the already-flipped score:
This matches linguistic intuition — "not very good" is more negative than "not good".
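The negation-then-intensifier order can be sketched as follows — a toy illustration, not the project's actual pipeline; the lexicon entries, intensifier multipliers (e.g. 1.5 for "very"), and function name are assumptions:

```python
# Hypothetical scoring pipeline: negation flips polarity first,
# then the intensifier amplifies the already-flipped score.
LEXICON = {"good": 3.0}
INTENSIFIERS = {"very": 1.5, "really": 1.3}
NEGATORS = {"not", "never"}

def score_phrase(tokens):
    total = 0.0
    negated = False
    multiplier = 1.0
    for tok in tokens:
        t = tok.lower()
        if t in NEGATORS:
            negated = True
        elif t in INTENSIFIERS:
            multiplier *= INTENSIFIERS[t]
        elif t in LEXICON:
            score = LEXICON[t]
            if negated:          # step 1: flip polarity
                score = -score
            score *= multiplier  # step 2: amplify the flipped score
            total += score
            negated, multiplier = False, 1.0
    return total

print(score_phrase(["not", "good"]))          # → -3.0
print(score_phrase(["not", "very", "good"]))  # → -4.5
```

"not very good" scores -4.5, more negative than the -3.0 of "not good", matching the intuition above.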
Order matters: if intensifiers ran first, "not very good" would be +3 × 1.5 = +4.5 → negate → -4.5, which gives the same magnitude — but the pipeline is cleaner with negation-first, because the intensifier always amplifies the "semantic" polarity.

Q4: How does multimodal fusion handle missing modalities?
A: Gracefully — missing modalities are simply excluded from the weight normalisation:
Example: If only TEXT (w=0.50) and AUDIO (w=0.25) are present: `total_weight = 0.75`, so TEXT contributes 0.50/0.75 = 66.7% and AUDIO contributes 0.25/0.75 = 33.3%.

This means text-only detection works identically to single-modality mode — no special-casing needed.
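The renormalisation step can be sketched like this — TEXT=0.50 and AUDIO=0.25 come from the example above, but the VIDEO weight, modality names, and function signature are assumptions, not the project's actual API:

```python
# Hypothetical fusion sketch: weights are renormalised over whichever
# modalities are actually present, so missing modalities need no special case.
DEFAULT_WEIGHTS = {"TEXT": 0.50, "AUDIO": 0.25, "VIDEO": 0.25}

def fuse(scores):
    """Weighted average of per-modality scores, skipping absent/None modalities."""
    present = {m: s for m, s in scores.items() if s is not None}
    total_weight = sum(DEFAULT_WEIGHTS[m] for m in present)
    return sum(DEFAULT_WEIGHTS[m] * s for m, s in present.items()) / total_weight

# Only TEXT and AUDIO present: weights renormalise to 2/3 and 1/3.
print(round(fuse({"TEXT": 0.9, "AUDIO": 0.3}), 3))  # → 0.7
```

With a single modality the weight cancels out entirely, which is why text-only detection behaves exactly like single-modality mode.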
Q5: How should I tune the confidence threshold?
A: The default `0.3` filters out very low-coverage sentences (fewer than 15% of tokens found in the lexicon).

Tuning diagnostic: Monitor `affect_low_confidence_ratio`. If it's consistently > 0.5, your lexicon has poor coverage for the input domain — consider expanding it or switching to an ML backend.

Q6: What's the batch performance characteristic?
A: `detect_batch` chunks inputs by `config.batch_size` (default 32) and runs each chunk with `asyncio.gather`.

For the lexicon backend this is CPU-bound (no I/O waits), so `asyncio.gather` doesn't provide true parallelism — the structure exists for the ML backend case, where each `detect()` call may await GPU inference.

Performance tip: For lexicon-only workloads with >10K texts, consider wrapping with `concurrent.futures.ProcessPoolExecutor`. The Protocol doesn't prescribe the execution strategy — only the async interface.

Q7: Grafana dashboard for AffectDetector monitoring
A: Panel configuration:
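A minimal sketch of one such panel, assuming a Prometheus datasource and the `affect_low_confidence_ratio` metric from Q5 (the 0.5 threshold matches the diagnostic there); the datasource UID and styling fields are placeholders, not this project's actual dashboard:

```json
{
  "title": "AffectDetector — low-confidence ratio",
  "type": "timeseries",
  "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
  "targets": [
    {
      "expr": "affect_low_confidence_ratio",
      "legendFormat": "low-confidence ratio"
    }
  ],
  "fieldConfig": {
    "defaults": {
      "min": 0,
      "max": 1,
      "thresholds": {
        "steps": [
          { "color": "green", "value": null },
          { "color": "red", "value": 0.5 }
        ]
      }
    }
  }
}
```

The red threshold at 0.5 mirrors the Q5 guidance: a sustained ratio above it suggests expanding the lexicon or switching to an ML backend.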
Links: Issue #499 · Show & Tell #507 · EmotionModel #498 · Planning #497