Goal: Test langlistn end-to-end using tmux against a Korean YouTube video playing in Google Chrome. Evaluate the terminal UX — formatting, readability, visual glitches, timing of locked vs speculative text, status bar, paragraph breaks, header/footer. Identify and fix issues.
Definition of Done: The tool runs smoothly against live Korean audio, output is readable and well-formatted, no visual glitches (flickering, ghost lines, misaligned text), and the UX feels polished for a non-technical user watching subtitles.
- Working: Full pipeline — audio capture → Silero VAD → mlx-whisper large-v2 → LocalAgreement-2 → Claude translation (Bedrock) → terminal display
- Working: Interactive setup wizard, type-to-filter app picker, quick rerun, loading spinner, AWS auth error handling
- Untested since recent changes: The three UX branches (polish, status, loading) were built by subagents and merged but have NOT been manually tested yet. This is the first real test.
- Known cosmetic issues from prior testing:
- Paragraph breaks are mechanical (every 3 sentences) — may look odd with short sentences
- Display had prior flicker bugs that were supposedly fixed but need verification
- Characters occasionally missing at start of speculative text (was fixed, needs re-verification)
Project root: /Users/jakemc/projects/langlistn
Venv: .venv/bin/python
Entry point: .venv/bin/langlistn
Swift binary: swift/.build/release/AudioCaptureHelper
Config saved at: ~/.config/langlistn/last.json
Key source files:
| File | Role | Lines |
|---|---|---|
src/langlistn/__main__.py |
CLI + interactive wizard | 501 |
src/langlistn/pipeline.py |
Two-loop orchestrator | 556 |
src/langlistn/display.py |
Terminal renderer (two-zone) | 276 |
src/langlistn/translate.py |
Bedrock Claude translation | 400 |
src/langlistn/streaming_asr.py |
Whisper + LocalAgreement-2 | 246 |
src/langlistn/config.py |
Constants, language map | 56 |
# Install
cd ~/projects/langlistn && .venv/bin/pip install -e . -q
# Basic test (have a Korean YouTube video playing in Chrome)
.venv/bin/langlistn --app "Google Chrome" --source ko --debug-log /tmp/langlistn-debug.log
# With sonnet for better quality
.venv/bin/langlistn --app "Google Chrome" --source ko --translate-model sonnet --debug-log /tmp/langlistn-debug.log
# Interactive mode (tests wizard + quick rerun)
.venv/bin/langlistn
# Transcribe only (no AWS needed)
.venv/bin/langlistn --app "Google Chrome" --source ko --no-translate
# Check debug log for multi-hypothesis and LLM calls
tail -f /tmp/langlistn-debug.logFor tmux testing:
# Start a tmux session
tmux new -s langlistn-test
# Run the tool
cd ~/projects/langlistn && .venv/bin/langlistn --app "Google Chrome" --source ko --debug-log /tmp/langlistn-debug.log
# In another pane (Ctrl-b %), watch debug log
tail -f /tmp/langlistn-debug.log- macOS Apple Silicon (M1+)
- AWS credentials required for translation (run
aws sso loginfirst) - Bedrock model IDs: haiku=
global.anthropic.claude-haiku-4-5-20251001-v1:0, sonnet=global.anthropic.claude-sonnet-4-5-20250929-v1:0 - Terminal must have Screen Recording AND System Audio Recording permissions
- First run downloads whisper model (~3GB), subsequent runs start in seconds
- Fixed display bug where "starting audio capture" left stale erasable line count → subsequent display updates erased wrong lines. Fix: use raw
\rstdout instead ofdisplay.update()for pre-load messages. - Fixed app picker number selection using original indices instead of filtered indices. Fix: digits now select from current filtered list.
- Fixed misleading
→arrows in language/model picker implying keyboard navigation. Fix: replaced with*. - Added AWS SSO auth error handling — catches expired tokens, shows red error with
aws sso loginhint once, then skips all LLM calls (transcription continues). - Three UX branches built by subagents and merged:
- ux/polish: Quick rerun prompt, exit summary box, redesigned header
- ux/status: Simplified status bar (
🎧 3:45 · $0.13), silence feedback (⏸ waiting for speech...), speaker colour support - ux/loading: Animated braille spinner during model load, raw-mode app picker with in-place redraw
- Don't use
display.update()for one-off status messages during startup. It sets_erasable_lineswhich persists and corrupts the erase logic when the spinner or other output follows. - Don't ask the LLM to insert paragraph breaks. Tried adding "Insert a blank line between distinct topics" to the prompt — LLM ignores it because the reference translation (fed back) has no breaks, so it stays consistent with that. Paragraph breaks are now done client-side in
display.pyvia_add_paragraph_breaks()(every 3 sentences). - Whisper v3 hallucinates 4x more on Korean than v2. Don't switch to v3.
- The three subagent-built UX branches were merged and parse correctly, but have not been visually tested. There may be rendering issues from the merge (especially
pipeline.pywhich had a conflict resolved). - Paragraph breaks every 3 sentences may feel too frequent or too mechanical with real Korean content. The interval (
PARAGRAPH_INTERVAL = 3in display.py) may need tuning. - The speculative text rendering fix (word-by-word fitting on the locked line) hasn't been tested with the new status bar changes.
- Run the tool via tmux against a Korean YouTube video. Watch for:
- Does the header render correctly? (should show app name, language, Ctrl+C hint)
- Does the loading spinner animate and then show ✓?
- Does the status bar show
🎧 3:45 · $0.13(simplified) or the old verbose format? - Does
⏸ waiting for speech...appear during silence? - Are paragraph breaks in the right places?
- Is locked (bold) vs speculative (dim italic) visually clear?
- Any ghost lines, flickering, or misaligned text?
- Does Ctrl+C show a summary box?
- Check the debug log in a second pane — verify
Alt 1:/Alt 2:hypotheses appear in LLM prompts (multi-hypothesis feature). - Fix anything broken, focusing on display/formatting.
pipeline.pyis 556 lines — slightly over the 500 LOC preference. If modifying it, consider extracting the_build_status()function or the hallucination detection into a small utility module.- Display
_erasable_linestracking is fragile. Any code that writes to stdout outside ofdisplay.update()will desync the line counter. If you need to print anything, either go through the display or reset_erasable_linesafter. - The
_pick_appraw-mode code (tty.setraw) can leave the terminal in a bad state if an exception occurs betweensetrawand thefinallyblock. If terminal gets stuck,resetorstty sanefixes it. - Whisper timeout is 8s. If the buffer grows large (approaching 30s), whisper can take 3-5s. The hard cap is
buffer_trimming_sec + 10s = 35s. If you see timeouts in the log, the buffer is too large. - The
Spinnerclass usesasyncio.create_task— it must be started from within an async context. Don't try to use it from synchronous code.