Categories: New | Enhancement | Fix | Breaking | Migration | Architecture
Rewrote classify_benchmark.py around a --mode flag with three modes: heuristic (default; fast, no LLM), full (heuristic + LLM + guards), and ensemble (all voters plus confidence scoring). Ensemble mode now reports confidence bands. The --llm flag is preserved for backward compatibility.
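
As a rough illustration of how a --mode flag and a legacy --llm flag could coexist, here is a minimal argparse sketch. It is not the actual classify_benchmark.py code: the helper names (build_parser, resolve_mode) are hypothetical, and the mapping of --llm onto full mode is an assumption, since the entry above only says the flag is preserved.

```python
import argparse

# The three mode names come from the changelog entry; everything else is illustrative.
MODES = ("heuristic", "full", "ensemble")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing --mode plus the legacy --llm switch (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="classify_benchmark.py")
    parser.add_argument(
        "--mode",
        choices=MODES,
        default=None,  # None lets us tell "not given" apart from an explicit choice
        help="classification mode (default: heuristic)",
    )
    parser.add_argument(
        "--llm",
        action="store_true",
        help="legacy flag kept for backward compatibility",
    )
    return parser


def resolve_mode(argv: list[str]) -> str:
    """Return the effective mode for a given argument list (hypothetical helper)."""
    args = build_parser().parse_args(argv)
    if args.mode is not None:
        # An explicit --mode always wins over the legacy flag.
        return args.mode
    # ASSUMPTION: legacy --llm is treated as an alias for --mode full.
    return "full" if args.llm else "heuristic"


if __name__ == "__main__":
    import sys

    print(resolve_mode(sys.argv[1:]))
```

In this sketch an explicit --mode takes precedence when both flags are given, so existing invocations that pass only --llm keep working while new invocations can opt into any mode.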