This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
# Run all tests
go test ./...
# Run tests with verbose output
go test -v ./...
# Run a specific test
go test -v -run TestDiffStrings ./...
# Run CLI tests only
go test -v ./cmd/tokendiff/...
# Build the CLI
go build ./cmd/tokendiff
# Install the CLI globally
go install ./cmd/tokendiff
# Run benchmarks
go test -bench=. ./...

tokendiff is a Go library and CLI for token-level diffing with delimiter support. It uses the histogram diff algorithm via diffx.
- Tokenization (tokenize.go): Splits text into tokens using configurable delimiters and whitespace. Key functions:
  - Tokenize() - basic tokenization
  - TokenizeWithPositions() - tracks byte offsets for whitespace reconstruction
- Diffing (tokendiff.go): Computes diffs using diffx's histogram algorithm. Core types:
  - Operation - Equal, Insert, Delete
  - Diff - single operation on a token
  - Options - configures delimiters, whitespace handling, case sensitivity
- Post-processing (postprocess.go): Transforms raw diffs:
  - AggregateDiffs() - combines adjacent same-type operations
  - ApplyMatchContext() - converts isolated equals to delete+insert pairs
  - ShiftBoundaries() - improves diff readability
- Formatting (format.go): Renders diffs as text with markers, colors, or overstrike. FormatOptions controls output style.
- Line-level diffing (linediff.go): Pairs deleted/inserted lines for line-mode output using positional or similarity matching.
- Unified diff parsing (unified.go): Parses diff -u or git diff output for --diff-input mode.
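
The tokenization and aggregation steps in the list above can be illustrated with a minimal, self-contained sketch. Every name here (token, tokenizeWithPositions, diff, aggregate, the op constants) is hypothetical and chosen for illustration; the signatures and behavior of the real Tokenize(), TokenizeWithPositions(), and AggregateDiffs() may well differ, so check tokenize.go and postprocess.go before relying on any detail.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// token is a hypothetical type for this sketch, not tokendiff's own.
type token struct {
	Text  string
	Start int // byte offset of the first byte
	End   int // byte offset one past the last byte
}

// tokenizeWithPositions splits text on whitespace and emits every rune
// found in delims as its own single-rune token, recording byte offsets.
func tokenizeWithPositions(text, delims string) []token {
	var out []token
	start := -1 // start offset of the word in progress, or -1 if none
	flush := func(end int) {
		if start >= 0 {
			out = append(out, token{text[start:end], start, end})
			start = -1
		}
	}
	for i, r := range text {
		switch {
		case unicode.IsSpace(r):
			flush(i)
		case strings.ContainsRune(delims, r):
			flush(i)
			n := utf8.RuneLen(r)
			out = append(out, token{text[i : i+n], i, i + n})
		default:
			if start < 0 {
				start = i
			}
		}
	}
	flush(len(text))
	return out
}

// op and diff are hypothetical stand-ins for the Operation and Diff types.
type op int

const (
	opEqual op = iota
	opInsert
	opDelete
)

type diff struct {
	Op   op
	Text string
}

// aggregate merges runs of adjacent diffs that share the same operation,
// joining their token text with a single space.
func aggregate(diffs []diff) []diff {
	var out []diff
	for _, d := range diffs {
		if n := len(out); n > 0 && out[n-1].Op == d.Op {
			out[n-1].Text += " " + d.Text
			continue
		}
		out = append(out, d)
	}
	return out
}

func main() {
	for _, t := range tokenizeWithPositions("foo(bar, baz)", "(),") {
		fmt.Printf("%q [%d,%d)\n", t.Text, t.Start, t.End)
	}
	merged := aggregate([]diff{
		{opEqual, "a"}, {opDelete, "b"}, {opDelete, "c"}, {opInsert, "d"},
	})
	fmt.Println(len(merged), "aggregated operations")
}
```

With an empty delims string the sketch falls back to whitespace-only splitting. Joining aggregated text with a single space is a simplification made for this sketch only.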
The CLI (cmd/tokendiff/main.go) wraps the library with:
- Flag parsing via spf13/pflag
- Configuration file support (~/.tokendiffrc, ~/.config/tokendiff/config, profiles)
- Stdin input support
- Exit codes: 0 (identical), 1 (differ), 2 (error)
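
The exit-code convention listed above can be sketched as a small mapping function. This is an illustrative sketch only: exitCode and errDemo are hypothetical names, and the actual logic in cmd/tokendiff/main.go may be organized differently.

```go
package main

import (
	"errors"
	"fmt"
)

// errDemo is a placeholder error used only by this sketch.
var errDemo = errors.New("demo failure")

// exitCode maps a comparison outcome to the documented exit codes:
// 0 when the inputs are identical, 1 when they differ, 2 on error.
func exitCode(differ bool, err error) int {
	switch {
	case err != nil:
		return 2
	case differ:
		return 1
	default:
		return 0
	}
}

func main() {
	// A real CLI would pass the result to os.Exit; the sketch only prints it.
	fmt.Println(exitCode(false, nil))
	fmt.Println(exitCode(true, nil))
	fmt.Println(exitCode(false, errDemo))
}
```

Checking the error first means a failed run never reports 0 or 1, which keeps the codes usable in shell scripts in the same way as diff's.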
- Default delimiters are empty (whitespace-only splitting) to match dwdiff behavior
- Case-insensitive mode compares lowercase but preserves original case in output
- The histogram algorithm avoids spurious matches on common words like "the", "for", "in"
- DiffResult includes token positions to reconstruct original whitespace for Equal tokens
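
Two of the points above, lowercase comparison with original case preserved and whitespace reconstruction from token positions, can be sketched in a few lines. The names sameToken, span, and rebuild are hypothetical and exist only in this sketch; the library's DiffResult and Options types are not reproduced here and may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// sameToken reports whether two tokens match. With ignoreCase set, only
// lowercase copies are compared; the tokens themselves are left untouched,
// so output can still show them as originally written.
func sameToken(a, b string, ignoreCase bool) bool {
	if ignoreCase {
		return strings.ToLower(a) == strings.ToLower(b)
	}
	return a == b
}

// span is a hypothetical byte range [Start, End) of one token in the text.
type span struct{ Start, End int }

// rebuild rejoins tokens using the exact bytes that separated them in the
// original text, which is what recorded positions make possible.
func rebuild(text string, spans []span) string {
	var sb strings.Builder
	for i, s := range spans {
		if i > 0 {
			sb.WriteString(text[spans[i-1].End:s.Start])
		}
		sb.WriteString(text[s.Start:s.End])
	}
	return sb.String()
}

func main() {
	fmt.Println(sameToken("Hello", "hello", true))  // true
	fmt.Println(sameToken("Hello", "hello", false)) // false

	text := "a  b\tc"
	spans := []span{{0, 1}, {3, 4}, {5, 6}}
	fmt.Printf("%q\n", rebuild(text, spans))
}
```

Without positions, a formatter could only rejoin tokens with a fixed separator, so runs of spaces and tabs in unchanged text would be lost.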