Local speech-to-text dictation with a visual feedback interface using whisper.cpp.
Note: This project was created with the assistance of Claude Code, Anthropic's AI coding assistant.
- Privacy-first: All transcription happens locally on your machine
- GPU acceleration: Supports CUDA for fast transcription via whisper.cpp-cuda
- Visual feedback: Google Assistant-style animated dot visualizer
- Toggle hotkey: Press once to start, press again to stop and transcribe
- Multiple output modes: Copy to clipboard or type directly
- Configurable: TOML-based configuration with CLI overrides
The visualizer shows animated dots that respond to your voice while recording:
● ● ● ● (dots animate based on audio level)
Listening...
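How a dot animation can follow the voice is easiest to see in code. The following is a minimal sketch, not the project's actual implementation: it assumes 16-bit mono PCM chunks (the kind PyAudio typically delivers) and uses only the standard library. The function names `audio_level` and `dot_scales`, the `gain` value, and the per-dot weighting are all hypothetical.

```python
import math
from array import array


def audio_level(chunk: bytes) -> float:
    """Return the RMS level of a 16-bit mono PCM chunk, scaled to [0.0, 1.0]."""
    usable = len(chunk) - (len(chunk) % 2)  # drop a trailing odd byte, if any
    samples = array("h")                    # signed 16-bit, native byte order
    samples.frombytes(chunk[:usable])
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return min(math.sqrt(mean_square) / 32768.0, 1.0)


def dot_scales(level: float, n_dots: int = 4, gain: float = 4.0) -> list[float]:
    """Map an audio level to one scale factor per dot (1.0 means resting size)."""
    boosted = min(level * gain, 1.0)  # speech rarely reaches full scale, so amplify
    center = (n_dots - 1) / 2
    # Hypothetical weighting: dots near the middle react more than the outer ones.
    weights = [1.0 - abs(i - center) / n_dots for i in range(n_dots)]
    return [1.0 + boosted * w for w in weights]
```

A UI timer could call `audio_level` on each captured chunk and feed the result to `dot_scales` to resize the dots on every frame.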
- whisper.cpp - Speech recognition engine
  # Arch Linux (CPU only)
  paru -S whisper.cpp
  # Arch Linux (CUDA GPU acceleration)
  paru -S whisper.cpp-cuda
- Clipboard tool (one of):
  - wl-copy (Wayland) - paru -S wl-clipboard
  - xclip (X11) - paru -S xclip
- Typing tool (optional, for direct typing mode):
  - ydotool (Wayland) - paru -S ydotool
  - xdotool (X11) - paru -S xdotool
- Python 3.10+
- PyQt6
- PyAudio
- NumPy
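Which clipboard tool applies depends on the session type. The sketch below shows one way a program might choose between `wl-copy` and `xclip` at runtime. It is an illustration under assumptions, not the project's code: the `pick_clipboard_command` and `copy_text` helpers are hypothetical, and Wayland is detected by checking the `WAYLAND_DISPLAY` environment variable.

```python
import os
import shutil
import subprocess


def pick_clipboard_command(env=None, which=shutil.which):
    """Return the argv for an installed clipboard tool, or None if none is found."""
    env = os.environ if env is None else env
    on_wayland = bool(env.get("WAYLAND_DISPLAY"))
    # Both tools read the text to copy from standard input.
    candidates = [["wl-copy"], ["xclip", "-selection", "clipboard"]]
    if not on_wayland:
        candidates.reverse()  # prefer xclip on X11
    for command in candidates:
        if which(command[0]):
            return command
    return None


def copy_text(text: str) -> bool:
    """Copy text to the clipboard; return False when no tool is installed."""
    command = pick_clipboard_command()
    if command is None:
        return False
    subprocess.run(command, input=text.encode("utf-8"), check=True)
    return True
```

Falling back to the other tool when the preferred one is missing is a design choice of this sketch; whether the fallback actually works depends on the session (for example, `xclip` under Wayland needs XWayland).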
Download a pre-built binary from Releases:
# Download the latest binary
wget https://github.com/JPyke3/whisper-dictate/releases/latest/download/whisper-dictate-linux-x86_64
chmod +x whisper-dictate-linux-x86_64
sudo mv whisper-dictate-linux-x86_64 /usr/local/bin/whisper-dictate
Or extract from the tarball:
tar -xzf whisper-dictate-v*-linux-x86_64.tar.gz
sudo mv whisper-dictate /usr/local/bin/
Note: You still need to install whisper.cpp separately.
pipx installs the application in an isolated environment while making it globally available. This is the recommended method for most users.
# Install pipx if you don't have it
# Arch Linux
paru -S python-pipx
# Install whisper-dictate
pipx install git+https://github.com/JPyke3/whisper-dictate.git
To update to the latest version:
pipx upgrade whisper-dictate
For contributing or local development:
git clone https://github.com/JPyke3/whisper-dictate.git
cd whisper-dictate
pip install -e ".[dev]"
To install your local changes globally via pipx:
pipx install /path/to/whisper-dictate
To update after making changes:
pipx reinstall whisper-dictate

whisper-dictate --download-model small.en
Available models (English-only versions recommended for English speakers):
- tiny.en (~75MB) - Fastest, lowest quality
- base.en (~142MB) - Fast, decent quality
- small.en (~488MB) - Good balance of speed and quality (recommended)
- medium.en (~1.5GB) - High quality, slower
- large (~3GB) - Highest quality, slowest

whisper-dictate --init-config
Edit ~/.config/whisper-dictate/config.toml:
[general]
output_mode = "clipboard" # or "type"
language = "en"
[model]
name = "small.en"
[ui]
position = "bottom" # or "top"
edge_margin = 60
theme = "google" # or "blue", "purple", "mono"

Create a keyboard shortcut in your desktop environment to run:
whisper-dictate
KDE Plasma:
- System Settings > Shortcuts > Custom Shortcuts
- Add new Global Shortcut > Command/URL
- Set trigger (e.g., Alt+Space)
- Set action to whisper-dictate
# Start/toggle recording (copies to clipboard)
whisper-dictate
# Type the transcription instead of copying
whisper-dictate --type

# Download a model
whisper-dictate --download-model small.en
# List downloaded models
whisper-dictate --list-models
# Show current configuration
whisper-dictate --show-config
# Create default configuration file
whisper-dictate --init-config

whisper-dictate [OPTIONS]
Options:
  --type               Type transcription instead of copying to clipboard
  -c, --config PATH    Path to configuration file
  -m, --model NAME     Override model (e.g., small.en, base.en)
  -p, --position POS   Window position: top or bottom
  -l, --language LANG  Language code (e.g., en, de, fr)
  -v, --version        Show version
  -h, --help           Show help
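The options above override values from the TOML configuration file. A minimal sketch of that layering (defaults, then file, then CLI flags) might look like the following. It is an illustration only: the `DEFAULTS` table simply mirrors the sample config shown earlier, and `build_parser` and `resolve_config` are hypothetical names, not the project's API. `tomllib` ships with Python 3.11+; on 3.10 the sketch falls back to the `tomli` backport.

```python
import argparse

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API for Python 3.10

# Assumed defaults, copied from the sample config.toml in this README.
DEFAULTS = {
    "general": {"output_mode": "clipboard", "language": "en"},
    "model": {"name": "small.en"},
    "ui": {"position": "bottom", "edge_margin": 60, "theme": "google"},
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="whisper-dictate")
    parser.add_argument("--type", dest="type_output", action="store_true")
    parser.add_argument("-m", "--model")
    parser.add_argument("-p", "--position", choices=["top", "bottom"])
    parser.add_argument("-l", "--language")
    return parser


def resolve_config(toml_text: str, argv: list[str]) -> dict:
    """Merge defaults, TOML file contents, and CLI overrides (highest priority)."""
    config = {section: dict(values) for section, values in DEFAULTS.items()}
    for section, values in tomllib.loads(toml_text).items():
        config.setdefault(section, {}).update(values)

    args = build_parser().parse_args(argv)
    # Only flags the user actually passed replace the file values.
    if args.type_output:
        config["general"]["output_mode"] = "type"
    if args.language is not None:
        config["general"]["language"] = args.language
    if args.model is not None:
        config["model"]["name"] = args.model
    if args.position is not None:
        config["ui"]["position"] = args.position
    return config
```

Leaving unset flags as `None` is what lets the file value survive when a flag is not given on the command line.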
- Toggle on: Run whisper-dictate to start recording
- Speak: The dot visualizer animates based on your voice
- Toggle off: Run whisper-dictate again (or press Escape/Space)
- Transcription: whisper.cpp processes the audio
- Output: Text is copied to clipboard (and optionally typed)
The toggle mechanism uses a PID file and signals: when you run the command while a recording is already in progress, it sends SIGUSR1 to the running instance, which then stops recording.
For ydotool on Wayland:
- Add your user to the input group: sudo usermod -aG input $USER
- Log out and back in
- Enable the ydotool service: systemctl --user enable --now ydotool.service
- Use a smaller model (base.en or tiny.en)
- Install whisper.cpp-cuda for GPU acceleration
- Reduce thread count in config if CPU is overloaded
KDE Wayland may ignore window positioning requests. You can create a KWin rule:
- System Settings > Window Management > Window Rules
- Add rule matching window class whisper-dictate
- Set Position to Force with desired coordinates
GPL-3.0 - See LICENSE for details.
- whisper.cpp by Georgi Gerganov - High-performance C/C++ inference
- OpenAI Whisper - The original speech recognition model
- PyQt6 by Riverbank Computing - Qt bindings for Python
- PyAudio - Python bindings for PortAudio
- PortAudio - Cross-platform audio I/O library
- NumPy - Fundamental package for scientific computing
- tomli - TOML parser for Python
- tomli-w - TOML writer for Python
- pytest - Testing framework
- pytest-qt - pytest plugin for Qt application testing
- pytest-cov - Coverage plugin for pytest
- Black - The uncompromising code formatter
- isort - Python import sorter
- mypy - Static type checker for Python
- pre-commit - Git hooks framework
- setuptools - Python build system
- wl-clipboard - Wayland clipboard utilities
- xclip - X11 clipboard interface
- ydotool - Generic Linux automation tool
- xdotool - X11 automation tool
- Claude Code by Anthropic