Whisper Dictate

Local speech-to-text dictation with a visual feedback interface using whisper.cpp.

Note: This project was created with the assistance of Claude Code, Anthropic's AI coding assistant.

Features

- Privacy-first: All transcription happens locally on your machine
- GPU acceleration: Supports CUDA for fast transcription via whisper.cpp-cuda
- Visual feedback: Google Assistant-style animated dot visualizer
- Toggle hotkey: Press once to start, press again to stop and transcribe
- Multiple output modes: Copy to clipboard or type directly
- Configurable: TOML-based configuration with CLI overrides

Demo

The visualizer shows animated dots that respond to your voice while recording:

    ●  ●  ●  ●     (dots animate based on audio level)
   Listening...
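
As a rough illustration of how dots can follow the audio level, the sketch below computes the RMS level of a chunk of 16-bit PCM audio and maps it to a scale factor per dot. It is not taken from the project's source: the function names, the gain value, and the centre-weighted scaling are all assumptions, and it uses only the standard library to stay self-contained.

```python
import math
from array import array


def rms_level(pcm_bytes: bytes) -> float:
    """Return the RMS level of 16-bit signed PCM audio, scaled to 0.0..1.0.

    The byte string must hold whole 16-bit samples in native byte order.
    """
    samples = array("h")
    samples.frombytes(pcm_bytes)
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


def dot_scales(level: float, dots: int = 4, gain: float = 4.0) -> list[float]:
    """Map one audio level to a scale factor per dot (1.0 = resting size)."""
    # Speech rarely reaches full scale, so boost the level and clamp it.
    boosted = min(1.0, level * gain)
    # Hypothetical weighting: centre dots react more than the outer ones.
    weights = [1.0 - abs(i - (dots - 1) / 2) / dots for i in range(dots)]
    return [1.0 + boosted * w for w in weights]
```

Each audio chunk read from the microphone would be passed through rms_level, and the resulting scales applied to the dots on the next repaint.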

Requirements

System Dependencies

whisper.cpp - Speech recognition engine

# Arch Linux (CPU only)
paru -S whisper.cpp

# Arch Linux (CUDA GPU acceleration)
paru -S whisper.cpp-cuda

Clipboard tool (one of):
- wl-copy (Wayland): paru -S wl-clipboard
- xclip (X11): paru -S xclip

Typing tool (optional, for direct typing mode):
- ydotool (Wayland): paru -S ydotool
- xdotool (X11): paru -S xdotool
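
Which tool applies depends on the session type. The sketch below shows one way to pick between the Wayland and X11 variants, assuming a Wayland session can be detected through the WAYLAND_DISPLAY environment variable. The pick_tool helper is hypothetical and is not the project's own detection logic.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def pick_tool(
    wayland_tool: str,
    x11_tool: str,
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the tool matching the current session, or None if it is not installed."""
    env = os.environ if env is None else env
    # A non-empty WAYLAND_DISPLAY is treated as "running under Wayland".
    tool = wayland_tool if env.get("WAYLAND_DISPLAY") else x11_tool
    return tool if which(tool) else None


clipboard_tool = pick_tool("wl-copy", "xclip")
typing_tool = pick_tool("ydotool", "xdotool")
```

A result of None for clipboard_tool means the matching package from the list above still needs to be installed.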

Python Dependencies

Python 3.10+
PyQt6
PyAudio
NumPy

Installation

From GitHub Releases (Quickest)

Download a pre-built binary from Releases:

# Download the latest binary
wget https://github.com/JPyke3/whisper-dictate/releases/latest/download/whisper-dictate-linux-x86_64
chmod +x whisper-dictate-linux-x86_64
sudo mv whisper-dictate-linux-x86_64 /usr/local/bin/whisper-dictate

Or extract from the tarball:

tar -xzf whisper-dictate-v*-linux-x86_64.tar.gz
sudo mv whisper-dictate /usr/local/bin/

Note: The pre-built binary does not replace whisper.cpp. You still need to install whisper.cpp separately (see System Dependencies).

Using pipx (Recommended for Python users)

pipx installs the application in an isolated environment while making it globally available. This is the recommended method for most users.

# Install pipx if you don't have it (Arch Linux)
paru -S python-pipx

# Install whisper-dictate
pipx install git+https://github.com/JPyke3/whisper-dictate.git

To update to the latest version:

pipx upgrade whisper-dictate

Development Installation

For contributing or local development:

git clone https://github.com/JPyke3/whisper-dictate.git
cd whisper-dictate
pip install -e ".[dev]"

To install your local changes globally via pipx:

pipx install /path/to/whisper-dictate

To update after making changes:

pipx reinstall whisper-dictate

Setup

1. Download a Whisper Model

whisper-dictate --download-model small.en

Available models (English-only versions recommended for English speakers):

- tiny.en (~75MB): Fastest, lowest quality
- base.en (~142MB): Fast, decent quality
- small.en (~488MB): Good balance of speed and quality (recommended)
- medium.en (~1.5GB): High quality, slower
- large (~3GB): Highest quality, slowest

2. Create Configuration (Optional)

whisper-dictate --init-config

Edit ~/.config/whisper-dictate/config.toml:

[general]
output_mode = "clipboard"  # or "type"
language = "en"

[model]
name = "small.en"

[ui]
position = "bottom"  # or "top"
edge_margin = 60
theme = "google"  # or "blue", "purple", "mono"

3. Set Up Hotkey

Create a keyboard shortcut in your desktop environment to run:

whisper-dictate

KDE Plasma:

1. Open System Settings > Shortcuts > Custom Shortcuts.
2. Add a new Global Shortcut > Command/URL.
3. Set the trigger (e.g., Alt+Space).
4. Set the action to whisper-dictate.

Usage

Basic Usage

# Start/toggle recording (copies to clipboard)
whisper-dictate

# Type the transcription instead of copying
whisper-dictate --type

Commands

# Download a model
whisper-dictate --download-model small.en

# List downloaded models
whisper-dictate --list-models

# Show current configuration
whisper-dictate --show-config

# Create default configuration file
whisper-dictate --init-config

CLI Options

whisper-dictate [OPTIONS]

Options:
  --type              Type transcription instead of copying to clipboard
  -c, --config PATH   Path to configuration file
  -m, --model NAME    Override model (e.g., small.en, base.en)
  -p, --position POS  Window position: top or bottom
  -l, --language LANG Language code (e.g., en, de, fr)
  -v, --version       Show version
  -h, --help          Show help
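
For readers curious how an option set like this maps onto Python's argparse, the following sketch declares the same flags. It is an illustration only: the destination name type_output and the version string are placeholders, and the command flags (--download-model, --list-models, --show-config, --init-config) are left out for brevity.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options listed above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="whisper-dictate")
    parser.add_argument("--type", dest="type_output", action="store_true",
                        help="Type transcription instead of copying to clipboard")
    parser.add_argument("-c", "--config", metavar="PATH",
                        help="Path to configuration file")
    parser.add_argument("-m", "--model", metavar="NAME",
                        help="Override model (e.g., small.en, base.en)")
    parser.add_argument("-p", "--position", metavar="POS", choices=["top", "bottom"],
                        help="Window position: top or bottom")
    parser.add_argument("-l", "--language", metavar="LANG",
                        help="Language code (e.g., en, de, fr)")
    # Placeholder version string; -h/--help is added by argparse itself.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s 0.0.0")
    return parser
```

Options that are not given stay at None, which makes it easy to apply only the explicitly passed values on top of the configuration file.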

How It Works

1. Toggle on: Run whisper-dictate to start recording.
2. Speak: The dot visualizer animates based on your voice.
3. Toggle off: Run whisper-dictate again (or press Escape/Space).
4. Transcription: whisper.cpp processes the audio.
5. Output: Text is copied to the clipboard, or typed directly when type mode is enabled.

The toggle mechanism uses a PID file and signals: when you run the command while a recording is already in progress, it sends SIGUSR1 to the running instance, which then stops recording.
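
The general pattern can be sketched as follows. This is a minimal illustration of a PID-file toggle, not the project's implementation: the function names are hypothetical, and the PID file location is left to the caller.

```python
import os
import signal
from pathlib import Path
from typing import Optional


def running_pid(pid_file: Path) -> Optional[int]:
    """Return the PID stored in pid_file if that process is alive, else None."""
    try:
        pid = int(pid_file.read_text().strip())
        if pid <= 0:
            return None  # never signal a process group by accident
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except (OSError, ValueError):
        # Missing file, unreadable content, or a stale PID.
        return None
    return pid


def toggle(pid_file: Path) -> str:
    """Stop a running recorder via SIGUSR1, or register this process as the recorder."""
    pid = running_pid(pid_file)
    if pid is not None:
        os.kill(pid, signal.SIGUSR1)
        return "stopped"
    pid_file.parent.mkdir(parents=True, exist_ok=True)
    pid_file.write_text(str(os.getpid()))
    return "started"
```

In this pattern the recording instance would register a handler with signal.signal(signal.SIGUSR1, ...) that ends the recording, and would remove the PID file on exit so the next run starts a fresh session.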

Troubleshooting

Direct typing not working (Wayland)

For ydotool on Wayland:

1. Add your user to the input group: sudo usermod -aG input $USER
2. Log out and back in.
3. Enable the ydotool service: systemctl --user enable --now ydotool.service

Transcription is slow

- Use a smaller model (base.en or tiny.en)
- Install whisper.cpp-cuda for GPU acceleration
- Reduce the thread count in the config if the CPU is overloaded

Window position issues on KDE Wayland

KDE Wayland may ignore window positioning requests. You can create a KWin rule:

1. Open System Settings > Window Management > Window Rules.
2. Add a rule matching the window class whisper-dictate.
3. Set Position to Force with the desired coordinates.

License

GPL-3.0 - See LICENSE for details.

Acknowledgments

Speech Recognition

whisper.cpp by Georgi Gerganov - High-performance C/C++ inference
OpenAI Whisper - The original speech recognition model

Python Libraries

PyQt6 by Riverbank Computing - Qt bindings for Python
PyAudio - Python bindings for PortAudio
PortAudio - Cross-platform audio I/O library
NumPy - Fundamental package for scientific computing
tomli - TOML parser for Python
tomli-w - TOML writer for Python

Development Tools

pytest - Testing framework
pytest-qt - pytest plugin for Qt application testing
pytest-cov - Coverage plugin for pytest
Black - The uncompromising code formatter
isort - Python import sorter
mypy - Static type checker for Python
pre-commit - Git hooks framework
setuptools - Python build system

System Utilities

wl-clipboard - Wayland clipboard utilities
xclip - X11 clipboard interface
ydotool - Generic Linux automation tool
xdotool - X11 automation tool

Created With

Claude Code by Anthropic
