Whisper Dictate

Local speech-to-text dictation with a visual feedback interface using whisper.cpp.

Note: This project was created with the assistance of Claude Code, Anthropic's AI coding assistant.

Features

  • Privacy-first: All transcription happens locally on your machine
  • GPU acceleration: Supports CUDA for fast transcription via whisper.cpp-cuda
  • Visual feedback: Google Assistant-style animated dot visualizer
  • Toggle hotkey: Press once to start, press again to stop and transcribe
  • Multiple output modes: Copy to clipboard or type directly
  • Configurable: TOML-based configuration with CLI overrides

Demo

The visualizer shows animated dots that respond to your voice while recording:

    ●  ●  ●  ●     (dots animate based on audio level)
   Listening...
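
As a rough illustration of how dots can follow the audio level, the sketch below computes the RMS level of a chunk of 16-bit PCM audio and maps it to a scale factor per dot. It is not the project's actual visualizer code: the function names `rms_level` and `dot_scales`, the `gain` parameter, and the scaling rule are all assumptions, and the standard library is used instead of NumPy to keep the example self-contained.

```python
import math
from array import array


def rms_level(pcm_bytes: bytes) -> float:
    """Return the RMS level of signed 16-bit PCM audio, scaled to 0.0-1.0."""
    samples = array("h")
    # Ignore a trailing partial sample, if any.
    usable = len(pcm_bytes) - len(pcm_bytes) % samples.itemsize
    samples.frombytes(pcm_bytes[:usable])
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


def dot_scales(level: float, dots: int = 4, gain: float = 4.0) -> list[float]:
    """Map an audio level to one scale factor per dot (1.0 = resting size)."""
    # Hypothetical rule: boost quiet speech with a gain, then clamp to 0.0-1.0.
    boosted = min(1.0, max(0.0, level * gain))
    return [1.0 + boosted for _ in range(dots)]
```

In a real UI, each audio chunk read from the microphone would be passed through `rms_level`, and the resulting scales would drive the dot animation on the next repaint.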

Requirements

System Dependencies

  • whisper.cpp - Speech recognition engine

    # Arch Linux (CPU only)
    paru -S whisper.cpp
    
    # Arch Linux (CUDA GPU acceleration)
    paru -S whisper.cpp-cuda
  • Clipboard tool (one of):

    • wl-copy (Wayland) - paru -S wl-clipboard
    • xclip (X11) - paru -S xclip
  • Typing tool (optional, for direct typing mode):

    • ydotool (Wayland) - paru -S ydotool
    • xdotool (X11) - paru -S xdotool
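
The choice between the Wayland and X11 tools listed above could be made at runtime along these lines. This is a minimal sketch, not the project's actual detection logic: the function name `pick_tools` is hypothetical, and checking the `WAYLAND_DISPLAY` environment variable is an assumption about how the session type is detected.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def pick_tools(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict:
    """Choose clipboard and typing commands for the current session."""
    # Assumption: a set WAYLAND_DISPLAY means a Wayland session, otherwise X11.
    wayland = bool(env.get("WAYLAND_DISPLAY"))
    # Both clipboard tools read the text to copy from standard input.
    clipboard = ["wl-copy"] if wayland else ["xclip", "-selection", "clipboard"]
    # Both typing tools take the text to type as a trailing argument.
    typer = ["ydotool", "type"] if wayland else ["xdotool", "type"]
    return {
        "clipboard": clipboard if which(clipboard[0]) else None,
        "type": typer if which(typer[0]) else None,
    }
```

A value of `None` means the tool is not installed, which is a reasonable point to print an install hint for the matching package.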

Python Dependencies

  • Python 3.10+
  • PyQt6
  • PyAudio
  • NumPy

Installation

From GitHub Releases (Quickest)

Download a pre-built binary from Releases:

# Download the latest binary
wget https://github.com/JPyke3/whisper-dictate/releases/latest/download/whisper-dictate-linux-x86_64
chmod +x whisper-dictate-linux-x86_64
sudo mv whisper-dictate-linux-x86_64 /usr/local/bin/whisper-dictate

Or extract from the tarball:

tar -xzf whisper-dictate-v*-linux-x86_64.tar.gz
sudo mv whisper-dictate /usr/local/bin/

Note: You still need to install whisper.cpp separately.

Using pipx (Recommended for Python users)

pipx installs the application in an isolated environment while making the whisper-dictate command available globally. This is the recommended method for users who already have a Python toolchain.

# Install pipx if you don't have it
# Arch Linux
paru -S python-pipx

# Install whisper-dictate
pipx install git+https://github.com/JPyke3/whisper-dictate.git

To update to the latest version:

pipx upgrade whisper-dictate

Development Installation

For contributing or local development:

git clone https://github.com/JPyke3/whisper-dictate.git
cd whisper-dictate
pip install -e ".[dev]"

To install your local changes globally via pipx:

pipx install /path/to/whisper-dictate

To update after making changes:

pipx reinstall whisper-dictate

Setup

1. Download a Whisper Model

whisper-dictate --download-model small.en

Available models (English-only versions recommended for English speakers):

  • tiny.en (~75MB) - Fastest, lowest quality
  • base.en (~142MB) - Fast, decent quality
  • small.en (~488MB) - Good balance of speed and quality (recommended)
  • medium.en (~1.5GB) - High quality, slower
  • large (~3GB) - Highest quality, slowest

2. Create Configuration (Optional)

whisper-dictate --init-config

Edit ~/.config/whisper-dictate/config.toml:

[general]
output_mode = "clipboard"  # or "type"
language = "en"

[model]
name = "small.en"

[ui]
position = "bottom"  # or "top"
edge_margin = 60
theme = "google"  # or "blue", "purple", "mono"
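
To illustrate how a TOML file and CLI overrides can combine, here is a small sketch. The precedence (built-in defaults, then the config file, then command-line flags) is an assumption, and `DEFAULTS` and `load_settings` are hypothetical names rather than the project's API. The keys and flags are the ones shown in this README.

```python
import argparse

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # Python 3.10: requires the tomli package
    import tomli as tomllib

# Defaults mirror the example config.toml above.
DEFAULTS = {
    "general": {"output_mode": "clipboard", "language": "en"},
    "model": {"name": "small.en"},
    "ui": {"position": "bottom", "edge_margin": 60, "theme": "google"},
}


def load_settings(toml_text: str, argv: list[str]) -> dict:
    """Merge defaults, config file contents, and CLI flags, in that order."""
    file_cfg = tomllib.loads(toml_text)
    settings = {
        section: {**values, **file_cfg.get(section, {})}
        for section, values in DEFAULTS.items()
    }

    parser = argparse.ArgumentParser(prog="whisper-dictate")
    parser.add_argument("--type", action="store_true")
    parser.add_argument("-m", "--model")
    parser.add_argument("-p", "--position", choices=["top", "bottom"])
    parser.add_argument("-l", "--language")
    args = parser.parse_args(argv)

    # Only flags that were actually given override the file values.
    if args.type:
        settings["general"]["output_mode"] = "type"
    if args.model:
        settings["model"]["name"] = args.model
    if args.position:
        settings["ui"]["position"] = args.position
    if args.language:
        settings["general"]["language"] = args.language
    return settings
```

For example, with `name = "base.en"` in the file and `--type -p top` on the command line, the result keeps `base.en` as the model while switching the output mode to `type` and the position to `top`.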

3. Set Up Hotkey

Create a keyboard shortcut in your desktop environment to run:

whisper-dictate

KDE Plasma:

  1. System Settings > Shortcuts > Custom Shortcuts
  2. Add new Global Shortcut > Command/URL
  3. Set trigger (e.g., Alt+Space)
  4. Set action to whisper-dictate

Usage

Basic Usage

# Start/toggle recording (copies to clipboard)
whisper-dictate

# Type the transcription instead of copying
whisper-dictate --type

Commands

# Download a model
whisper-dictate --download-model small.en

# List downloaded models
whisper-dictate --list-models

# Show current configuration
whisper-dictate --show-config

# Create default configuration file
whisper-dictate --init-config

CLI Options

whisper-dictate [OPTIONS]

Options:
  --type              Type transcription instead of copying to clipboard
  -c, --config PATH   Path to configuration file
  -m, --model NAME    Override model (e.g., small.en, base.en)
  -p, --position POS  Window position: top or bottom
  -l, --language LANG Language code (e.g., en, de, fr)
  -v, --version       Show version
  -h, --help          Show help

How It Works

  1. Toggle on: Run whisper-dictate to start recording
  2. Speak: The dot visualizer animates based on your voice
  3. Toggle off: Run whisper-dictate again (or press Escape/Space)
  4. Transcription: whisper.cpp processes the audio
  5. Output: Text is copied to the clipboard, or typed directly when --type (or output_mode = "type") is set

The toggle mechanism uses a PID file and signals: when you run the command while an instance is already recording, the new process sends SIGUSR1 to the running instance, which then stops recording and transcribes.
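
The PID-file-and-signal pattern described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation: `toggle` and `install_stop_handler` are hypothetical names, the PID file location is left to the caller, and cases such as a PID owned by another user are not handled.

```python
import os
import signal
from pathlib import Path
from typing import Callable


def install_stop_handler(on_stop: Callable[[], None]) -> None:
    """In the recording instance: call on_stop() when SIGUSR1 arrives."""
    signal.signal(signal.SIGUSR1, lambda signum, frame: on_stop())


def toggle(pid_file: Path) -> str:
    """Stop a running recorder if there is one, else register this process."""
    try:
        pid = int(pid_file.read_text().strip())
        os.kill(pid, signal.SIGUSR1)  # ask the running instance to stop
        return "stopped"
    except (FileNotFoundError, ValueError, ProcessLookupError):
        # No PID file, unreadable contents, or a stale PID: start fresh.
        pid_file.write_text(str(os.getpid()))
        return "started"
```

The first invocation writes its own PID and begins recording. The second invocation finds that PID, signals it, and exits. The recording instance would normally remove the PID file once it has finished transcribing.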

Troubleshooting

Direct typing not working (Wayland)

For ydotool on Wayland:

  1. Add your user to the input group: sudo usermod -aG input $USER
  2. Log out and back in
  3. Enable the ydotool service: systemctl --user enable --now ydotool.service

Transcription is slow

  • Use a smaller model (base.en or tiny.en)
  • Install whisper.cpp-cuda for GPU acceleration
  • Reduce thread count in config if CPU is overloaded

Window position issues on KDE Wayland

KDE Wayland may ignore window positioning requests. You can create a KWin rule:

  1. System Settings > Window Management > Window Rules
  2. Add rule matching window class whisper-dictate
  3. Set Position to Force with desired coordinates

License

GPL-3.0 - See LICENSE for details.

Acknowledgments

Python Libraries

  • PyQt6 by Riverbank Computing - Qt bindings for Python
  • PyAudio - Python bindings for PortAudio
  • PortAudio - Cross-platform audio I/O library
  • NumPy - Fundamental package for scientific computing
  • tomli - TOML parser for Python
  • tomli-w - TOML writer for Python

Development Tools

  • pytest - Testing framework
  • pytest-qt - pytest plugin for Qt application testing
  • pytest-cov - Coverage plugin for pytest
  • Black - The uncompromising code formatter
  • isort - Python import sorter
  • mypy - Static type checker for Python
  • pre-commit - Git hooks framework
  • setuptools - Python build system
