The ULTIMATE Speech-to-Text Tool for Developers
No API. No Cloud. 100% FREE. Works Locally.
```
 ____                    _     ____  _    _    _____
/ ___| _ __   ___  __ _| | __/ ___|| | _(_)_ __ |_ _|   _ _ __   ___
\___ \| '_ \ / _ \/ _` | |/ /\___ \| |/ / | '_ \ | || | | | '_ \ / _ \
 ___) | |_) |  __/ (_| |   <  ___) |   <| | |_) || || |_| | |_) |  __/
|____/| .__/ \___|\__,_|_|\_\____/|_|\_\_| .__/ |_| \__, | .__/ \___|
      |_|                                |_|        |___/|_|
```
🗣️ Speak → 📝 Text appears at cursor → Done!
Every developer has been stuck in this painful loop:
```
┌─────────────────────────────────────────────────────────────┐
│ 1. 🌐 Open browser/ChatGPT                                  │
│ 2. 🎤 Click voice button                                    │
│ 3. 🗣️ Speak                                                 │
│ 4. ⏳ Wait for transcription                                │
│ 5. 📋 Copy text                                             │
│ 6. 🔄 Switch back to terminal                               │
│ 7. 📥 Paste                                                 │
└─────────────────────────────────────────────────────────────┘
```
⬇️ That's 7 STEPS just to avoid typing! ⬇️
```
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│   Press Alt+R → 🗣️ Speak → Press Alt+S → ✅ Done!           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
⬆️ Just 2 STEPS with SpeakSkipType! ⬆️
```shell
git clone https://github.com/DandaAkhilReddy/Speakskiptype.git && cd Speakskiptype && pip install -r requirements.txt && python speakskiptype.py
```

Note: On first run, wait 2-15 minutes for the speech model to download (~40MB). Time depends on your internet speed. This only happens once!
- Press `Alt+R` to start recording
- Speak your text
- Press `Alt+S` to stop & paste
| Shortcut | Action | Description |
|---|---|---|
| `Alt + R` | 🔴 Start Record | Start recording (no terminal conflict!) |
| `Alt + S` | ✅ Stop & Paste | Stop recording and paste text |
| `Alt + Q` | 🚪 Quit | Exit application |
| `Ctrl + D` | 🐛 Debug Mode | Toggle debug output |
| `Ctrl + H` | 📜 History | Show transcription history |
| Feature | SpeakSkipType | Handy | OpenWhispr | voice_typing |
|---|---|---|---|---|
| 100% Free | ✅ | ✅ | ✅ | ✅ |
| 100% Offline | ✅ | ❌ | ✅ | ✅ |
| Multi-Engine (Vosk + Whisper) | ✅ | ❌ | ❌ | ❌ |
| Filler Word Removal | ✅ | ❌ | ❌ | ❌ |
| Continuous Listening | ✅ | ❌ | ❌ | ❌ |
| Voice Commands | ✅ | ✅ | ❌ | ❌ |
| Custom Vocabulary | ✅ | ❌ | ❌ | ❌ |
| Code Dictation Mode | ✅ | ❌ | ❌ | ❌ |
| Hold-to-Record | ✅ | ❌ | ❌ | ❌ |
| Auto-Punctuation | ✅ | ❌ | ❌ | ❌ |
| Export History | ✅ | ❌ | ❌ | ❌ |
```
┌────────────────────────────────────────────────────────────────────┐
│                          🎤 SPEAKSKIPTYPE                          │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│ 🔧 ENGINES                    │ 🎛️ MODES                           │
│ ├─ Vosk (fast/light)          │ ├─ Normal recording                │
│ └─ Whisper (accurate)         │ ├─ Hold-to-record                  │
│                               │ ├─ Continuous listening            │
│ 🧹 TEXT PROCESSING            │ └─ Background/tray mode            │
│ ├─ Filler word removal        │                                    │
│ ├─ Auto-punctuation           │ 🗣️ VOICE COMMANDS                  │
│ ├─ Custom vocabulary          │ ├─ "new line" → ↵                  │
│ └─ Code dictation mode        │ ├─ "period" → .                    │
│                               │ ├─ "delete that" → undo            │
│ 📊 EXTRAS                     │ └─ "question mark" → ?             │
│ ├─ Statistics (WPM)           │                                    │
│ ├─ History export             │ 🌍 MULTI-PLATFORM                  │
│ └─ Real-time display          │ ├─ Windows ✅                      │
│                               │ ├─ macOS ✅                        │
│                               │ └─ Linux ✅                        │
└────────────────────────────────────────────────────────────────────┘
```
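The WPM statistic listed under extras is just word count over speaking time. A minimal sketch of the calculation (a hypothetical helper for illustration, not the app's actual code):

```python
def words_per_minute(transcript: str, seconds: float) -> float:
    """Compute speaking speed as words per minute."""
    words = len(transcript.split())
    return words / (seconds / 60) if seconds > 0 else 0.0

# 30 words spoken in 15 seconds is 120 WPM
print(words_per_minute("word " * 30, 15))  # → 120.0
```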
- Python 3.7+
- Microphone
- ~40MB disk space
```shell
# 1. Clone the repo
git clone https://github.com/DandaAkhilReddy/Speakskiptype.git
cd Speakskiptype

# 2. Install dependencies
pip install -r requirements.txt

# 3. Run!
python speakskiptype.py
```

```
┌─────────────────────────────────────────────────────────────────────────┐
│ ⚠️ FIRST RUN: Please wait for model download!                           │
│                                                                         │
│ The speech recognition model (~40MB) downloads automatically.           │
│ You'll see: "[*] Downloading speech model (first time only)..."         │
│                                                                         │
│ ⏱️ Download time: 2-15 minutes (depends on internet speed)              │
│    - Fast internet (50+ Mbps): ~2-3 minutes                             │
│    - Average internet (10-50 Mbps): ~5-8 minutes                        │
│    - Slow internet (<10 Mbps): ~10-15 minutes                           │
│                                                                         │
│ ✓ This only happens ONCE                                                │
│ ✓ After download, the app starts instantly                              │
│ ✓ Model is saved to ~/.speakskiptype/                                   │
└─────────────────────────────────────────────────────────────────────────┘
```
What to expect on first run:
- App starts and shows "Downloading speech model..."
- Wait 2-15 minutes (depending on your internet speed)
- You'll see "Model downloaded!" when complete
- The main interface appears - you're ready to go!
If it seems stuck:
- The download IS happening in the background
- No progress bar is shown (this is normal)
- Just wait patiently - it WILL complete
- On slow connections, it can take up to 15 minutes
- Check the `~/.speakskiptype/` folder to see if the model is downloading
Make sure you have:
- ✅ Python 3.7+ installed (run `python --version` to check)
- ✅ A working microphone
- ✅ Speakers/headphones (to hear beep feedback)
🪟 Windows

Step 1: Open Command Prompt

```
Win + R → type "cmd" → Enter
```

Step 2: Navigate to the folder

```shell
cd C:\Users\YourUsername\Speakskiptype
```

Step 3: Install dependencies

```shell
pip install vosk sounddevice pynput pyperclip
```

Step 4: Run the app

```shell
python speakskiptype.py
```

Step 5: Test it!
```
┌────────────────────────────────────────────────────────────┐
│ 1. Open Notepad (or any text editor)                       │
│ 2. Click inside Notepad so cursor is there                 │
│ 3. Press Alt+R (you'll hear a beep - recording started)    │
│ 4. Say: "Hello world this is a test"                       │
│ 5. Press Alt+S (you'll hear a beep - recording stopped)    │
│ 6. Watch the text appear in Notepad! ✨                    │
└────────────────────────────────────────────────────────────┘
```
🍎 macOS

Step 1: Open Terminal

```
Cmd + Space → type "Terminal" → Enter
```

Step 2: Navigate and install

```shell
cd ~/Speakskiptype
pip3 install vosk sounddevice pynput pyperclip
```

Step 3: Grant permissions

- Go to System Preferences → Security & Privacy → Privacy
- Enable Microphone access for Terminal
- Enable Accessibility access for Terminal (for hotkeys)

Step 4: Run and test

```shell
python3 speakskiptype.py
```

🐧 Linux

Step 1: Open Terminal

```
Ctrl + Alt + T
```

Step 2: Install system dependencies

```shell
# Ubuntu/Debian
sudo apt install python3-pip portaudio19-dev

# Fedora
sudo dnf install python3-pip portaudio-devel
```

Step 3: Install Python packages

```shell
cd ~/Speakskiptype
pip3 install vosk sounddevice pynput pyperclip
```

Step 4: Run and test

```shell
python3 speakskiptype.py
```

Use this checklist to verify everything works:
| # | Test | Expected Result | Status |
|---|---|---|---|
| 1 | Run `python speakskiptype.py` | Banner appears with controls | ⬜ |
| 2 | Press `Alt+R` | See `[REC] Recording...` + beep | ⬜ |
| 3 | Speak "hello world" | See `Hearing: hello world...` | ⬜ |
| 4 | Press `Alt+S` | See `[STOP]` + `[DONE]` + beep | ⬜ |
| 5 | Check Notepad/editor | Text "hello world" appeared | ⬜ |
| 6 | Press `Alt+Q` | App exits cleanly | ⬜ |
Say this:
"Um I would like to you know test this application"
Expected output:
"I would like to test this application"
The filler words (um, you know) are automatically removed! ✨
| Say This | Expected Output |
|---|---|
| "hello period" | hello. |
| "hello new line test" | hello ↵ test |
| "open paren test close paren" | (test) |
❌ "No module named 'vosk'"

```shell
pip install vosk
```

❌ "No speech detected" every time
- Check microphone is not muted
- Check microphone permissions in system settings
- Try speaking louder/closer to mic
- Run `python -c "import sounddevice; print(sounddevice.query_devices())"` to see devices
❌ Text not appearing in editor
- Make sure the editor window is focused/active
- Try clicking in the editor right before pressing Alt+S
- Check if clipboard is working: `python -c "import pyperclip; pyperclip.copy('test'); print(pyperclip.paste())"`
❌ Hotkeys not working (Windows)
Run Command Prompt as Administrator:
- Search "cmd" in Start Menu
- Right-click → "Run as administrator"
- Try again
```shell
# Default (Vosk engine)
python speakskiptype.py

# Whisper engine (more accurate)
python speakskiptype.py --whisper

# Continuous listening mode
python speakskiptype.py --continuous

# Code dictation mode
python speakskiptype.py --code

# Run in background
python speakskiptype.py --bg

# Combine flags
python speakskiptype.py --whisper --continuous --no-filler --code
```

NEW! Use voice input directly with Claude Code!
```shell
# Terminal 1: Start voice input helper
python claude_voice.py

# Terminal 2: Open Claude Code
claude
```

| Key | Action |
|---|---|
| `F8` | Toggle recording (start/stop + auto-paste) |
| `ESC` | Exit voice helper |
```
┌─────────────────────────────────────────────────────────────┐
│ 1. Run claude_voice.py (keep open)                          │
│ 2. Run claude in another terminal                           │
│ 3. Press F8 → Speak → Press F8                              │
│ 4. Your speech appears in Claude Code! 🎉                   │
└─────────────────────────────────────────────────────────────┘
```
NEW! Direct integration with Claude's API including proper support for Extended Thinking.
When using Claude's extended thinking feature via the Anthropic API, you might encounter this error:
```
Error: thinking blocks must be preserved exactly as received
```
This happens when:
- Thinking blocks are modified or removed from conversation history
- The order of content blocks is changed
- Follow-up messages don't include the original thinking blocks
We provide anthropic_client.py - a client that automatically handles extended thinking correctly.
```shell
pip install anthropic
```

```python
from anthropic_client import ClaudeClient

# Create client (uses ANTHROPIC_API_KEY env var)
client = ClaudeClient()

# First message - thinking blocks are generated internally
response = client.send_message("Explain quantum entanglement")
print(response)

# Follow-up - thinking blocks are automatically preserved!
response = client.send_message("Can you give a simpler analogy?")
print(response)

# Start fresh conversation (clears all history including thinking)
client.clear_conversation()
```

```python
from anthropic_client import ClaudeClient

client = ClaudeClient(
    api_key="your-api-key",           # Or set ANTHROPIC_API_KEY env var
    model="claude-sonnet-4-20250514", # Model to use
    max_tokens=16000,                 # Max response tokens
    thinking_enabled=True,            # Enable extended thinking
    thinking_budget=10000             # Token budget for thinking
)

# Disable thinking for simpler queries
client.disable_thinking()

# Re-enable with custom budget
client.enable_thinking(budget_tokens=5000)
```

The key fix is preserving `response.content` exactly as returned:
```python
# CORRECT - Store complete response including thinking blocks
self._messages.append({
    "role": "assistant",
    "content": response.content  # Includes ALL blocks unchanged
})

# WRONG - Don't filter or modify!
# content = [b for b in response.content if b.type == "text"]  # BAD!
```

| Rule | Description |
|---|---|
| Never modify | Keep `thinking`/`redacted_thinking` blocks exactly as received |
| Never remove | Include all blocks in conversation history |
| Keep order | Don't reorder content blocks |
| First position | Thinking block must be first in the assistant's content array |
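These preservation rules can be sketched with plain dicts standing in for SDK response blocks (illustrative only; the real `anthropic_client.py` may differ):

```python
def append_assistant_turn(messages: list, content_blocks: list) -> None:
    """Append an assistant turn, keeping ALL content blocks
    (thinking, redacted_thinking, text) unchanged and in order."""
    messages.append({"role": "assistant", "content": content_blocks})

# Simulated API response: thinking block first, then text
response_content = [
    {"type": "thinking", "thinking": "15 * 23 = 345..."},
    {"type": "text", "text": "The answer is 345."},
]

history = [{"role": "user", "content": "What's 15 * 23?"}]
append_assistant_turn(history, response_content)

# The thinking block is still first and unmodified on the next turn
assert history[-1]["content"][0]["type"] == "thinking"

# WRONG: filtering to text-only blocks would trigger the API error
# history[-1]["content"] = [b for b in response_content if b["type"] == "text"]
```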
| Method | Description |
|---|---|
| `send_message(text)` | Send message, returns text response |
| `clear_conversation()` | Start fresh conversation |
| `get_conversation_history()` | Get full history with thinking blocks |
| `disable_thinking()` | Turn off extended thinking |
| `enable_thinking(budget)` | Turn on with token budget |
```python
from anthropic_client import ClaudeClient

client = ClaudeClient(thinking_budget=8000)

# Turn 1 - Claude thinks through the problem
r1 = client.send_message("What's 15 * 23?")
print(f"Answer: {r1}")

# Turn 2 - Previous thinking is preserved automatically
r2 = client.send_message("Now multiply that by 2")
print(f"Answer: {r2}")

# Turn 3 - Full context maintained
r3 = client.send_message("What was my first question?")
print(f"Answer: {r3}")
```

Speak these commands while recording:
| Say This | Get This |
|---|---|
| "new line" | ↵ (line break) |
| "new paragraph" | ↵↵ (double line break) |
| "period" | . |
| "comma" | , |
| "question mark" | ? |
| "exclamation mark" | ! |
| "open paren" | ( |
| "close paren" | ) |
| "delete that" | (removes last phrase) |
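Conceptually, most of these commands are string substitutions over the transcript ("delete that" needs state, so it's omitted here). A simplified sketch, not the app's exact implementation:

```python
import re

VOICE_COMMANDS = {
    "new paragraph": "\n\n",
    "new line": "\n",
    "period": ".",
    "comma": ",",
    "question mark": "?",
    "exclamation mark": "!",
    "open paren": "(",
    "close paren": ")",
}

def apply_voice_commands(text: str) -> str:
    """Replace spoken command phrases with their symbols.
    Longest phrases go first so "new paragraph" wins over "new line"."""
    for phrase, symbol in sorted(VOICE_COMMANDS.items(), key=lambda kv: -len(kv[0])):
        text = re.sub(rf"\s*\b{re.escape(phrase)}\b\s*", symbol, text)
    return text

print(apply_voice_commands("hello period"))  # → hello.
```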
SpeakSkipType automatically removes filler words:
| You Say | You Get |
|---|---|
| "I um want to uh test this" | "I want to test this" |
| "So like you know it works" | "So it works" |
| "I mean basically it's done" | "it's done" |
Filler words removed: um, uh, er, ah, like, you know, i mean, sort of, kind of, basically, actually, literally, so yeah, right, okay so, well
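A simplified sketch of how multi-word filler removal can work. Context-sensitive fillers like "like", "right", and "well" are left out here because a naive pass would also delete legitimate uses ("I would like to..."); the app's actual logic may handle these differently:

```python
import re

FILLERS = [
    "you know", "i mean", "sort of", "kind of",
    "um", "uh", "er", "ah", "basically", "actually", "literally",
]

def remove_fillers(text: str) -> str:
    """Strip filler phrases (longest first) and collapse whitespace."""
    for filler in sorted(FILLERS, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(filler)}\b", "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

print(remove_fillers("Um I would like to you know test this application"))
# → I would like to test this application
```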
| Flag | Description |
|---|---|
| `--whisper` | Use Whisper engine (more accurate) |
| `--continuous` | Continuous listening mode |
| `--code` | Code dictation mode |
| `--vad` | Voice Activity Detection |
| `--no-filler` | Remove filler words |
| `--literal` | Literal punctuation mode |
| `--bg` | Run in background |
| `--stats` | Show statistics |
| `--export` | Export history |
Edit the config to add custom word replacements:
```json
"custom_vocabulary": {
    "kubernetes": "K8s",
    "javascript": "JavaScript",
    "python": "Python"
}
```

| Metric | Vosk | Whisper |
|---|---|---|
|---|---|---|
| Model Size | ~40MB | ~150MB |
| RAM Usage | ~200MB | ~500MB |
| Speed | ⚡ Fast | 🐢 Slower |
| Accuracy | 90% | 98% |
| Offline | ✅ Yes | ✅ Yes |
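The custom vocabulary shown earlier is conceptually a post-processing pass over the transcript. A minimal sketch (hypothetical helper, not the app's exact code):

```python
CUSTOM_VOCABULARY = {
    "kubernetes": "K8s",
    "javascript": "JavaScript",
    "python": "Python",
}

def apply_vocabulary(text: str) -> str:
    """Replace recognized words with their preferred spellings."""
    return " ".join(
        CUSTOM_VOCABULARY.get(word.lower(), word)
        for word in text.split()
    )

print(apply_vocabulary("deploy javascript on kubernetes"))
# → deploy JavaScript on K8s
```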
🔇 "No speech detected"
- Check microphone is connected and not muted
- Speak clearly at normal pace
- Check system microphone permissions
🎤 "Audio Error"
- Close other apps using microphone
- Try different audio input device
- Restart the application
🐧 Linux: Permission issues
```shell
sudo usermod -a -G audio $USER
# Log out and log back in
```

🪟 Windows: Hotkeys not working
- Run terminal as Administrator
- Try different terminal (PowerShell, CMD)
```shell
# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=speakskiptype

# 306 tests passing ✅
```

Pull requests welcome! For major changes, open an issue first.
```shell
# Fork & clone
git clone https://github.com/YOUR_USERNAME/Speakskiptype.git

# Create feature branch
git checkout -b feature/amazing-feature

# Make changes & test
pytest tests/ -v

# Commit & push
git commit -m "Add amazing feature"
git push origin feature/amazing-feature

# Open Pull Request
```

MIT License - Use it, modify it, share it freely.
If SpeakSkipType saves you time, please give it a ⭐!
It helps others discover this tool.
Built with ❤️ by developers, for developers.
Stop copy-pasting. Start speaking.