Your audio never leaves your computer. Everything runs locally using OpenAI's Whisper model: no cloud uploads, no API keys needed, no subscription costs.

CPU-optimized transcription powered by faster-whisper.

Get intelligent summaries from a local LLM via Ollama. Choose concise, detailed, or bullet-point formats, all without API costs.

Export your transcripts as TXT, SRT, VTT, or JSON. Perfect for subtitles, documentation, or further processing.
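As an illustration of the SRT export target, the sketch below renders a list of timestamped segments as an `.srt` document. The `start`/`end`/`text` field names are assumptions for the example, not Audtext's actual schema:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamps SRT requires."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    """Render [{'start': ..., 'end': ..., 'text': ...}] segments as SRT."""
    blocks = [
        f"{i}\n{srt_timestamp(s['start'])} --> {srt_timestamp(s['end'])}\n{s['text'].strip()}"
        for i, s in enumerate(segments, start=1)
    ]
    return "\n\n".join(blocks) + "\n"

# to_srt([{"start": 0.0, "end": 2.5, "text": "Hello world"}])
# → "1\n00:00:00,000 --> 00:00:02,500\nHello world\n"
```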
| Feature | Description |
|---|---|
| Multi-Format Support | MP3, WAV, M4A, FLAC, OGG, WEBM, MP4 |
| Real-Time Progress | Watch transcription progress live |
| Timestamps | Every segment includes precise timing |
| Multi-Language | Automatic language detection |
| Responsive UI | Beautiful interface on any device |
| No Size Limits | Upload audio files of any length |
```
                        AUDTEXT

┌────────────┐      ┌────────────┐      ┌────────────┐
│  Frontend  │─────▶│  Backend   │─────▶│  Whisper   │
│  React 18  │      │  FastAPI   │      │  (Local)   │
└────────────┘      └─────┬──────┘      └────────────┘
                          │
                          ▼
                    ┌────────────┐
                    │   Ollama   │
                    │   (LLM)    │
                    └────────────┘
```
| Requirement | Version | Installation |
|---|---|---|
| Python | 3.11+ | python.org |
| Node.js | 18+ | nodejs.org |
| FFmpeg | Latest | See below |
| Ollama | Latest | ollama.ai |
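Before continuing, you can sanity-check that the required command-line tools are on your PATH. This is a minimal sketch (not part of Audtext) using only the standard library:

```python
import shutil

def find_missing(tools: list[str]) -> list[str]:
    """Return the required tools that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

if __name__ == "__main__":
    missing = find_missing(["python", "node", "ffmpeg", "ollama"])
    print("Missing tools:", ", ".join(missing) if missing else "none - you're ready")
```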
Install FFmpeg

```bash
# Windows (winget)
winget install ffmpeg

# Windows (Chocolatey)
choco install ffmpeg

# macOS
brew install ffmpeg

# Linux (Ubuntu/Debian)
sudo apt install ffmpeg
```

```bash
# 1. Clone & Setup Backend
git clone https://github.com/DandaAkhilReddy/Audtext.git
cd Audtext/backend
python -m venv venv && .\venv\Scripts\activate   # Windows
# macOS/Linux: source venv/bin/activate
pip install -r requirements.txt

# 2. Setup Frontend
cd ../frontend
npm install

# 3. Download AI Model
ollama pull llama3.1:8b
```

Open 3 terminals:

```bash
# Terminal 1 - AI Engine
ollama serve

# Terminal 2 - Backend (activate venv first!)
cd Audtext/backend && .\venv\Scripts\activate
uvicorn main:app --reload --port 8000

# Terminal 3 - Frontend
cd Audtext/frontend
npm run dev
```

Open → http://localhost:5173
Detailed Backend Setup

```bash
cd backend

# Create virtual environment
python -m venv venv

# Activate it
# Windows:
.\venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

Dependencies include:

- `fastapi` - modern web framework
- `faster-whisper` - optimized speech recognition
- `httpx` - async HTTP client for Ollama
- `pydantic` - data validation
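To give a sense of how the summarization call works, here is a minimal sketch against Ollama's `/api/generate` endpoint using only the standard library (the real `ollama_service.py` uses `httpx`, and the prompt wording and style names below are assumptions for illustration):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(transcript: str, style: str = "concise",
                  model: str = "llama3.1:8b") -> dict:
    # Style names mirror the summary formats mentioned earlier;
    # the exact prompts Audtext uses are an assumption.
    prompts = {
        "concise": "Summarize this transcript in one short paragraph:",
        "detailed": "Write a detailed summary of this transcript:",
        "bullets": "Summarize this transcript as bullet points:",
    }
    return {"model": model,
            "prompt": f"{prompts[style]}\n\n{transcript}",
            "stream": False}

def summarize(transcript: str, **kwargs) -> str:
    """POST the prompt to a locally running Ollama and return its response."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(transcript, **kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `ollama serve` running and the model pulled, `summarize("…transcript text…")` returns the generated summary string.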
Detailed Frontend Setup

```bash
cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build
```

Built with:

- `React 18` - UI framework
- `Vite` - lightning-fast bundler
- `Tailwind CSS` - utility-first styling
- `Lucide React` - beautiful icons
Edit `backend/core/config.py`:

```python
WHISPER_MODEL: str = "base"  # Options: tiny, base, small, medium, large
```

| Model | RAM | Speed (1hr audio) | Quality |
|---|---|---|---|
| `tiny` | 1GB | ~5 min | ⭐⭐ |
| `base` | 1.5GB | ~10 min | ⭐⭐⭐ |
| `small` | 2.5GB | ~20 min | ⭐⭐⭐⭐ |
| `medium` | 5GB | ~40 min | ⭐⭐⭐⭐⭐ |
```python
OLLAMA_MODEL: str = "llama3.1:8b"  # Or any Ollama model
```

| Endpoint | Method | Description |
|---|---|---|
| `/api/upload` | POST | Upload audio file |
| `/api/status/{task_id}` | GET | Get transcription progress |
| `/api/result/{task_id}` | GET | Get full transcript |
| `/api/summarize` | POST | Generate AI summary |
| `/api/export/{format}/{task_id}` | GET | Export (txt/srt/vtt/json) |
| `/api/ollama/health` | GET | Check Ollama status |
Interactive Docs → http://localhost:8000/docs
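A client typically uploads a file, then polls the status endpoint until transcription completes. The helper below sketches that loop; the `state` values (`completed`/`failed`) are assumptions about the status payload, and the fetcher is injected so it works with any HTTP client:

```python
import time
from typing import Callable

def wait_for_transcript(get_status: Callable[[], dict],
                        poll_seconds: float = 1.0,
                        timeout: float = 600.0) -> dict:
    """Poll a status fetcher until the task finishes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("state") == "completed":
            return status
        if status.get("state") == "failed":
            raise RuntimeError(status.get("error", "transcription failed"))
        time.sleep(poll_seconds)
    raise TimeoutError("transcription did not finish in time")
```

Wire `get_status` to `GET http://localhost:8000/api/status/{task_id}` with your HTTP client of choice, then fetch `/api/result/{task_id}` once the loop returns.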
```
Audtext/
├── backend/
│   ├── main.py                 # FastAPI entry point
│   ├── requirements.txt        # Python dependencies
│   ├── api/routes/             # API endpoints
│   ├── services/               # Business logic
│   │   ├── whisper_service.py  # Transcription
│   │   └── ollama_service.py   # Summarization
│   ├── core/config.py          # Settings
│   └── tests/                  # Test suite
│
├── frontend/
│   ├── src/
│   │   ├── App.tsx             # Main component
│   │   ├── components/         # UI components
│   │   └── services/api.ts     # API client
│   └── package.json
│
└── uploads/                    # Temporary storage
```
"Failed to fetch" on upload

Make sure the backend is running on port 8000:

```bash
uvicorn main:app --reload --port 8000
```

Summary returns 500 error

- Ensure Ollama is running: `ollama serve`
- Download the model: `ollama pull llama3.1:8b`
- Verify: `curl http://localhost:11434/api/tags`

First transcription is slow

The first run downloads the Whisper model (~150MB for `base`). Subsequent runs are faster.
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
| Technology | Purpose |
|---|---|
| OpenAI Whisper | Speech Recognition |
| faster-whisper | Optimized Inference |
| Ollama | Local LLM Runtime |
| FastAPI | Backend Framework |
| React | Frontend Framework |
