Your audio never leaves your computer. Everything runs locally using OpenAI's Whisper model: no cloud uploads, no API keys needed, no subscription costs.

CPU-optimized transcription powered by faster-whisper.

Get intelligent summaries from a local LLM via Ollama. Choose concise, detailed, or bullet-point formats, all without API costs.

Export your transcripts as TXT, SRT, VTT, or JSON. Perfect for subtitles, documentation, or further processing.
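As an illustration of the SRT export target, the sketch below renders a list of timestamped segments as an `.srt` document. The `start`/`end`/`text` field names are assumptions for the example, not Audtext's actual schema:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamps SRT requires."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    """Render [{'start': ..., 'end': ..., 'text': ...}] segments as SRT."""
    blocks = [
        f"{i}\n{srt_timestamp(s['start'])} --> {srt_timestamp(s['end'])}\n{s['text'].strip()}"
        for i, s in enumerate(segments, start=1)
    ]
    return "\n\n".join(blocks) + "\n"

# to_srt([{"start": 0.0, "end": 2.5, "text": "Hello world"}])
# → "1\n00:00:00,000 --> 00:00:02,500\nHello world\n"
```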
| Feature | Description |
|---|---|
| Multi-Format Support | MP3, WAV, M4A, FLAC, OGG, WEBM, MP4 |
| Real-Time Progress | Watch transcription progress live |
| Timestamps | Every segment includes precise timing |
| Multi-Language | Automatic language detection |
| Responsive UI | Beautiful interface on any device |
| No Size Limits | Upload audio files of any length |
```
                        AUDTEXT

┌────────────┐      ┌────────────┐      ┌────────────┐
│  Frontend  │─────▶│  Backend   │─────▶│  Whisper   │
│  React 18  │      │  FastAPI   │      │  (Local)   │
└────────────┘      └─────┬──────┘      └────────────┘
                          │
                          ▼
                    ┌────────────┐
                    │   Ollama   │
                    │   (LLM)    │
                    └────────────┘
```
| Requirement | Version | Installation |
|---|---|---|
| Python | 3.11+ | python.org |
| Node.js | 18+ | nodejs.org |
| FFmpeg | Latest | See below |
| Ollama | Latest | ollama.ai |
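Before continuing, you can sanity-check that the required command-line tools are on your PATH. This is a minimal sketch (not part of Audtext) using only the standard library:

```python
import shutil

def find_missing(tools: list[str]) -> list[str]:
    """Return the required tools that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

if __name__ == "__main__":
    missing = find_missing(["python", "node", "ffmpeg", "ollama"])
    print("Missing tools:", ", ".join(missing) if missing else "none - you're ready")
```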
Install FFmpeg

```bash
# Windows (winget)
winget install ffmpeg

# Windows (Chocolatey)
choco install ffmpeg

# macOS
brew install ffmpeg

# Linux (Ubuntu/Debian)
sudo apt install ffmpeg
```

```bash
# 1. Clone & Setup Backend
git clone https://github.com/DandaAkhilReddy/Audtext.git
cd Audtext/backend
python -m venv venv && .\venv\Scripts\activate   # Windows
# macOS/Linux: source venv/bin/activate
pip install -r requirements.txt

# 2. Setup Frontend
cd ../frontend
npm install

# 3. Download AI Model
ollama pull llama3.1:8b
```

Open 3 terminals:

```bash
# Terminal 1 - AI Engine
ollama serve

# Terminal 2 - Backend (activate venv first!)
cd Audtext/backend && .\venv\Scripts\activate
uvicorn main:app --reload --port 8000

# Terminal 3 - Frontend
cd Audtext/frontend
npm run dev
```

Open → http://localhost:5173
Detailed Backend Setup

```bash
cd backend

# Create virtual environment
python -m venv venv

# Activate it
# Windows:
.\venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

Dependencies include:

- `fastapi` - modern web framework
- `faster-whisper` - optimized speech recognition
- `httpx` - async HTTP client for Ollama
- `pydantic` - data validation
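To give a sense of how the summarization call works, here is a minimal sketch against Ollama's `/api/generate` endpoint using only the standard library (the real `ollama_service.py` uses `httpx`, and the prompt wording and style names below are assumptions for illustration):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(transcript: str, style: str = "concise",
                  model: str = "llama3.1:8b") -> dict:
    # Style names mirror the summary formats mentioned earlier;
    # the exact prompts Audtext uses are an assumption.
    prompts = {
        "concise": "Summarize this transcript in one short paragraph:",
        "detailed": "Write a detailed summary of this transcript:",
        "bullets": "Summarize this transcript as bullet points:",
    }
    return {"model": model,
            "prompt": f"{prompts[style]}\n\n{transcript}",
            "stream": False}

def summarize(transcript: str, **kwargs) -> str:
    """POST the prompt to a locally running Ollama and return its response."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(transcript, **kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `ollama serve` running and the model pulled, `summarize("…transcript text…")` returns the generated summary string.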
Detailed Frontend Setup

```bash
cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build
```

Built with:

- `React 18` - UI framework
- `Vite` - lightning-fast bundler
- `Tailwind CSS` - utility-first styling
- `Lucide React` - beautiful icons
Edit `backend/core/config.py`:

```python
WHISPER_MODEL: str = "base"  # Options: tiny, base, small, medium, large
```

| Model | RAM | Speed (1hr audio) | Quality |
|---|---|---|---|
| `tiny` | 1GB | ~5 min | ⭐⭐ |
| `base` | 1.5GB | ~10 min | ⭐⭐⭐ |
| `small` | 2.5GB | ~20 min | ⭐⭐⭐⭐ |
| `medium` | 5GB | ~40 min | ⭐⭐⭐⭐⭐ |
```python
OLLAMA_MODEL: str = "llama3.1:8b"  # Or any Ollama model
```

| Endpoint | Method | Description |
|---|---|---|
| `/api/upload` | POST | Upload audio file |
| `/api/status/{task_id}` | GET | Get transcription progress |
| `/api/result/{task_id}` | GET | Get full transcript |
| `/api/summarize` | POST | Generate AI summary |
| `/api/export/{format}/{task_id}` | GET | Export (txt/srt/vtt/json) |
| `/api/ollama/health` | GET | Check Ollama status |
Interactive Docs → http://localhost:8000/docs
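A client typically uploads a file, then polls the status endpoint until transcription completes. The helper below sketches that loop; the `state` values (`completed`/`failed`) are assumptions about the status payload, and the fetcher is injected so it works with any HTTP client:

```python
import time
from typing import Callable

def wait_for_transcript(get_status: Callable[[], dict],
                        poll_seconds: float = 1.0,
                        timeout: float = 600.0) -> dict:
    """Poll a status fetcher until the task finishes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("state") == "completed":
            return status
        if status.get("state") == "failed":
            raise RuntimeError(status.get("error", "transcription failed"))
        time.sleep(poll_seconds)
    raise TimeoutError("transcription did not finish in time")
```

Wire `get_status` to `GET http://localhost:8000/api/status/{task_id}` with your HTTP client of choice, then fetch `/api/result/{task_id}` once the loop returns.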
```
Audtext/
├── backend/
│   ├── main.py                 # FastAPI entry point
│   ├── requirements.txt        # Python dependencies
│   ├── api/routes/             # API endpoints
│   ├── services/               # Business logic
│   │   ├── whisper_service.py  # Transcription
│   │   └── ollama_service.py   # Summarization
│   ├── core/config.py          # Settings
│   └── tests/                  # Test suite
│
├── frontend/
│   ├── src/
│   │   ├── App.tsx             # Main component
│   │   ├── components/         # UI components
│   │   └── services/api.ts     # API client
│   └── package.json
│
└── uploads/                    # Temporary storage
```
"Failed to fetch" on upload

Make sure the backend is running on port 8000:

```bash
uvicorn main:app --reload --port 8000
```

Summary returns 500 error

- Ensure Ollama is running: `ollama serve`
- Download the model: `ollama pull llama3.1:8b`
- Verify: `curl http://localhost:11434/api/tags`

First transcription is slow

The first run downloads the Whisper model (~150MB for `base`). Subsequent runs are faster.
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
| Technology | Purpose |
|---|---|
| OpenAI Whisper | Speech Recognition |
| faster-whisper | Optimized Inference |
| Ollama | Local LLM Runtime |
| FastAPI | Backend Framework |
| React | Frontend Framework |
