AI-powered voice synthesis platform delivering both Text-to-Speech (TTS) and Speech-to-Text (STT) through a high-performance REST API with voice cloning capabilities.
- OpenAI-Compatible API — Drop-in replacement for OpenAI's TTS API
- Voice Cloning — Upload voice samples for personalized speech synthesis
- Voice Library — Store, manage, and reuse custom voices by name
- Real-time Streaming — Raw audio streaming and SSE (Server-Sent Events) support
- Smart Text Processing — Automatic chunking for long-form text
- React Web UI — Optional frontend with dark/light mode for interactive use
- Docker Ready — Full containerization with GPU, CPU, and uv-optimized variants
- Memory Management — Automatic cleanup, CUDA cache clearing, and real-time monitoring
- Configurable Parameters — Fine-tune exaggeration, pace, temperature per request
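To illustrate the Smart Text Processing feature, long inputs can be split into sentence-aligned chunks before synthesis. The sketch below is a hedged illustration, not the code in `app/core/text_processing.py`; the `max_chars` limit and the sentence-splitting regex are assumptions:

```python
import re

def chunk_text(text: str, max_chars: int = 280) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole rather than split.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk when appending this sentence would overflow
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip() if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized independently and the audio concatenated, which keeps per-request memory bounded for long-form text.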
```
app/                         # FastAPI backend
├── config.py                # Environment-based configuration
├── main.py                  # FastAPI application entry
├── models/                  # Pydantic request/response models
│   ├── requests.py
│   └── responses.py
├── core/                    # Core engine
│   ├── tts_model.py         # Chatterbox TTS model management
│   ├── text_processing.py   # Chunking and preprocessing
│   └── memory.py            # Memory monitoring and cleanup
└── api/                     # Endpoint modules
    └── endpoints/
        ├── speech.py        # TTS generation + streaming
        ├── health.py        # Health checks
        ├── models.py        # Model listing (OpenAI compat)
        ├── memory.py        # Memory management API
        └── config.py        # Runtime configuration
frontend/                    # React web interface
docker/                      # Deployment configs (GPU/CPU/uv variants)
tests/                       # API and memory test suites
```
```bash
# Clone and enter
git clone https://github.com/Sid-V5/EchoSynth.git
cd EchoSynth/chatterbox-tts-api

# Option A: uv (recommended)
uv sync
cp .env.example .env
uv run main.py

# Option B: pip
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
python main.py
```

API docs are auto-generated at http://localhost:4123/docs.
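For Python clients, the same endpoint can be exercised without curl. A minimal standard-library sketch; the port and path follow the quick start above, and none of this is the project's own client code:

```python
import json
import urllib.request

BASE_URL = "http://localhost:4123"  # default port from the quick start

def speech_request(text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/audio/speech."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = speech_request("Hello from EchoSynth!")
# Once the server is running, send with urllib.request.urlopen(req)
# and write the response bytes to speech.wav.
```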
```bash
# Basic TTS
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello from EchoSynth!"}' \
  --output speech.wav

# One-off voice cloning via upload
curl -X POST http://localhost:4123/v1/audio/speech/upload \
  -F "input=Hello with my voice!" \
  -F "voice_file=@my_voice.mp3" \
  --output cloned.wav

# Save a voice
curl -X POST http://localhost:4123/v1/voices \
  -F "voice_file=@my_voice.wav" \
  -F "name=custom-voice"

# Use it by name
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Reusable voice!", "voice": "custom-voice"}' \
  --output output.wav

# Raw audio stream
curl -X POST http://localhost:4123/v1/audio/speech/stream \
  -H "Content-Type: application/json" \
  -d '{"input": "Streams in real-time!"}' \
  --output stream.wav

# SSE format (OpenAI compatible)
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"input": "SSE streaming!", "stream_format": "sse"}' \
  --no-buffer
```

| Endpoint | Method | Description |
|---|---|---|
| /v1/audio/speech | POST | Generate speech (complete or SSE stream) |
| /v1/audio/speech/upload | POST | Generate speech with voice upload |
| /v1/audio/speech/stream | POST | Stream audio generation |
| /v1/voices | GET/POST | List or upload custom voices |
| /v1/models | GET | Available models (OpenAI compat) |
| /health | GET | Health check and status |
| /status/progress | GET | Real-time generation progress |
| /memory | GET | Memory usage and cleanup |
| /docs | GET | Interactive API documentation |
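On the client side, the SSE stream can be reassembled into audio bytes. This is a hedged sketch that assumes OpenAI's streaming event shape (a `type` field plus base64 `audio` per `data:` line); the server's actual event schema may differ:

```python
import base64
import json

def collect_sse_audio(lines) -> bytes:
    """Concatenate base64 audio deltas from SSE 'data:' lines."""
    audio = bytearray()
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alives, other SSE fields
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "speech.audio.delta":
            audio += base64.b64decode(event["audio"])
    return bytes(audio)

# Simulated stream: one audio delta followed by a done event
sample = [
    'data: {"type": "speech.audio.delta", "audio": "'
    + base64.b64encode(b"RIFF").decode() + '"}',
    'data: {"type": "speech.audio.done"}',
]
print(collect_sse_audio(sample))  # b'RIFF'
```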
| Parameter | Range | Default | Effect |
|---|---|---|---|
| exaggeration | 0.25 – 2.0 | 0.5 | Emotion intensity |
| cfg_weight | 0.0 – 1.0 | 0.5 | Pace control |
| temperature | 0.05 – 5.0 | 0.8 | Sampling randomness |
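If you build requests programmatically, these ranges can be enforced client-side before sending. A hedged sketch; the ranges and defaults come from the table above, but the helper itself is not part of the API:

```python
# (min, max, default) per the parameter table above
RANGES = {
    "exaggeration": (0.25, 2.0, 0.5),
    "cfg_weight": (0.0, 1.0, 0.5),
    "temperature": (0.05, 5.0, 0.8),
}

def tuning_params(**overrides) -> dict:
    """Fill defaults and clamp any overrides into their documented range."""
    params = {}
    for name, (lo, hi, default) in RANGES.items():
        params[name] = min(max(overrides.get(name, default), lo), hi)
    return params
```

The resulting dict can be merged into the JSON body of any `/v1/audio/speech` request.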
```bash
cd chatterbox-tts-api

# Standard
docker compose -f docker/docker-compose.yml up -d

# GPU-accelerated
docker compose -f docker/docker-compose.gpu.yml up -d

# With React frontend
docker compose -f docker/docker-compose.yml --profile frontend up -d
```

| Format | Max Size | Recommended Duration |
|---|---|---|
| MP3, WAV, FLAC, M4A, OGG | 10 MB | 10–30s of clear speech |
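The format and size limits above can be pre-checked client-side before uploading. An illustrative sketch, not the server's actual validation logic:

```python
import os

ALLOWED_EXTS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
MAX_BYTES = 10 * 1024 * 1024  # 10 MB, per the table above

def check_voice_file(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, reason) for a candidate voice-sample upload."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTS:
        return False, f"unsupported format: {ext or '(none)'}"
    if size_bytes > MAX_BYTES:
        return False, "file exceeds 10 MB"
    return True, "ok"
```

Rejecting bad files before the multipart upload saves a round trip and gives users a clearer error than an HTTP 4xx response.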
- API Framework: FastAPI + Uvicorn
- TTS Engine: Chatterbox TTS (Resemble AI)
- ML Runtime: PyTorch 2.x + torchaudio
- STT: HuggingFace Transformers (Whisper-based)
- Frontend: React + Vite
- Validation: Pydantic v2
- Containerization: Docker with GPU/CPU/uv variants
MIT