AI-powered voice synthesis platform delivering both Text-to-Speech (TTS) and Speech-to-Text (STT) through a high-performance REST API with voice cloning capabilities.
- OpenAI-Compatible API — Drop-in replacement for OpenAI's TTS API
- Voice Cloning — Upload voice samples for personalized speech synthesis
- Voice Library — Store, manage, and reuse custom voices by name
- Real-time Streaming — Raw audio streaming and SSE (Server-Sent Events) support
- Smart Text Processing — Automatic chunking for long-form text
- React Web UI — Optional frontend with dark/light mode for interactive use
- Docker Ready — Full containerization with GPU, CPU, and uv-optimized variants
- Memory Management — Automatic cleanup, CUDA cache clearing, and real-time monitoring
- Configurable Parameters — Fine-tune exaggeration, pace, temperature per request
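To illustrate the Smart Text Processing feature, long inputs can be split into sentence-aligned chunks before synthesis. The sketch below is a hedged illustration, not the code in `app/core/text_processing.py`; the `max_chars` limit and the sentence-splitting regex are assumptions:

```python
import re

def chunk_text(text: str, max_chars: int = 280) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole rather than split.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk when appending this sentence would overflow
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip() if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized independently and the audio concatenated, which keeps per-request memory bounded for long-form text.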
```
app/                         # FastAPI backend
├── config.py                # Environment-based configuration
├── main.py                  # FastAPI application entry
├── models/                  # Pydantic request/response models
│   ├── requests.py
│   └── responses.py
├── core/                    # Core engine
│   ├── tts_model.py         # Chatterbox TTS model management
│   ├── text_processing.py   # Chunking and preprocessing
│   └── memory.py            # Memory monitoring and cleanup
└── api/                     # Endpoint modules
    └── endpoints/
        ├── speech.py        # TTS generation + streaming
        ├── health.py        # Health checks
        ├── models.py        # Model listing (OpenAI compat)
        ├── memory.py        # Memory management API
        └── config.py        # Runtime configuration
frontend/                    # React web interface
docker/                      # Deployment configs (GPU/CPU/uv variants)
tests/                       # API and memory test suites
```
```bash
# Clone and enter
git clone https://github.com/Sid-V5/EchoSynth.git
cd EchoSynth/chatterbox-tts-api

# Option A: uv (recommended)
uv sync
cp .env.example .env
uv run main.py

# Option B: pip
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
python main.py
```

API docs are auto-generated at http://localhost:4123/docs.
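For Python clients, the same endpoint can be exercised without curl. A minimal standard-library sketch; the port and path follow the quick start above, and none of this is the project's own client code:

```python
import json
import urllib.request

BASE_URL = "http://localhost:4123"  # default port from the quick start

def speech_request(text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/audio/speech."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = speech_request("Hello from EchoSynth!")
# Once the server is running, send with urllib.request.urlopen(req)
# and write the response bytes to speech.wav.
```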
```bash
# Basic TTS
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello from EchoSynth!"}' \
  --output speech.wav

# One-off voice cloning via upload
curl -X POST http://localhost:4123/v1/audio/speech/upload \
  -F "input=Hello with my voice!" \
  -F "voice_file=@my_voice.mp3" \
  --output cloned.wav

# Save a voice
curl -X POST http://localhost:4123/v1/voices \
  -F "voice_file=@my_voice.wav" \
  -F "name=custom-voice"

# Use it by name
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Reusable voice!", "voice": "custom-voice"}' \
  --output output.wav

# Raw audio stream
curl -X POST http://localhost:4123/v1/audio/speech/stream \
  -H "Content-Type: application/json" \
  -d '{"input": "Streams in real-time!"}' \
  --output stream.wav

# SSE format (OpenAI compatible)
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"input": "SSE streaming!", "stream_format": "sse"}' \
  --no-buffer
```

| Endpoint | Method | Description |
|---|---|---|
| /v1/audio/speech | POST | Generate speech (complete or SSE stream) |
| /v1/audio/speech/upload | POST | Generate speech with voice upload |
| /v1/audio/speech/stream | POST | Stream audio generation |
| /v1/voices | GET/POST | List or upload custom voices |
| /v1/models | GET | Available models (OpenAI compat) |
| /health | GET | Health check and status |
| /status/progress | GET | Real-time generation progress |
| /memory | GET | Memory usage and cleanup |
| /docs | GET | Interactive API documentation |
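On the client side, the SSE stream can be reassembled into audio bytes. This is a hedged sketch that assumes OpenAI's streaming event shape (a `type` field plus base64 `audio` per `data:` line); the server's actual event schema may differ:

```python
import base64
import json

def collect_sse_audio(lines) -> bytes:
    """Concatenate base64 audio deltas from SSE 'data:' lines."""
    audio = bytearray()
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alives, other SSE fields
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "speech.audio.delta":
            audio += base64.b64decode(event["audio"])
    return bytes(audio)

# Simulated stream: one audio delta followed by a done event
sample = [
    'data: {"type": "speech.audio.delta", "audio": "'
    + base64.b64encode(b"RIFF").decode() + '"}',
    'data: {"type": "speech.audio.done"}',
]
print(collect_sse_audio(sample))  # b'RIFF'
```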
| Parameter | Range | Default | Effect |
|---|---|---|---|
| exaggeration | 0.25 – 2.0 | 0.5 | Emotion intensity |
| cfg_weight | 0.0 – 1.0 | 0.5 | Pace control |
| temperature | 0.05 – 5.0 | 0.8 | Sampling randomness |
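If you build requests programmatically, these ranges can be enforced client-side before sending. A hedged sketch; the ranges and defaults come from the table above, but the helper itself is not part of the API:

```python
# (min, max, default) per the parameter table above
RANGES = {
    "exaggeration": (0.25, 2.0, 0.5),
    "cfg_weight": (0.0, 1.0, 0.5),
    "temperature": (0.05, 5.0, 0.8),
}

def tuning_params(**overrides) -> dict:
    """Fill defaults and clamp any overrides into their documented range."""
    params = {}
    for name, (lo, hi, default) in RANGES.items():
        params[name] = min(max(overrides.get(name, default), lo), hi)
    return params
```

The resulting dict can be merged into the JSON body of any `/v1/audio/speech` request.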
```bash
cd chatterbox-tts-api

# Standard
docker compose -f docker/docker-compose.yml up -d

# GPU-accelerated
docker compose -f docker/docker-compose.gpu.yml up -d

# With React frontend
docker compose -f docker/docker-compose.yml --profile frontend up -d
```

| Format | Max Size | Recommended Duration |
|---|---|---|
| MP3, WAV, FLAC, M4A, OGG | 10 MB | 10–30s of clear speech |
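The format and size limits above can be pre-checked client-side before uploading. An illustrative sketch, not the server's actual validation logic:

```python
import os

ALLOWED_EXTS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
MAX_BYTES = 10 * 1024 * 1024  # 10 MB, per the table above

def check_voice_file(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, reason) for a candidate voice-sample upload."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTS:
        return False, f"unsupported format: {ext or '(none)'}"
    if size_bytes > MAX_BYTES:
        return False, "file exceeds 10 MB"
    return True, "ok"
```

Rejecting bad files before the multipart upload saves a round trip and gives users a clearer error than an HTTP 4xx response.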
- API Framework: FastAPI + Uvicorn
- TTS Engine: Chatterbox TTS (Resemble AI)
- ML Runtime: PyTorch 2.x + torchaudio
- STT: HuggingFace Transformers (Whisper-based)
- Frontend: React + Vite
- Validation: Pydantic v2
- Containerization: Docker with GPU/CPU/uv variants
MIT