EchoSynth

AI-powered voice synthesis platform delivering both Text-to-Speech (TTS) and Speech-to-Text (STT) through a high-performance REST API with voice cloning capabilities.

Python 3.11+ | FastAPI | PyTorch | License: MIT

Features

  • OpenAI-Compatible API — Drop-in replacement for OpenAI's TTS API
  • Voice Cloning — Upload voice samples for personalized speech synthesis
  • Voice Library — Store, manage, and reuse custom voices by name
  • Real-time Streaming — Raw audio streaming and SSE (Server-Sent Events) support
  • Smart Text Processing — Automatic chunking for long-form text
  • React Web UI — Optional frontend with dark/light mode for interactive use
  • Docker Ready — Full containerization with GPU, CPU, and uv-optimized variants
  • Memory Management — Automatic cleanup, CUDA cache clearing, and real-time monitoring
  • Configurable Parameters — Fine-tune exaggeration, pace, temperature per request

Architecture

app/                        # FastAPI backend
├── config.py               # Environment-based configuration
├── main.py                 # FastAPI application entry
├── models/                 # Pydantic request/response models
│   ├── requests.py
│   └── responses.py
├── core/                   # Core engine
│   ├── tts_model.py        # Chatterbox TTS model management
│   ├── text_processing.py  # Chunking and preprocessing
│   └── memory.py           # Memory monitoring and cleanup
└── api/                    # Endpoint modules
    └── endpoints/
        ├── speech.py       # TTS generation + streaming
        ├── health.py       # Health checks
        ├── models.py       # Model listing (OpenAI compat)
        ├── memory.py       # Memory management API
        └── config.py       # Runtime configuration

frontend/                   # React web interface
docker/                     # Deployment configs (GPU/CPU/uv variants)
tests/                      # API and memory test suites
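
The layout above lists `text_processing.py` as the module responsible for chunking long-form text. The sketch below shows one common way such chunking can work: split at sentence boundaries, then pack sentences into chunks up to a character limit. The function name `chunk_text` and the 300-character default are hypothetical and chosen for illustration; they are not taken from the EchoSynth source.

```python
import re


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Hypothetical helper for illustration; not the project's actual implementation.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be synthesized separately and the audio concatenated, which keeps per-request memory bounded for long inputs.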

Quick Start

# Clone and enter
git clone https://github.com/Sid-V5/EchoSynth.git
cd EchoSynth/chatterbox-tts-api

# Option A: uv (recommended)
uv sync
cp .env.example .env
uv run main.py

# Option B: pip
python -m venv .venv
source .venv/bin/activate        # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
python main.py

Once the server is running, interactive API docs are auto-generated at http://localhost:4123/docs

API Usage

Basic Text-to-Speech

curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello from EchoSynth!"}' \
  --output speech.wav
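
For callers who prefer Python to curl, the same request can be sketched with the standard library alone. The endpoint, port, and `input` field come from the curl example above, and the optional `voice` field is the one used in the Voice Library section. The helper names `build_speech_request` and `synthesize` are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_speech_request(
    text: str,
    base_url: str = "http://localhost:4123",
    voice: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {"input": text}
    if voice is not None:
        payload["voice"] = voice  # name of a previously saved voice
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "speech.wav") -> None:
    """Send the request to a running server and write the returned audio to disk."""
    with urllib.request.urlopen(build_speech_request(text)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

With the server running locally, `synthesize("Hello from EchoSynth!")` should write the same `speech.wav` as the curl command.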

Voice Cloning

curl -X POST http://localhost:4123/v1/audio/speech/upload \
  -F "input=Hello with my voice!" \
  -F "voice_file=@my_voice.mp3" \
  --output cloned.wav
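
The `-F` flags make curl send a `multipart/form-data` body. As a rough illustration of what that body looks like on the wire, here is a minimal standard-library encoder. The field names `input` and `voice_file` come from the curl example above; `encode_multipart` is a hypothetical helper, and most projects would use an HTTP client library for this instead.

```python
import uuid


def encode_multipart(
    fields: dict[str, str],
    file_field: str,
    filename: str,
    file_bytes: bytes,
    content_type: str = "application/octet-stream",
) -> tuple[bytes, str]:
    """Return (body, Content-Type header value) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    # One part per plain text field.
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    # The file part carries a filename and its own Content-Type.
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        file_bytes,
        f"--{boundary}--".encode(),  # closing delimiter
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and header value can be passed as `data` and the `Content-Type` header of a POST to `/v1/audio/speech/upload`.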

Voice Library

# Save a voice
curl -X POST http://localhost:4123/v1/voices \
  -F "voice_file=@my_voice.wav" \
  -F "name=custom-voice"

# Use it by name
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Reusable voice!", "voice": "custom-voice"}' \
  --output output.wav

Streaming

# Raw audio stream
curl -X POST http://localhost:4123/v1/audio/speech/stream \
  -H "Content-Type: application/json" \
  -d '{"input": "Streams in real-time!"}' \
  --output stream.wav

# SSE format (OpenAI compatible)
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"input": "SSE streaming!", "stream_format": "sse"}' \
  --no-buffer
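
A client consuming the SSE variant has to split the response into events and decode each payload. The sketch below parses `data:` lines according to the Server-Sent Events framing rules, then collects audio bytes. The event shape is an assumption modeled on OpenAI's speech streaming events (JSON objects with `type` set to `speech.audio.delta` or `speech.audio.done`, and a base64 `audio` field). This README does not specify the payload, so check the actual response before relying on these names.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event from an iterable of lines."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and comment lines are ignored here.


def collect_audio(lines: Iterable[str]) -> bytes:
    """Concatenate audio from delta events until a done event arrives."""
    audio = bytearray()
    for data in iter_sse_data(lines):
        event = json.loads(data)
        # ASSUMPTION: field names follow OpenAI's speech streaming event format.
        if event.get("type") == "speech.audio.delta":
            audio += base64.b64decode(event["audio"])
        elif event.get("type") == "speech.audio.done":
            break
    return bytes(audio)
```

In practice `lines` would be the decoded lines of the HTTP response body, read incrementally so that playback can begin before generation finishes.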

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /v1/audio/speech | POST | Generate speech (complete or SSE stream) |
| /v1/audio/speech/upload | POST | Generate speech with voice upload |
| /v1/audio/speech/stream | POST | Stream audio generation |
| /v1/voices | GET/POST | List or upload custom voices |
| /v1/models | GET | Available models (OpenAI compat) |
| /health | GET | Health check and status |
| /status/progress | GET | Real-time generation progress |
| /memory | GET | Memory usage and cleanup |
| /docs | GET | Interactive API documentation |

Parameters

| Parameter | Range | Default | Effect |
|---|---|---|---|
| exaggeration | 0.25 – 2.0 | 0.5 | Emotion intensity |
| cfg_weight | 0.0 – 1.0 | 0.5 | Pace control |
| temperature | 0.05 – 5.0 | 0.8 | Sampling randomness |
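
Checking values on the client before sending a request can catch out-of-range values early. The sketch below encodes the ranges and defaults from the table above in a small dataclass. The class name `SynthesisParams` is hypothetical, and passing the parameters as top-level JSON fields next to `input` is an assumption, since this README does not show a request that sets them.

```python
from dataclasses import asdict, dataclass

# Allowed ranges, copied from the parameter table.
RANGES = {
    "exaggeration": (0.25, 2.0),
    "cfg_weight": (0.0, 1.0),
    "temperature": (0.05, 5.0),
}


@dataclass(frozen=True)
class SynthesisParams:
    exaggeration: float = 0.5
    cfg_weight: float = 0.5
    temperature: float = 0.8

    def __post_init__(self) -> None:
        # Reject any value outside its documented range.
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")

    def to_payload(self, text: str) -> dict:
        # ASSUMPTION: parameters travel as top-level JSON fields next to "input".
        return {"input": text, **asdict(self)}
```

For example, `SynthesisParams(exaggeration=1.2).to_payload("Hello")` yields a dictionary ready to serialize as the request body, while `SynthesisParams(temperature=6.0)` raises `ValueError`.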

Docker

cd chatterbox-tts-api

# Standard
docker compose -f docker/docker-compose.yml up -d

# GPU-accelerated
docker compose -f docker/docker-compose.gpu.yml up -d

# With React frontend
docker compose -f docker/docker-compose.yml --profile frontend up -d

Voice File Requirements

| Format | Max Size | Recommended Duration |
|---|---|---|
| MP3, WAV, FLAC, M4A, OGG | 10 MB | 10–30s of clear speech |
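
The format and size limits above are easy to check before uploading. The following is a minimal sketch of such a pre-flight check. `check_voice_file` is a hypothetical helper, and treating 10 MB as 10 × 1024 × 1024 bytes is an assumption; the server may count differently. Duration is only a recommendation and is not checked here.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
MAX_BYTES = 10 * 1024 * 1024  # ASSUMPTION: "10 MB" interpreted as 10 MiB


def check_voice_file(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems: list[str] = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(no extension)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes (limit {MAX_BYTES})")
    return problems
```

A caller would typically pass `path.name` and `path.stat().st_size` for a local file and skip the upload if the returned list is non-empty.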

Technology Stack

  • API Framework: FastAPI + Uvicorn
  • TTS Engine: Chatterbox TTS (Resemble AI)
  • ML Runtime: PyTorch 2.x + torchaudio
  • STT: HuggingFace Transformers (Whisper-based)
  • Frontend: React + Vite
  • Validation: Pydantic v2
  • Containerization: Docker with GPU/CPU/uv variants

License

MIT
