Swap ElevenLabs SDK for @speech-sdk/core and add a provider picker #35

Open
piersonmarks wants to merge 5 commits into BolajiAyodeji:main from Jellypod-Inc:pierson/spe-41-pr-bolajiayodejichat-with-siri-swap-elevenlabs-for-speechsdk

@piersonmarks

What I did

Replaces the direct elevenlabs SDK integration in /api/speech with @speech-sdk/core — a unified, multi-provider text-to-speech SDK — and adds a Speech Provider dropdown next to the existing voice picker so users can switch between TTS backends at runtime.

Concretely:

  • Added @speech-sdk/core dependency (^0.4.1)
  • New provider registry at app/utils/providers.ts — single source of truth for supported providers (currently ElevenLabs + OpenAI) with static voice catalogs where applicable
  • /api/speech now constructs a ResolvedModel via createElevenLabs / createOpenAI and calls generateSpeech({ model, text, voice }); responds with the SDK-reported mediaType instead of hardcoding audio/mpeg
  • getVoices(provider) accepts a provider arg and returns a normalized ProviderVoice shape — ElevenLabs voices stay dynamic (via the ElevenLabs voices API), OpenAI uses the static catalog (alloy / ash / ballad / coral / echo / fable / nova / onyx / sage / shimmer)
  • ChatVoice renders two <select>s (provider + voice). Provider + voice are persisted to localStorage; switching provider auto-snaps selectedVoice to that provider's default when the prior selection isn't in the new catalog
  • Fixed a latent bug surfaced by the provider change: the chat page was gated on voices.length === 0 as a loading indicator, which caused the UI to hang forever if the ElevenLabs key lacked voices_read scope. Replaced with an explicit voicesLoading flag that flips off after the first fetch attempt (success or failure), so users can always reach the provider dropdown and switch backends
  • Updated README with a "Speech providers" section documenting the dropdown, the two built-in providers, and how to add more
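
A minimal sketch of how a registry like app/utils/providers.ts and the "snap to the provider's default" behavior could fit together. The field names (id, label, defaultVoice, staticVoices) and the helpers getProvider and snapVoice are hypothetical, not the PR's actual code; only the provider ids, labels, the ten OpenAI voices, and the alloy default come from the description. A null catalog stands in for ElevenLabs' runtime-fetched voices.

```typescript
// Hypothetical registry shape; the real app/utils/providers.ts may differ.
const OPENAI_VOICES = [
  "alloy", "ash", "ballad", "coral", "echo",
  "fable", "nova", "onyx", "sage", "shimmer",
] as const;

const SPEECH_PROVIDERS = [
  // ElevenLabs voices are fetched at runtime: no static catalog, and the
  // default falls back to the first fetched voice.
  { id: "elevenlabs", label: "ElevenLabs", defaultVoice: null, staticVoices: null },
  { id: "openai", label: "OpenAI", defaultVoice: "alloy", staticVoices: OPENAI_VOICES },
] as const;

// Deriving the id union from the array keeps the registry the single source of truth.
type SpeechProviderId = (typeof SPEECH_PROVIDERS)[number]["id"];

function getProvider(id: SpeechProviderId) {
  const entry = SPEECH_PROVIDERS.find((p) => p.id === id);
  if (entry === undefined) throw new Error(`Unknown speech provider: ${id}`);
  return entry;
}

// Keep the current voice if the new catalog contains it; otherwise snap to the
// provider default, then to the first available voice, then to null.
function snapVoice(
  current: string | null,
  available: readonly string[],
  providerDefault: string | null,
): string | null {
  if (current !== null && available.includes(current)) return current;
  if (providerDefault !== null && available.includes(providerDefault)) return providerDefault;
  return available[0] ?? null;
}
```

For example, switching to OpenAI while an ElevenLabs-only voice is selected, `snapVoice(previousVoice, getProvider("openai").staticVoices ?? [], getProvider("openai").defaultVoice)` returns "alloy", which matches test step 2.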

ElevenLabs remains the default so behavior at chat-with-siri.vercel.app is unchanged for anyone who doesn't touch the new dropdown.
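
The same default can hold on the server: per the commit message below, /api/speech branches on a `provider` field with ElevenLabs as the default, so a body without that field keeps the old behavior. A hedged sketch of that validation follows; the body shape ({ text, voice, provider? }) and parseSpeechRequest are assumptions, not code from the PR.

```typescript
type SpeechProviderId = "elevenlabs" | "openai";

const SUPPORTED_PROVIDERS: readonly SpeechProviderId[] = ["elevenlabs", "openai"];
const DEFAULT_PROVIDER: SpeechProviderId = "elevenlabs";

interface SpeechRequest {
  text: string;
  voice: string;
  provider: SpeechProviderId;
}

function isSupportedProvider(value: string): value is SpeechProviderId {
  return (SUPPORTED_PROVIDERS as readonly string[]).includes(value);
}

// Validates an already-parsed JSON body; a missing `provider` means ElevenLabs.
function parseSpeechRequest(body: unknown): SpeechRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("Request body must be a JSON object");
  }
  const { text, voice, provider } = body as Record<string, unknown>;
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("`text` must be a non-empty string");
  }
  if (typeof voice !== "string" || voice === "") {
    throw new Error("`voice` must be a non-empty string");
  }
  if (provider === undefined) {
    return { text, voice, provider: DEFAULT_PROVIDER };
  }
  if (typeof provider !== "string" || !isSupportedProvider(provider)) {
    throw new Error(`Unsupported speech provider: ${String(provider)}`);
  }
  return { text, voice, provider };
}
```

In a Next.js route handler this would run on `await request.json()` before any model is constructed, so an unknown provider fails fast instead of reaching the SDK.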

Minor behavior change to call out: the previous route passed voice_settings: { similarity_boost: 0.5, stability: 0.5 } to ElevenLabs directly. @speech-sdk/core uses provider defaults; these can be reintroduced via providerOptions if you'd like them back — happy to add that in a follow-up commit if you prefer.

Closes:

How to test

git checkout <this-branch>
npm install
cp .env.example .env.local
# add OPENAI_API_KEY and ELEVENLABS_API_KEY
npm run dev

Then at http://localhost:3000/chat:

  1. ElevenLabs regression check — provider dropdown defaults to "ElevenLabs", voice list populates dynamically, sending a message plays audio. Behavior should be identical to main.
  2. OpenAI path — switch the provider dropdown to "OpenAI". The voice dropdown swaps to the 10 static OpenAI voices, selection snaps to alloy. Send a message — audio plays via OpenAI TTS (tts-1).
  3. Persistence — reload the page; provider + voice selection survives via localStorage.
  4. Error surface — temporarily break one of the keys; the toast shows the correct provider name ("Your OpenAI API Key is invalid…" vs ElevenLabs).
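
Step 3 depends on the selection surviving a reload. A sketch of that persistence, assuming hypothetical localStorage keys (speechProvider, speechVoice) because the PR does not name them. A missing or unrecognized stored provider falls back to ElevenLabs, and the store is typed as the getItem/setItem subset of the Web Storage API, so window.localStorage can be passed in the browser and an in-memory stand-in elsewhere (localStorage is not available during server rendering).

```typescript
// The getItem/setItem subset of the Web Storage API; window.localStorage fits it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

type SpeechProviderId = "elevenlabs" | "openai";

// Hypothetical key names; the PR does not specify them.
const PROVIDER_KEY = "speechProvider";
const VOICE_KEY = "speechVoice";
const KNOWN_PROVIDERS: readonly string[] = ["elevenlabs", "openai"];

interface SpeechSelection {
  provider: SpeechProviderId;
  voice: string | null; // still checked against the provider's catalog afterwards
}

function loadSelection(store: KeyValueStore): SpeechSelection {
  const stored = store.getItem(PROVIDER_KEY);
  // Stale or hand-edited values fall back to the default provider.
  const provider: SpeechProviderId =
    stored !== null && KNOWN_PROVIDERS.includes(stored)
      ? (stored as SpeechProviderId)
      : "elevenlabs";
  return { provider, voice: store.getItem(VOICE_KEY) };
}

function saveSelection(store: KeyValueStore, selection: SpeechSelection): void {
  store.setItem(PROVIDER_KEY, selection.provider);
  if (selection.voice !== null) store.setItem(VOICE_KEY, selection.voice);
}

// In-memory stand-in for tests or server rendering.
class MemoryStore implements KeyValueStore {
  private readonly data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```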

npx tsc --noEmit is clean and npm run build passes.

Any background context you want to add?

  • @speech-sdk/core is an open-source, MIT-licensed multi-provider TTS SDK — repo at Jellypod-Inc/speech-sdk. Full disclosure: I'm one of the maintainers, which is part of why I was interested in this swap — chat-with-siri's architecture (single ElevenLabs integration with a voice picker) is exactly the shape the SDK abstracts cleanly, and it was a good real-world test. No hard feelings if you'd rather not take the dependency.
  • Adding more providers (Deepgram, Cartesia, Hume, Google, Fish Audio, xAI, etc.) is now a two-line change: append to SPEECH_PROVIDERS in app/utils/providers.ts and add a case in app/api/speech/route.ts. I kept it to ElevenLabs + OpenAI in this PR to minimize review surface.
  • I deliberately did not touch /api/chat, the API-key modal, styling, or any unrelated code — scoped the PR tightly per your contributor guide.
  • No tests added because the repo has no test harness and introducing one felt out of scope. Happy to add a vitest setup in a separate PR if you'd find that useful.
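
If the provider id type is a closed union derived from the registry (an assumption about how the code is typed), the "add a case in route.ts" half of that change can be enforced by the compiler: an exhaustive switch ending in a `never` check makes `npx tsc --noEmit` fail when a provider is registered but unhandled. The sketch is SDK-agnostic and hypothetical: where the real route calls createElevenLabs or createOpenAI, it returns a plain descriptor (display label plus the env var names from the setup steps), so it assumes nothing about those factories' signatures.

```typescript
type SpeechProviderId = "elevenlabs" | "openai";

// Hypothetical descriptor standing in for the SDK model the route builds.
interface ProviderDescriptor {
  label: string; // name shown in the dropdown and in error toasts
  apiKeyEnvVar: string; // key name used in .env.local
}

// Only reachable if some provider id has no case in the switch below.
function assertNever(value: never): never {
  throw new Error(`Unhandled speech provider: ${String(value)}`);
}

function describeProvider(provider: SpeechProviderId): ProviderDescriptor {
  switch (provider) {
    case "elevenlabs":
      return { label: "ElevenLabs", apiKeyEnvVar: "ELEVENLABS_API_KEY" };
    case "openai":
      return { label: "OpenAI", apiKeyEnvVar: "OPENAI_API_KEY" };
    default:
      // Adding e.g. "deepgram" to SpeechProviderId without a case leaves
      // `provider` typed as "deepgram" here, which tsc rejects.
      return assertNever(provider);
  }
}
```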

- /api/speech now uses generateSpeech() from @speech-sdk/core, branching on
  a 'provider' field (elevenlabs default, openai added)
- ChatVoice gains a provider dropdown alongside the voice dropdown
- Provider + voice persist to localStorage; switching provider resets the
  voice to that provider's default if the prior voice isn't in the catalog
- ElevenLabs remains the default so existing deployment behavior is unchanged

The previous `voices.length === 0` gate made sense when ElevenLabs was the
only provider and an empty list meant "still loading." With the provider
picker, an empty list is a legitimate state (e.g. ElevenLabs key lacks
voices_read scope) and the user needs the UI to be interactive so they
can switch to a provider with a static catalog.

Replaced the gate with an explicit `voicesLoading` flag that flips off in
the useEffect's finally block, so the page always renders after the first
fetch attempt regardless of outcome.
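
Stripped of React, the pattern is a try/catch/finally around the fetch. The sketch below is framework-agnostic and hypothetical (loadVoices and the VoiceState shape are not from the PR); in the component, the finally branch corresponds to calling setVoicesLoading(false) inside the useEffect.

```typescript
interface VoiceState {
  voices: string[];
  voicesLoading: boolean;
  error: string | null;
}

async function loadVoices(
  fetchVoices: () => Promise<string[]>,
  setState: (state: VoiceState) => void,
): Promise<void> {
  let voices: string[] = [];
  let error: string | null = null;
  setState({ voices, voicesLoading: true, error });
  try {
    voices = await fetchVoices();
  } catch (e) {
    // e.g. an ElevenLabs key without voices_read scope; an empty list is a valid state.
    error = e instanceof Error ? e.message : String(e);
  } finally {
    // Runs after success or failure, so the page and provider dropdown always render.
    setState({ voices, voicesLoading: false, error });
  }
}
```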
@vercel

vercel Bot commented Apr 8, 2026

@piersonmarks is attempting to deploy a commit to the BA Team on Vercel.

A member of the Team first needs to authorize it.

@socket-security

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff  | Package                | Supply Chain Security | Vulnerability | Quality | Maintenance | License
Added | @speech-sdk/core@0.4.1 | 75                    | 100           | 100     | 93          | 100
