Swap ElevenLabs SDK for @speech-sdk/core and add a provider picker #35
Open
piersonmarks wants to merge 5 commits into
- /api/speech now uses generateSpeech() from @speech-sdk/core, branching on a 'provider' field (elevenlabs default, openai added)
- ChatVoice gains a provider dropdown alongside the voice dropdown
- Provider + voice persist to localStorage; switching provider resets the voice to that provider's default if the prior voice isn't in the catalog
- ElevenLabs remains the default so existing deployment behavior is unchanged
The previous `voices.length === 0` gate made sense when ElevenLabs was the only provider and an empty list meant "still loading." With the provider picker, an empty list is a legitimate state (e.g. ElevenLabs key lacks voices_read scope) and the user needs the UI to be interactive so they can switch to a provider with a static catalog. Replaced the gate with an explicit `voicesLoading` flag that flips off in the useEffect's finally block, so the page always renders after the first fetch attempt regardless of outcome.
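The loading-flag pattern described above can be sketched roughly as follows. This is a standalone illustration without React, not the PR's actual code: `VoiceState`, `loadVoices`, and the `fetchVoices` callback are hypothetical names, and in the real component the flag is cleared in the `useEffect`'s `finally` block.

```typescript
// Hypothetical state shape; the PR keeps this in React state instead.
interface VoiceState {
  voices: string[];
  voicesLoading: boolean;
}

// Loads voices and always clears the loading flag, whether the fetch
// succeeds, returns an empty list, or throws.
async function loadVoices(
  fetchVoices: () => Promise<string[]>,
  state: VoiceState,
): Promise<void> {
  state.voicesLoading = true;
  try {
    state.voices = await fetchVoices();
  } catch {
    // e.g. an ElevenLabs key without the voices_read scope:
    // an empty list is a legitimate state, not "still loading".
    state.voices = [];
  } finally {
    // Runs on success and on failure, so the UI never hangs on the gate.
    state.voicesLoading = false;
  }
}
```

The point of the design is that "empty" and "loading" are tracked separately, so a failed fetch still leaves the provider dropdown reachable.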
What I did
Replaces the direct `elevenlabs` SDK integration in `/api/speech` with `@speech-sdk/core`, a unified, multi-provider text-to-speech SDK, and adds a Speech Provider dropdown next to the existing voice picker so users can switch between TTS backends at runtime.

Concretely:

- `@speech-sdk/core` dependency (`^0.4.1`)
- `app/utils/providers.ts`: single source of truth for supported providers (currently ElevenLabs + OpenAI), with static voice catalogs where applicable
- `/api/speech` now constructs a `ResolvedModel` via `createElevenLabs` / `createOpenAI` and calls `generateSpeech({ model, text, voice })`; it responds with the SDK-reported `mediaType` instead of hardcoding `audio/mpeg`
- `getVoices(provider)` accepts a provider arg and returns a normalized `ProviderVoice` shape. ElevenLabs voices stay dynamic (via the ElevenLabs voices API); OpenAI uses the static catalog (alloy / ash / ballad / coral / echo / fable / nova / onyx / sage / shimmer)
- `ChatVoice` renders two `<select>`s (provider + voice). Provider + voice are persisted to `localStorage`; switching provider auto-snaps `selectedVoice` to that provider's default when the prior selection isn't in the new catalog
- The page previously treated `voices.length === 0` as a loading indicator, which caused the UI to hang forever if the ElevenLabs key lacked `voices_read` scope. Replaced with an explicit `voicesLoading` flag that flips off after the first fetch attempt (success or failure), so users can always reach the provider dropdown and switch backends

ElevenLabs remains the default, so behavior at chat-with-siri.vercel.app is unchanged for anyone who doesn't touch the new dropdown.

Minor behavior change to call out: the previous route passed `voice_settings: { similarity_boost: 0.5, stability: 0.5 }` to ElevenLabs directly. `@speech-sdk/core` uses provider defaults; these settings can be reintroduced via `providerOptions` if you'd like them back. Happy to add that in a follow-up commit if you prefer.

Closes:
How to test
Then at `http://localhost:3000/chat`:

- […] `main`.
- […] `alloy`. Send a message; audio plays via OpenAI TTS (`tts-1`).
- […] `localStorage`.

`npx tsc --noEmit` is clean and `npm run build` passes.

Any background context you want to add?
- `@speech-sdk/core` is an open-source, MIT-licensed multi-provider TTS SDK; the repo is at Jellypod-Inc/speech-sdk. Full disclosure: I'm one of the maintainers, which is part of why I was interested in this swap. `chat-with-siri`'s architecture (a single ElevenLabs integration with a voice picker) is exactly the shape the SDK abstracts cleanly, and it was a good real-world test. No hard feelings if you'd rather not take the dependency.
- Adding more providers: extend `SPEECH_PROVIDERS` in `app/utils/providers.ts` and add a case in `app/api/speech/route.ts`. I kept it to ElevenLabs + OpenAI in this PR to minimize review surface.
- Didn't touch `/api/chat`, the API-key modal, styling, or any unrelated code; I scoped the PR tightly per your contributor guide.
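
For reviewers, here is a rough sketch of what a provider registry and the voice-snap rule described in this PR might look like. It is illustrative only and does not reproduce the actual `app/utils/providers.ts`: the `ProviderConfig` fields, `resolveProvider`, `resolveVoice`, and the ElevenLabs placeholder voice id are all assumptions, and `alloy` as the OpenAI default is inferred from the test steps above.

```typescript
// Hypothetical registry shape; field names are assumptions for illustration.
type ProviderId = "elevenlabs" | "openai";

interface ProviderConfig {
  label: string;
  defaultVoice: string;
  // Present when the provider has a static catalog; absent means "fetch dynamically".
  staticVoices?: readonly string[];
}

const SPEECH_PROVIDERS: Record<ProviderId, ProviderConfig> = {
  // Placeholder voice id, not a real ElevenLabs voice.
  elevenlabs: { label: "ElevenLabs", defaultVoice: "example-voice-id" },
  openai: {
    label: "OpenAI",
    defaultVoice: "alloy", // assumed default
    staticVoices: [
      "alloy", "ash", "ballad", "coral", "echo",
      "fable", "nova", "onyx", "sage", "shimmer",
    ],
  },
};

// Falls back to ElevenLabs when the request's provider field is missing or unknown,
// which keeps existing deployments on their current behavior.
function resolveProvider(raw: unknown): ProviderId {
  const known =
    typeof raw === "string" &&
    Object.prototype.hasOwnProperty.call(SPEECH_PROVIDERS, raw);
  return known ? (raw as ProviderId) : "elevenlabs";
}

// Keeps the previous voice if the new provider's catalog contains it;
// otherwise snaps to that provider's default voice.
function resolveVoice(
  provider: ProviderId,
  previous: string | null,
  catalog: readonly string[],
): string {
  return previous !== null && catalog.indexOf(previous) !== -1
    ? previous
    : SPEECH_PROVIDERS[provider].defaultVoice;
}
```

Under this shape, adding a provider would mean one new registry entry plus one new branch in the route, which matches the two touch points listed above.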