Cartesia Sonic TTS - Home Assistant Custom Integration

A Home Assistant custom integration that connects to the Cartesia Sonic text-to-speech API, giving HA access to Cartesia's library of 600+ voices across 42 languages with fine-grained control over speed, volume, and emotion.

Disclaimer: This is an unofficial, community-developed integration. It is not affiliated with, endorsed by, or supported by Cartesia AI. For Cartesia API support, visit cartesia.ai or their Discord. For integration issues, please open a GitHub issue on this repository.

Features

Full tts.speak support in Home Assistant.
Three Cartesia models: sonic-3.5 (recommended), sonic-3, and sonic-turbo.
600+ voices filterable by language in the config UI.
59 emotion presets for expressive speech.
Speed (0.6 to 1.5) and volume (0.5 to 2.0) controls.
All generation parameters overridable per tts.speak call.
SSML passthrough: embed Cartesia tags directly in message text.
Config flow setup and reconfiguration entirely through the HA UI.
API key can be changed at any time via Reconfigure or the automatic reauth prompt.

Requirements

Home Assistant 2025.1 or later.
A Cartesia API key. Sign up and create a free key at play.cartesia.ai/keys.

Installation

Note

Only one instance of the integration is supported. If you attempt to add it again, HA will show a message directing you to the Configure button on the existing entry.

HACS (Recommended)

Open HACS in your Home Assistant sidebar.
Click the three-dot menu (top right) and choose Custom repositories.
Paste https://github.com/sfox38/cartesia_tts and select Integration as the category.
Click Add, then find Cartesia Sonic TTS in the HACS Integration list and click Download.
Restart Home Assistant.

Manual Installation

Download the latest release zip from this repository and unpack it.
Copy the cartesia_tts folder into your config/custom_components/ directory. The result should be config/custom_components/cartesia_tts/.
Restart Home Assistant.

Setup Wizard

The initial setup wizard has four steps.

Step 1: API Key

Enter your Cartesia API key. The integration validates it against the Cartesia API before continuing.

Step 2: Model

Choose your default Cartesia model.

| Model | Latency | Languages | Emotion support |
|---|---|---|---|
| Sonic 3.5 (recommended) | ~100ms | 42 | Full (59 emotions) |
| Sonic 3 | 90ms | 42 | Full (59 emotions) |
| Sonic Turbo | 40ms | 15 | Limited |

Step 3: Language and generation settings

Choose your default language, speed, volume, and emotion. The language list is filtered to only show languages supported by the model chosen in step 2.

Use the option at the bottom to go back to model selection if needed.

Step 4: Voice

Choose your default voice. The dropdown shows only voices for the selected language. Voices are sorted alphabetically. Voice names include accent information where relevant (e.g. "Matilda - Australian Female").

Use the option at the bottom to go back to settings.

Reconfiguring After Setup

Go to Settings -> Devices and Services -> Cartesia Sonic TTS -> Configure.

The Configure dialog repeats the last three steps of the setup wizard (model, settings, voice) with your current values pre-filled; the API key step is skipped. The voice list is always refreshed from the Cartesia API when you reach the voice step, so any voices Cartesia has added since your last session appear immediately.

Changing Your API Key

Automatically: If your API key is revoked or expires, HA detects this the next time speech is synthesised and displays a repair notification prompting you to re-enter your key. Click the notification and enter the new key.

Manually: Go to Settings -> Devices and Services -> Cartesia Sonic TTS, click the three-dot menu, and choose Reconfigure. Enter a new API key. All other settings (model, voice, language, etc.) are preserved.

Using `tts.speak`

Basic example

action: tts.speak
target:
  entity_id: tts.cartesia_sonic_tts
data:
  media_player_entity_id: media_player.living_room
  message: "Hello from Cartesia."
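
The same action can be placed inside an automation. The following is a minimal sketch of an automation as it would appear in the YAML view of the automation editor. The trigger entity binary_sensor.front_door is a hypothetical placeholder, not something this integration provides; replace it and the media player with entities from your own installation.

```yaml
# Sketch only: binary_sensor.front_door is a placeholder entity ID.
alias: Announce front door
triggers:
  - trigger: state
    entity_id: binary_sensor.front_door
    to: "on"
actions:
  - action: tts.speak
    target:
      entity_id: tts.cartesia_sonic_tts
    data:
      media_player_entity_id: media_player.living_room
      message: "The front door is open."
```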

With per-call overrides

All generation parameters can be overridden for a single call via the options dict. Overrides take precedence over the defaults set in the Configure dialog.

action: tts.speak
target:
  entity_id: tts.cartesia_sonic_tts
data:
  media_player_entity_id: media_player.living_room
  message: "This is urgent!"
  options:
    emotion: alarmed
    speed: 1.3
    volume: 1.5

Override voice, language, or model for a single call

action: tts.speak
target:
  entity_id: tts.cartesia_sonic_tts
data:
  media_player_entity_id: media_player.kitchen
  message: "Bonjour le monde."
  language: fr
  options:
    model: sonic-3.5
    voice_id: "ab636c8b-9960-4fb3-bb0c-b7b655fb9745"

With SSML tags

Cartesia SSML tags can be embedded directly in the message text. They are passed to the API as-is. Each message line below is a separate example.

message: "<emotion value='angry'/> How dare you speak to me like that!"
message: "<speed ratio='1.5'/> I like to talk fast."
message: "<volume ratio='1.5'/> This part is louder."

See the Cartesia SSML documentation for the full tag reference. Note that speed, volume, and emotion SSML tags are currently in beta.
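
For context, here is how one of the message lines above might sit inside a complete tts.speak call. This is a sketch that reuses the entity IDs from the earlier examples. The folded block scalar (`>-`) is one way to avoid nesting quotes around the tag attributes; a double-quoted string with single-quoted attributes works as well.

```yaml
action: tts.speak
target:
  entity_id: tts.cartesia_sonic_tts
data:
  media_player_entity_id: media_player.living_room
  # Folded scalar: the text is passed as one string, tag included.
  message: >-
    <emotion value='angry'/> How dare you speak to me like that!
```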

Options Reference

The following keys are accepted in the options dict of tts.speak:

| Key | Type | Description |
|---|---|---|
| `model` | string | `sonic-3.5`, `sonic-3`, or `sonic-turbo` |
| `voice_id` | string | Cartesia voice UUID |
| `language` | string | ISO 639-1 language code (e.g. `en`, `fr`, `ja`) |
| `speed` | float | Speed multiplier. 0.6 slowest, 1.0 normal, 1.5 fastest |
| `volume` | float | Volume multiplier. 0.5 quietest, 1.0 normal, 2.0 loudest |
| `emotion` | string | Emotion name (see list below) |
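
As an illustration, the call below sets every key from the table at once. It is a sketch, not a recommended configuration: the voice_id is a placeholder rather than a real Cartesia voice UUID, and the other values are arbitrary picks within the documented ranges.

```yaml
action: tts.speak
target:
  entity_id: tts.cartesia_sonic_tts
data:
  media_player_entity_id: media_player.living_room
  message: "Dinner is ready."
  options:
    model: sonic-3
    # Placeholder UUID: substitute a voice ID from the Cartesia voice library.
    voice_id: "00000000-0000-0000-0000-000000000000"
    language: en
    speed: 0.9     # allowed range 0.6 to 1.5
    volume: 1.2    # allowed range 0.5 to 2.0
    emotion: content
```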

Supported Emotions

Emotions are guidance to the model, not strict transformations. Results vary by voice and transcript. For best results, use one of Cartesia's recommended emotive voices (tagged "Emotive" in the Cartesia voice library).

The primary emotions with the most training data are: angry, content, excited, neutral, sad, scared.

Full list (use any of these names as the emotion value in the options dict):

affectionate, agitated, alarmed, amazed, angry, anticipation, anxious, apologetic, bored, calm, confident, confused, contemplative, contempt, content, curious, dejected, determined, disappointed, disgusted, distant, elated, enthusiastic, envious, euphoric, excited, flirtatious, frustrated, grateful, guilty, happy, hesitant, hurt, insecure, ironic, joking/comedic, mad, melancholic, mysterious, neutral, nostalgic, outraged, panicked, peaceful, proud, rejected, resigned, sad, sarcastic, scared, serene, skeptical, surprised, sympathetic, threatened, tired, triumphant, trust, wistful
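
Because emotion is an ordinary options value, it can also be chosen with a Home Assistant template when the call runs from an automation or script. The sketch below assumes a hypothetical binary_sensor.smoke_alarm entity and picks between two names from the list above. Whether the result sounds different still depends on the voice.

```yaml
action: tts.speak
target:
  entity_id: tts.cartesia_sonic_tts
data:
  media_player_entity_id: media_player.living_room
  message: "Status check complete."
  options:
    # binary_sensor.smoke_alarm is a hypothetical entity used for illustration.
    emotion: "{{ 'alarmed' if is_state('binary_sensor.smoke_alarm', 'on') else 'calm' }}"
```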

Supported Languages

Sonic 3.5 and Sonic 3 (42 languages)

Arabic, Bengali, Bulgarian, Chinese, Croatian, Czech, Danish, Dutch, English, Finnish, French, Georgian, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Malay, Malayalam, Marathi, Norwegian, Polish, Portuguese, Punjabi, Romanian, Russian, Slovak, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese

Sonic Turbo (15 languages)

Chinese, Dutch, English, French, German, Hindi, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, Turkish

Voice Dialects and Accents

The Cartesia API does not expose dialect codes (e.g. en-AU) in the synthesis request. Accent is a property of the voice itself. Many voices in the Cartesia library include accent information in their name (e.g. "Matilda - Australian Female"). Voice selection is effectively dialect selection.
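
In practice this means two calls can share a language code and differ only in voice_id. The script body below is a sketch of that idea: both UUIDs are placeholders, not real Cartesia voices, and the accents named in the comments are only examples of what the chosen voices might be.

```yaml
# Same language, different voices: the accent follows the voice, not the code.
sequence:
  - action: tts.speak
    target:
      entity_id: tts.cartesia_sonic_tts
    data:
      media_player_entity_id: media_player.living_room
      message: "Good morning."
      language: en
      options:
        # Placeholder UUID, e.g. an Australian-accented voice.
        voice_id: "11111111-1111-1111-1111-111111111111"
  - action: tts.speak
    target:
      entity_id: tts.cartesia_sonic_tts
    data:
      media_player_entity_id: media_player.kitchen
      message: "Good morning."
      language: en
      options:
        # Placeholder UUID, e.g. a British-accented voice.
        voice_id: "22222222-2222-2222-2222-222222222222"
```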

Troubleshooting

"No voice configured": Open Configure and complete the voice selection step.

Voice browser shows no voices in HA: The voice list is fetched once when HA starts. If that fetch fails (e.g. a transient network error at boot), open Configure and proceed to the voice step, which always triggers a fresh fetch.

Emotion has no effect: Not all voices respond well to emotion guidance. Try one of the recommended emotive voices from the Cartesia voice library (filter by "Emotive" tag). Emotion is not reliably supported on Sonic Turbo.

Wrong accent: The language code alone does not control accent. Select a voice whose name or description matches your desired accent.

SSML not working: The message string must contain valid Cartesia SSML. The speed, volume, and emotion SSML tags are currently in beta. Invalid or malformed tags are silently ignored by the Cartesia API.

No audio output or other unexpected behaviour: Check Settings -> System -> Logs in the HA UI, or open /config/home-assistant.log. Error and warning messages from this integration appear there at the default log level, with no extra configuration needed. If you need more detail (such as the exact request being sent to Cartesia), add the following to your configuration.yaml and restart HA:

logger:
  logs:
    custom_components.cartesia_tts: debug

Note

Debug logging includes the first 50 characters of the message being synthesised. Avoid enabling it long-term if your announcements contain sensitive information.
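
If restarting is inconvenient, Home Assistant's logger.set_level action can change the level at runtime. The sketch below assumes the logger integration is already loaded; the change lasts until the next restart, which also makes it easy to keep debug logging short-lived.

```yaml
# Run from Developer Tools -> Actions, or as a step in a script.
action: logger.set_level
data:
  custom_components.cartesia_tts: debug
```

Running the same action with the value set to warning turns the extra output off again.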
