82 changes: 36 additions & 46 deletions .agents/skills/deepgram-js-audio-intelligence/SKILL.md
Expand Up @@ -7,16 +7,19 @@ description: Use when writing or reviewing JavaScript/TypeScript in this repo th

Analytics overlays applied to `/v1/listen`: summaries, topics, intents, sentiment, language detection, diarization, redaction, entities. Same client surface as STT; turn features on with parameters.

## When to use this product
**Use a different skill when:** plain transcription → `deepgram-js-speech-to-text`; analytics on text → `deepgram-js-text-intelligence`; Flux turn-taking → `deepgram-js-conversational-stt`; full-duplex agent → `deepgram-js-voice-agent`.

- You have **audio** and want analytics returned alongside the transcript.
- REST is the primary path; the WebSocket path supports only a subset of intelligence features.
## Authentication

```js
require("dotenv").config();

**Use a different skill when:**
- You just want transcript output → `deepgram-js-speech-to-text`.
- You already have text and want analytics on that text → `deepgram-js-text-intelligence`.
- You need Flux turn-taking → `deepgram-js-conversational-stt`.
- You need a full interactive voice agent → `deepgram-js-voice-agent`.
const { DeepgramClient } = require("@deepgram/sdk");

const deepgramClient = new DeepgramClient({
apiKey: process.env.DEEPGRAM_API_KEY,
});
```

## Feature availability: REST vs WSS

Expand All @@ -32,18 +35,6 @@ Analytics overlays applied to `/v1/listen`: summaries, topics, intents, sentimen
| `sentiment` | yes | no |
| `detect_language` | yes | no |
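
A minimal sketch of how the REST-only rows could be enforced before opening a WSS connection. `splitForWss` and `REST_ONLY_FLAGS` are hypothetical names, not part of the SDK. The flag list is the five features this skill marks as unavailable on WSS. The helper only strips those flags and does not check that the remaining options are valid on WSS.

```javascript
// Flags this skill lists as REST-only (not exposed on the WSS path).
const REST_ONLY_FLAGS = ["summarize", "topics", "intents", "sentiment", "detect_language"];

// Hypothetical helper: separate options that may be sent over WSS
// from REST-only flags that have to be dropped (or sent via REST instead).
function splitForWss(options) {
  const wssOptions = {};
  const dropped = [];
  for (const [key, value] of Object.entries(options)) {
    if (REST_ONLY_FLAGS.includes(key)) {
      dropped.push(key);
    } else {
      wssOptions[key] = value;
    }
  }
  return { wssOptions, dropped };
}
```

Logging `dropped` before connecting makes it visible when a caller expected an overlay that the streaming path will not return.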

## Authentication

```js
require("dotenv").config();

const { DeepgramClient } = require("@deepgram/sdk");

const deepgramClient = new DeepgramClient({
apiKey: process.env.DEEPGRAM_API_KEY,
});
```

## Quick start — REST with analytics

From `examples/22-transcription-advanced-options.ts`:
Expand All @@ -70,6 +61,14 @@ const data = await deepgramClient.listen.v1.media.transcribeUrl({
keyterm: ["keyword1", "keyword2"],
redact: ["pci", "ssn"],
});

// Verify intelligence results are present
const summary = data.results?.summary?.short;
const topics = data.results?.topics?.segments;
const sentiments = data.results?.sentiments?.segments;
if (!summary && !topics && !sentiments) {
console.warn("No intelligence results — check feature/model/language support.");
}
```

## Quick start — WSS subset
Expand All @@ -85,6 +84,13 @@ const deepgramConnection = await deepgramClient.listen.v1.createConnection({
});
```

## Workflow

1. **Select features** from the REST vs WSS table. WSS lacks `summarize`, `topics`, `intents`, `sentiment`, `detect_language`.
2. **Call** `transcribeUrl` / `transcribeFile` with chosen flags and `model: "nova-3"`.
3. **Validate response**: check `data.results?.summary`, `data.results?.topics?.segments`, `data.results?.sentiments?.segments`. Fields are absent (not errored) when the model/language combo does not support the feature.
4. **On missing results**: confirm the feature/model/language combination at https://developers.deepgram.com/docs/stt-intelligence-feature-overview, then retry with corrected params.
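
Step 3 above can be sketched as a small check that reports which requested features came back empty. `missingIntelligence` and `RESULT_CHECKS` are hypothetical names. The response paths are the ones listed in the workflow. The helper takes a plain response object, so it makes no SDK calls.

```javascript
// Maps a request flag to the response field the workflow says to check.
const RESULT_CHECKS = {
  summarize: (results) => results?.summary,
  topics: (results) => results?.topics?.segments,
  sentiment: (results) => results?.sentiments?.segments,
};

// Hypothetical helper: return the requested flags whose results are absent.
// Flags that were not requested are never reported as missing.
function missingIntelligence(requestOptions, data) {
  return Object.keys(RESULT_CHECKS).filter(
    (flag) => requestOptions[flag] && !RESULT_CHECKS[flag](data?.results)
  );
}
```

A non-empty return value is the signal to re-check the feature/model/language combination before retrying.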

## Key parameters / API surface

- Analytics flags: `summarize`, `topics`, `intents`, `sentiment`, `detect_language`, `detect_entities`, `diarize`, `redact`, `custom_topic`, `custom_topic_mode`, `custom_intent`, `custom_intent_mode`.
Expand All @@ -93,28 +99,18 @@ const deepgramConnection = await deepgramClient.listen.v1.createConnection({

## API reference (layered)

1. **In-repo reference**: `reference.md` → `Listen V1 Media`; WSS subset behavior lives in `src/CustomClient.ts` and `src/api/resources/listen/resources/v1/client/{Client,Socket}.ts`.
2. **Canonical OpenAPI (REST)**: https://developers.deepgram.com/openapi.yaml
3. **Canonical AsyncAPI (WSS)**: https://developers.deepgram.com/asyncapi.yaml
4. **Context7**: library ID `/llmstxt/developers_deepgram_llms_txt`
5. **Product docs**:
- https://developers.deepgram.com/docs/stt-intelligence-feature-overview
- https://developers.deepgram.com/docs/summarization
- https://developers.deepgram.com/docs/topic-detection
- https://developers.deepgram.com/docs/intent-recognition
- https://developers.deepgram.com/docs/sentiment-analysis
- https://developers.deepgram.com/docs/language-detection
- https://developers.deepgram.com/docs/redaction
- https://developers.deepgram.com/docs/diarization
1. **In-repo**: `reference.md` → `Listen V1 Media`; WSS subset in `src/api/resources/listen/resources/v1/client/{Client,Socket}.ts`.
2. **OpenAPI / AsyncAPI**: https://developers.deepgram.com/openapi.yaml | https://developers.deepgram.com/asyncapi.yaml
3. **Context7**: library ID `/llmstxt/developers_deepgram_llms_txt`
4. **Product docs**: https://developers.deepgram.com/docs/stt-intelligence-feature-overview (links to summarization, topic detection, intent recognition, sentiment, language detection, redaction, diarization).

## Gotchas

1. **`summarize` on `/v1/listen` is versioned, not plain boolean.** The generated REST surface and examples point at `"v2"`.
2. **Most intelligence flags are REST-only.** Current WSS connect args do not expose `topics`, `intents`, `sentiment`, `summarize`, or `detect_language`.
3. **`redact` typing is looser in practice than in the generated alias.** Examples pass arrays like `["pci", "ssn"]`, even though `ListenV1Redact` itself is just a string alias.
4. **Use `keyterm` for Nova-3 biasing.** `examples/22-transcription-advanced-options.ts` explicitly notes keywords are not supported for Nova-3.
5. **Model/feature support is product-side.** `nova-3` is the safest choice when mixing many overlays.
6. **Diarization quality depends on audio quality and duration.** Short or noisy clips churn speakers.
1. **`summarize` is `"v2"`, not boolean.** The generated REST surface and examples use the string value.
2. **`redact` accepts arrays** like `["pci", "ssn"]` despite `ListenV1Redact` being a string alias.
3. **Use `keyterm`, not `keywords`, for Nova-3 biasing.**
4. **Prefer `nova-3`** when mixing many overlays -- broadest feature support.
5. **Diarization quality depends on audio quality and duration.** Short or noisy clips tend to produce unstable speaker labels.
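
Two of the gotchas above lend themselves to a pre-flight check. The sketch below assumes a hypothetical `normalizeListenOptions` helper that rewrites a boolean `summarize` to `"v2"` and warns when `keywords` is combined with `nova-3`. It does not convert `keywords` into `keyterm`, because the two may not share a format.

```javascript
// Hypothetical pre-flight helper; returns a copy and never mutates the input.
function normalizeListenOptions(options) {
  const normalized = { ...options };
  const warnings = [];

  // Gotcha: `summarize` takes the string "v2", not a boolean.
  if (normalized.summarize === true) {
    normalized.summarize = "v2";
    warnings.push('summarize: true rewritten to "v2"');
  }

  // Gotcha: Nova-3 biasing uses `keyterm`; `keywords` is not supported there.
  if (normalized.model === "nova-3" && normalized.keywords !== undefined) {
    warnings.push("keywords is not supported on nova-3; use keyterm instead");
  }

  return { options: normalized, warnings };
}
```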

## Example files in this repo

Expand All @@ -125,10 +121,4 @@ const deepgramConnection = await deepgramClient.listen.v1.createConnection({

## Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

```bash
npx skills add deepgram/skills
```

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
For cross-language Deepgram product knowledge, install the central skills with `npx skills add deepgram/skills`.
40 changes: 10 additions & 30 deletions .agents/skills/deepgram-js-management-api/SKILL.md
Expand Up @@ -7,18 +7,7 @@ description: Use when writing or reviewing JavaScript/TypeScript in this repo th

Administrative REST endpoints under `/v1/projects`, `/v1/models`, and related project subresources.

## When to use this product

- **Projects**: list, get, update, delete, leave.
- **Keys**: list, get, create, delete API keys.
- **Members + invites**: inspect members, update scopes, create/delete invites.
- **Usage + billing**: inspect requests, usage, usage breakdown, balances, purchases, billing breakdown.
- **Models**: list global models and project-scoped models.
- **Agent think models**: discover available model providers for Voice Agent `think` settings.

**Use a different skill when:**
- You want to run a live websocket agent session → `deepgram-js-voice-agent`.
- You want transcription or synthesis calls rather than project/admin APIs → product-specific skills.
**Use a different skill when:** live agent session → `deepgram-js-voice-agent`; transcription or synthesis → product-specific skills.

## Authentication

Expand Down Expand Up @@ -80,6 +69,13 @@ Think-model discovery for Voice Agent:
await deepgramClient.agent.v1.settings.think.models.list();
```

## Workflow for destructive operations

1. **List** the resources first (e.g., `projects.keys.list(projectId)`).
2. **Confirm** the target ID with the user before proceeding.
3. **Execute** the delete/leave/remove call.
4. **Verify** by listing again to confirm deletion.
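
The four steps above can be sketched as one guard function. `deleteWithVerification` is a hypothetical helper, and `list`, `confirm`, and `remove` are caller-supplied callbacks that would wrap the relevant SDK calls. `list` is assumed to resolve to an array of resource IDs, which means the caller maps the SDK response to IDs first.

```javascript
// Hypothetical guard: list -> confirm -> execute -> verify.
async function deleteWithVerification({ targetId, list, confirm, remove }) {
  // 1. List first, so a mistyped ID is never passed to a destructive call.
  const before = await list();
  if (!before.includes(targetId)) {
    return { deleted: false, reason: "not-found" };
  }

  // 2. Confirm the target with the user before proceeding.
  if (!(await confirm(targetId))) {
    return { deleted: false, reason: "declined" };
  }

  // 3. Execute the destructive call.
  await remove(targetId);

  // 4. Verify by listing again.
  const after = await list();
  if (after.includes(targetId)) {
    return { deleted: false, reason: "still-present" };
  }
  return { deleted: true };
}
```

Keeping the SDK calls behind callbacks lets the same guard cover keys, invites, and members without hard-coding any method signature.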

## Key parameters / API surface

- Projects: `client.manage.v1.projects.list/get/update/delete/leave`.
Expand Down Expand Up @@ -119,24 +115,8 @@ The current JS SDK does **not** expose persisted Voice Agent configuration CRUD

## Example files in this repo

- `examples/13-management-projects.ts`
- `examples/14-management-keys.ts`
- `examples/15-management-members.ts`
- `examples/16-management-invites.ts`
- `examples/17-management-usage.ts`
- `examples/18-management-billing.ts`
- `examples/19-management-models.ts`
- `examples/29-management-usage-breakdown.ts`
- `examples/30-management-billing-detailed.ts`
- `examples/31-management-member-permissions.ts`
- `examples/32-management-project-models.ts`
`examples/13-management-projects.ts` through `examples/19-management-models.ts`, plus `examples/29-*` through `examples/32-*` for usage breakdown, billing details, member permissions, and project models.

## Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

```bash
npx skills add deepgram/skills
```

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
For cross-language Deepgram product knowledge, install the central skills with `npx skills add deepgram/skills`.
18 changes: 3 additions & 15 deletions .agents/skills/deepgram-js-text-intelligence/SKILL.md
@@ -1,19 +1,13 @@
---
name: deepgram-js-text-intelligence
description: Use when writing or reviewing JavaScript/TypeScript in this repo that calls Deepgram Text Intelligence / Read (`/v1/read`) for sentiment, summarization, topic detection, and intent recognition on text input. Covers `client.read.v1.text.analyze(...)` with `body: { text }` or `body: { url }`. Use `deepgram-js-audio-intelligence` when the source is audio instead of text. Triggers include "read API", "text intelligence", "analyze text", "sentiment", "summarize text", "topics", "intents", and "read.v1".
description: "Use when writing or reviewing JavaScript/TypeScript in this repo that calls Deepgram Text Intelligence / Read (`/v1/read`) for sentiment, summarization, topic detection, and intent recognition on text input. Covers `client.read.v1.text.analyze(...)` with `body: { text }` or `body: { url }`. Use `deepgram-js-audio-intelligence` when the source is audio instead of text. Triggers: read API, text intelligence, analyze text, sentiment, summarize text, topics, intents, read.v1."
---

# Using Deepgram Text Intelligence (JavaScript / TypeScript SDK)

Analyze text or a hosted text URL for sentiment, summarization, topics, and intents via `/v1/read`.

## When to use this product

- You already have **text** (transcript, document, email, chat log) and want analytics.
- You want a single REST call; there is no streaming Read API in this SDK.

**Use a different skill when:**
- Your source is audio and you want the analytics applied during transcription → `deepgram-js-audio-intelligence`.
**Use a different skill when:** source is audio → `deepgram-js-audio-intelligence`. This API is REST-only; there is no streaming Read API in this SDK.

## Authentication

Expand Down Expand Up @@ -85,10 +79,4 @@ For broader coverage, `examples/28-text-intelligence-advanced.ts` also demonstra

## Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

```bash
npx skills add deepgram/skills
```

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
For cross-language Deepgram product knowledge, install the central skills with `npx skills add deepgram/skills`.
18 changes: 5 additions & 13 deletions .agents/skills/deepgram-js-text-to-speech/SKILL.md
Expand Up @@ -7,13 +7,9 @@ description: Use when writing or reviewing JavaScript/TypeScript in this repo th

Convert text to audio with one-shot REST generation or low-latency streaming synthesis via `/v1/speak`.

## When to use this product
Two modes: **REST** (`client.speak.v1.audio.generate`) for one-shot synthesis, **WebSocket** (`client.speak.v1.createConnection()`) for low-latency streaming.

- **REST (`client.speak.v1.audio.generate`)** — render finished text into an audio response. Best for downloadable files, pre-generated prompts, batch synthesis.
- **WebSocket (`client.speak.v1.createConnection()` / `connect()`)** — stream text in and receive audio out with lower latency. Best when an LLM is still producing tokens.

**Use a different skill when:**
- You need the agent to also listen, think, and handle barge-in → `deepgram-js-voice-agent`.
**Use a different skill when:** full-duplex agent with STT + LLM + TTS → `deepgram-js-voice-agent`.

## Authentication

Expand Down Expand Up @@ -71,6 +67,8 @@ deepgramConnection.sendText({ type: "Speak", text: "Hello from streaming TTS." }
deepgramConnection.sendFlush({ type: "Flush" });
```

**Error handling:** Listen for `Warning` events in the message handler. If the connection drops, create a new connection and re-register handlers; the SDK does not auto-reconnect.
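
Because the SDK does not auto-reconnect, a caller has to rebuild the connection itself. The sketch below is one possible shape, not the SDK's own mechanism. `openWithRetry` is a hypothetical helper, `createConnection` is a caller-supplied async factory that would wrap the SDK's connection setup, and `registerHandlers` attaches the message handlers to each fresh connection.

```javascript
// Hypothetical retry wrapper with a linearly growing delay between attempts.
async function openWithRetry(createConnection, registerHandlers, maxAttempts = 3, baseDelayMs = 250) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      const connection = await createConnection();
      // Handlers belong to the old connection object, so re-register every time.
      registerHandlers(connection);
      return connection;
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  // Every attempt failed; surface the most recent error to the caller.
  throw lastError;
}
```

Text that was sent but not yet flushed on the dropped connection is not replayed by this sketch; the caller would need to resend it.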

## Key parameters / API surface

- REST & WSS: `model`, `encoding`, `sample_rate`, `container`, `bit_rate`, `callback`, `callback_method`, `tag`, `mip_opt_out`.
Expand Down Expand Up @@ -111,10 +109,4 @@ Unlike the Python SDK, this repo does **not** include a hand-written `TextBuilde

## Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

```bash
npx skills add deepgram/skills
```

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
For cross-language Deepgram product knowledge, install the central skills with `npx skills add deepgram/skills`.
30 changes: 13 additions & 17 deletions .agents/skills/deepgram-js-voice-agent/SKILL.md
Expand Up @@ -7,16 +7,7 @@ description: Use when writing or reviewing JavaScript/TypeScript in this repo th

Full-duplex voice agent runtime over `wss://agent.deepgram.com/v1/agent/converse`: audio in, LLM orchestration, audio out, plus function calling and prompt/runtime updates.

## When to use this product

- You want an **interactive voice assistant** where the user speaks, the agent thinks, and the agent responds with speech.
- You need **function / tool calling** inside the conversation loop.
- You want Deepgram to host the STT + think + TTS orchestration.

**Use a different skill when:**
- You only need transcription → `deepgram-js-speech-to-text` or `deepgram-js-conversational-stt`.
- You only need synthesis → `deepgram-js-text-to-speech`.
- You want project keys, usage, models, or other admin APIs → `deepgram-js-management-api`.
**Use a different skill when:** transcription only → `deepgram-js-speech-to-text` or `deepgram-js-conversational-stt`; synthesis only → `deepgram-js-text-to-speech`; admin APIs → `deepgram-js-management-api`.

## Authentication

Expand Down Expand Up @@ -72,6 +63,17 @@ deepgramConnection.sendSettings({

The same example also shows `client.agent.v1.settings.think.models.list()` for discovering supported think models.

## Workflow

1. `createConnection()` — returns a lazy socket; no network call yet.
2. Register `on("message", ...)` handlers for `SettingsApplied`, `ConversationText`, `FunctionCallRequest`, `Error`, and audio payloads.
3. `connect()` then `await waitForOpen()`.
4. `sendSettings({ type: "Settings", ... })` — **must be the first message**. Wait for `SettingsApplied` before proceeding.
5. `sendMedia(chunk)` to stream user audio. Send `sendKeepAlive(...)` every ~5 s during silence.
6. Handle `FunctionCallRequest` with `sendFunctionCallResponse({ type: "FunctionCallResponse", id, name, content })`.
7. Use `sendUpdatePrompt(...)`, `sendUpdateThink(...)`, `sendUpdateSpeak(...)` for runtime changes.
8. On `Error` event, log the error and close/reconnect as appropriate.
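
Steps 2, 4, 6, and 8 above can be sketched as a single message dispatcher. `createAgentMessageHandler` is a hypothetical helper, and `tools` is a caller-supplied map from function name to implementation. The shape of the `FunctionCallRequest` payload (a `functions` array whose entries carry `id`, `name`, and a JSON-string `arguments`) is an assumption and should be checked against the SDK types. The `sendFunctionCallResponse` fields are the ones named in step 6.

```javascript
// Hypothetical dispatcher for parsed agent messages.
function createAgentMessageHandler(connection, tools) {
  let settingsApplied = false;

  const handle = async (message) => {
    switch (message.type) {
      case "SettingsApplied":
        // Step 4: audio should only be streamed after this arrives.
        settingsApplied = true;
        break;
      case "FunctionCallRequest":
        // ASSUMPTION: requests arrive as `message.functions`, each with
        // `id`, `name`, and `arguments` encoded as a JSON string.
        for (const call of message.functions ?? []) {
          const tool = tools[call.name];
          const content = tool
            ? await tool(JSON.parse(call.arguments || "{}"))
            : `Unknown function: ${call.name}`;
          connection.sendFunctionCallResponse({
            type: "FunctionCallResponse",
            id: call.id,
            name: call.name,
            content: String(content),
          });
        }
        break;
      case "Error":
        // Step 8: log here; the caller decides whether to close or reconnect.
        console.error("Agent error:", message);
        break;
      default:
        // ConversationText, audio payloads, and other events are ignored here.
        break;
    }
  };

  return { handle, isReady: () => settingsApplied };
}
```

Gating `sendMedia` on `isReady()` is one way to honor the rule that `Settings` must be acknowledged before audio is streamed.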

## Key parameters / API surface

- Connection setup: `client.agent.v1.createConnection()` / `connect()`.
Expand Down Expand Up @@ -114,10 +116,4 @@ This SDK exposes the **live agent runtime** plus `settings.think.models.list()`,

## Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

```bash
npx skills add deepgram/skills
```

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
For cross-language Deepgram product knowledge, install the central skills with `npx skills add deepgram/skills`.