---
name: deepgram-java-audio-intelligence
description: Use when writing or reviewing Java code in this repo that enables Deepgram intelligence overlays on `/v1/listen` audio transcription (diarization, entity detection, sentiment, summarization, topics, intents, language detection, and redaction). Same endpoint as plain STT, but with extra request fields on `ListenV1RequestUrl` or `MediaTranscribeRequestOctetStream`. Use `deepgram-java-speech-to-text` for plain transcripts and `deepgram-java-text-intelligence` for analysis on existing text. Triggers include "audio intelligence", "diarize", "summarize audio", "sentiment from audio", "topic detection", and "redact".
---

# Using Deepgram Audio Intelligence (Java SDK)

Audio intelligence is not a separate client in this SDK. It is the **Listen V1 REST request surface** with additional analysis fields enabled.

## When to use this product

- You have **audio** and want the transcript and analysis together.
- REST is the primary path; the Java WebSocket client exposes only a real-time subset (`diarize`, `detectEntities`, `redact`).

**Use a different skill when:**
- You want plain transcription only → `deepgram-java-speech-to-text`.
- You already have text and only need text analysis → `deepgram-java-text-intelligence`.
- You need turn-aware conversational streaming → `deepgram-java-conversational-stt`.

## Authentication

```java
import com.deepgram.DeepgramClient;

DeepgramClient client = DeepgramClient.builder()
    .apiKey(System.getenv("DEEPGRAM_API_KEY"))
    .build();
```
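
`System.getenv` returns `null` when the variable is unset, so a missing key may only surface later as an authentication error on the first request. A minimal fail-fast sketch, assuming you read the key once at startup; `ApiKeys` and `requireApiKey` are hypothetical helper names, not part of the SDK:

```java
import java.util.Map;

/** Hypothetical helper (not part of the Deepgram SDK): validate the key before building the client. */
final class ApiKeys {
    private ApiKeys() {}

    /** Returns the trimmed DEEPGRAM_API_KEY from the given environment, or throws if it is missing or blank. */
    static String requireApiKey(Map<String, String> env) {
        String key = env.get("DEEPGRAM_API_KEY");
        if (key == null || key.isBlank()) {
            throw new IllegalStateException("DEEPGRAM_API_KEY is not set");
        }
        return key.trim();
    }
}
```

In the builder above this becomes `.apiKey(ApiKeys.requireApiKey(System.getenv()))`; taking a `Map` instead of calling `System.getenv` directly keeps the check testable.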

## Quick start: REST (repo-backed example pattern)

```java
import com.deepgram.resources.listen.v1.media.requests.ListenV1RequestUrl;
import com.deepgram.resources.listen.v1.media.types.MediaTranscribeRequestModel;
import com.deepgram.resources.listen.v1.media.types.MediaTranscribeResponse;

// `client` is the DeepgramClient built in the Authentication section.
ListenV1RequestUrl request = ListenV1RequestUrl.builder()
    .url("https://dpgr.am/spacewalk.wav")
    .model(MediaTranscribeRequestModel.NOVA3)
    .smartFormat(true)
    .punctuate(true)
    .diarize(true) // label each word with a speaker index
    .language("en-US")
    .build();

MediaTranscribeResponse result = client.listen().v1().media().transcribeUrl(request);
```

The concrete repo example (`examples/listen/AdvancedOptions.java`) uses the same builder pattern to enable additional Listen options.
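
With `diarize(true)`, Deepgram tags each word in the transcript with a speaker index (in the raw JSON, `results.channels[].alternatives[].words[].speaker`). A common next step is collapsing consecutive same-speaker words into turns. The sketch below works on hypothetical `Word` and `Turn` records; mapping the SDK's generated response types onto them is left out because their accessor names are not shown in this document:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical input shape: one recognized word plus its diarization speaker index. */
record Word(String text, int speaker) {}

/** One contiguous stretch of speech attributed to a single speaker. */
record Turn(int speaker, String text) {}

final class Diarization {
    private Diarization() {}

    /** Merges consecutive words that share a speaker index into turns, preserving order. */
    static List<Turn> toTurns(List<Word> words) {
        List<Turn> turns = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int speaker = -1;
        for (Word w : words) {
            // Speaker changed: flush the words collected so far as one turn.
            if (w.speaker() != speaker && current.length() > 0) {
                turns.add(new Turn(speaker, current.toString()));
                current.setLength(0);
            }
            speaker = w.speaker();
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(w.text());
        }
        // Flush the final turn, if any words were seen.
        if (current.length() > 0) {
            turns.add(new Turn(speaker, current.toString()));
        }
        return turns;
    }
}
```

With `smartFormat(true)` you would likely feed the punctuated form of each word rather than the raw token.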

## What else the REST request surface supports

The generated `ListenV1RequestUrl` and `MediaTranscribeRequestOctetStream` classes also expose these verified analysis fields in this checkout:

- `sentiment`
- `summarize`
- `topics`
- `customTopic`
- `customTopicMode`
- `intents`
- `customIntent`
- `customIntentMode`
- `detectEntities`
- `detectLanguage`
- `diarize`
- `redact`
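
On the wire these builder fields become query parameters on `POST https://api.deepgram.com/v1/listen`, and the OpenAPI spec names them in snake_case (for example `detectEntities` → `detect_entities`, `customTopicMode` → `custom_topic_mode`). The SDK builds this URL for you; the sketch below only illustrates the correspondence, which helps when cross-checking the OpenAPI reference. `ListenQuery` is a hypothetical helper, and it assumes every field name converts mechanically from camelCase:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper showing how SDK-style field names line up with Listen V1 query parameters. */
final class ListenQuery {
    private ListenQuery() {}

    /** Converts a camelCase field name such as "detectEntities" to "detect_entities". */
    static String toWireName(String camelCase) {
        StringBuilder out = new StringBuilder();
        for (char c : camelCase.toCharArray()) {
            if (Character.isUpperCase(c)) {
                out.append('_').append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds a /v1/listen URL from field values, in the map's iteration order, URL-encoding each value. */
    static String buildUrl(Map<String, String> fields) {
        StringJoiner query = new StringJoiner("&");
        fields.forEach((name, value) ->
                query.add(toWireName(name) + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8)));
        return "https://api.deepgram.com/v1/listen?" + query;
    }
}
```

For example, a `LinkedHashMap` holding `model=nova-3`, `diarize=true`, and `smartFormat=true` yields `https://api.deepgram.com/v1/listen?model=nova-3&diarize=true&smart_format=true`.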

## Quick start: WebSocket subset

```java
import com.deepgram.resources.listen.v1.websocket.V1ConnectOptions;
import com.deepgram.resources.listen.v1.websocket.V1WebSocketClient;
import com.deepgram.types.ListenV1Model;
import java.util.concurrent.TimeUnit;

V1WebSocketClient wsClient = client.listen().v1().v1WebSocket();
wsClient.onResults(result -> System.out.println(result));

// Block for up to 10 seconds while the connection opens.
wsClient.connect(V1ConnectOptions.builder()
        .model(ListenV1Model.NOVA3)
        .diarize(true)
        .build())
    .get(10, TimeUnit.SECONDS);
```

In this Java checkout, the WebSocket connect options include `diarize`, `detectEntities`, `redact`, and the normal streaming transcription controls, but **not** `summarize`, `topics`, `intents`, or `detectLanguage`.
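
Because the streaming client accepts only part of the REST surface, it can help to check the requested features before choosing a transport. A minimal sketch that encodes only the split stated above for this checkout; `IntelligenceFeature` and `unsupportedForStreaming` are hypothetical names, and the flags need updating if the generated connect options change:

```java
import java.util.List;
import java.util.Set;

/** Hypothetical feature catalog mirroring the REST vs WebSocket split described for this checkout. */
enum IntelligenceFeature {
    DIARIZE(true),
    DETECT_ENTITIES(true),
    REDACT(true),
    SUMMARIZE(false),
    TOPICS(false),
    INTENTS(false),
    DETECT_LANGUAGE(false);

    private final boolean streamingSupported;

    IntelligenceFeature(boolean streamingSupported) {
        this.streamingSupported = streamingSupported;
    }

    boolean streamingSupported() {
        return streamingSupported;
    }

    /** Returns the requested features that V1ConnectOptions does not expose, in declaration order. */
    static List<IntelligenceFeature> unsupportedForStreaming(Set<IntelligenceFeature> requested) {
        return requested.stream().filter(f -> !f.streamingSupported()).sorted().toList();
    }
}
```

A non-empty result means the request belongs on the REST builders rather than `V1ConnectOptions`.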

## Key parameters / API surface

- REST builders: `ListenV1RequestUrl` and `MediaTranscribeRequestOctetStream`
- REST analysis fields verified in source: `sentiment`, `summarize`, `topics`, `customTopic`, `customTopicMode`, `intents`, `customIntent`, `customIntentMode`, `detectEntities`, `detectLanguage`, `diarize`, `redact`
- Helpful transcription companions: `smartFormat`, `punctuate`, `paragraphs`, `utterances`, `numerals`, `keywords`, `keyterm`, `replace`, `search`
- WebSocket subset: `diarize`, `detectEntities`, `redact`, plus standard live transcription options

## API reference (layered)

1. **In-repo source of truth**: `src/main/java/com/deepgram/resources/listen/v1/media/requests/` and `src/main/java/com/deepgram/resources/listen/v1/websocket/`, plus `examples/listen/AdvancedOptions.java`. This checkout has no `reference.md`.
2. **Canonical OpenAPI (REST)**: https://developers.deepgram.com/openapi.yaml
3. **Canonical AsyncAPI (WSS subset)**: https://developers.deepgram.com/asyncapi.yaml
4. **Context7**: `/llmstxt/developers_deepgram_llms_txt`
5. **Product docs**:
   - https://developers.deepgram.com/docs/stt-intelligence-feature-overview
   - https://developers.deepgram.com/docs/summarization
   - https://developers.deepgram.com/docs/topic-detection
   - https://developers.deepgram.com/docs/intent-recognition
   - https://developers.deepgram.com/docs/sentiment-analysis
   - https://developers.deepgram.com/docs/language-detection
   - https://developers.deepgram.com/docs/redaction
   - https://developers.deepgram.com/docs/diarization

## Gotchas

1. **There is no separate “audio intelligence client”.** Everything hangs off Listen V1.
2. **Most intelligence fields are REST-only in this SDK surface.** The WebSocket connect options do not expose `summarize`, `topics`, `intents`, or `detectLanguage`.
3. **`summarize` on Listen V1 is its own generated type.** Do not assume the Read API shape is identical.
4. **The repo example only demonstrates diarization-level options.** There is no dedicated example file for sentiment/topics/intents in this checkout.
5. **`redact` is currently a single `String` field on the REST builders.** Do not assume Python-style string-or-list support here.
6. **Model support matters.** The examples consistently use `NOVA3`; follow that unless you have verified that another model supports the overlays you need.
7. **These fields live on both URL and byte-upload request builders.** Pick the builder that matches your input source.
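
In practice, a publicly reachable http(s) URL goes through `ListenV1RequestUrl` (Deepgram fetches the audio itself), while local audio is uploaded as bytes through `MediaTranscribeRequestOctetStream`. A minimal routing sketch; `AudioSource`, `RequestBuilder`, and `choose` are hypothetical names, and the SDK calls for the byte-upload path are omitted because they do not appear in this document:

```java
import java.net.URI;

/** Hypothetical routing helper: decides which Listen V1 request builder fits an audio source string. */
final class AudioSource {
    /** Which generated request builder to use. */
    enum RequestBuilder { URL, OCTET_STREAM }

    private AudioSource() {}

    /** http(s) URLs map to ListenV1RequestUrl; anything else is treated as a local file to upload as bytes. */
    static RequestBuilder choose(String source) {
        try {
            String scheme = URI.create(source).getScheme();
            if (scheme != null && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
                return RequestBuilder.URL;
            }
        } catch (IllegalArgumentException e) {
            // Not a valid URI (for example a Windows path with backslashes): fall through to the file case.
        }
        return RequestBuilder.OCTET_STREAM;
    }
}
```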

## Example files in this repo

- `examples/listen/AdvancedOptions.java`
- `examples/listen/TranscribeUrl.java`
- `examples/listen/FileUploadTypes.java`

## Central product skills

For cross-language Deepgram product knowledge (the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup), install the central skills:

```bash
npx skills add deepgram/skills
```

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).