WhisperSubs is a Jellyfin plugin that generates subtitles for media libraries using local AI speech-to-text models. All processing happens on the server — no cloud APIs. The primary backend is whisper.cpp with Vulkan/CUDA GPU acceleration support.
- Repo: GeiserX/whisper-subs
- Plugin GUID: `97124bd9-c8cd-4a53-a213-e593aa3fef52`
- Target: Jellyfin 10.11+ / .NET 9.0
- License: GPL-3.0
Plugin.cs Entry point, IHasWebPages (embeds config UI)
├── Configuration/
│ └── PluginConfiguration.cs User-editable settings (model path, binary path, language, etc.)
├── Api/
│ └── SubtitleController.cs REST API endpoints under /Plugins/WhisperSubs/*
├── Controller/
│ ├── SubtitleManager.cs Orchestrator: language detection → audio extraction → transcription → save
│ └── SubtitleQueueService.cs Thread-safe in-memory queue with single-worker drain loop
├── Providers/
│ ├── ISubtitleProvider.cs Provider interface (TranscribeAsync)
│ └── WhisperProvider.cs whisper.cpp integration (finds binary, runs process, reads SRT output)
├── ScheduledTasks/
│ └── SubtitleGenerationTask.cs Jellyfin scheduled task for auto-generation
└── Web/
└── configPage.html Admin UI (embedded resource) — vanilla JS, Jellyfin emby-* components
- Language detection — `SubtitleManager.DetectAudioLanguagesAsync` calls FFprobe to read audio stream language tags. ISO 639-2/B codes are normalized to 639-1 (e.g., `spa` → `es`).
- Audio extraction — FFmpeg extracts 16 kHz mono PCM WAV from the media file to a temp path.
- Transcription — `WhisperProvider.TranscribeAsync` invokes `whisper-cli` as a child process with the model and audio file. Output is an SRT file.
- Save — The SRT content is written alongside the media as `<filename>.<lang>.generated.srt`.
- Metadata refresh — `item.RefreshMetadata()` tells Jellyfin to pick up the new subtitle file.
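The normalization step can be sketched as follows. This is a Python stand-in for the C# method, and the mapping table here is abbreviated and illustrative; the real one in `SubtitleManager.NormalizeLanguageCode()` covers 30+ entries:

```python
# Abbreviated ISO 639-2/B -> 639-1 mapping (illustrative subset only).
ISO_639_2_TO_1 = {
    "spa": "es", "eng": "en", "fra": "fr", "fre": "fr",
    "deu": "de", "ger": "de", "ita": "it", "jpn": "ja",
}

def normalize_language_code(tag: str) -> str:
    """Normalize an FFprobe stream language tag to ISO 639-1."""
    tag = tag.strip().lower()
    if len(tag) == 2:
        return tag  # already a 639-1 code
    # Unknown 3-letter tags pass through unchanged
    return ISO_639_2_TO_1.get(tag, tag)
```

Note that 639-2/B has two codes for some languages (`fre`/`fra`, `ger`/`deu`), which is why both spellings appear in the map.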
Forced subtitles capture only foreign-language dialogue segments (e.g., Russian dialogue in an English film). The pipeline:
- VAD (Voice Activity Detection) — FFmpeg `silencedetect` splits the full audio into speech chunks using `-30dB:d=0.5` thresholds.
- Per-chunk language detection — Each chunk is fed to `WhisperProvider.DetectLanguageAsync` (whisper `--detect-language` mode). Returns a language code + probability.
- Foreign segment identification — Chunks where `detectedLang != primaryLang && probability >= 0.3` are marked as foreign. Adjacent foreign chunks are merged.
- Selective transcription — Only the foreign segments are extracted and transcribed individually, with timestamps offset to match the original media timeline.
- Save — Written as `<filename>.<lang>.forced.generated.srt`.
- No-foreign marker — If zero foreign chunks are detected (and at least one detection succeeded), an empty `<filename>.<lang>.forced.noforeignlang` marker file is written to skip the item on future runs.
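The foreign-segment selection and merge steps can be sketched in Python. The chunk tuple shape and the `find_foreign_segments` name are assumptions for illustration, not the plugin's actual C# API:

```python
# Chunk = (start_s, end_s, detected_lang, probability) -- illustrative shape.
def find_foreign_segments(chunks, primary_lang, min_prob=0.3):
    """Mark chunks whose language differs from the primary track (above
    the probability threshold) and merge adjacent foreign chunks."""
    segments = []
    for start, end, lang, prob in chunks:
        if lang == primary_lang or prob < min_prob:
            continue  # not foreign, or detection too uncertain
        if segments and abs(segments[-1][1] - start) < 1e-6:
            # chunk starts where the previous foreign segment ends: merge
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

For an English film with a Russian scene, chunks detected as `ru` above the threshold collapse into one segment that is then transcribed with a timestamp offset.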
`SubtitleMode` enum controls behavior:
- `Full` (0, default) — Only full transcription
- `ForcedOnly` (1) — Only forced subtitle detection
- `FullAndForced` (2) — Both
`WhisperThreadCount` controls the `-t N` flag passed to `whisper-cli`. Default `0` uses whisper's internal default (4 threads). Set it to your CPU core count for faster transcription; on a 20-thread CPU this can yield ~12-13x effective parallelism.
WhisperProvider instances are constructed fresh for each work item (both in SubtitleController and SubtitleGenerationTask), so config changes via the plugin settings page take effect on the next work item without a Jellyfin restart.
Manual subtitle requests go through SubtitleQueueService:
- `Enqueue()` — Fire-and-forget. The `POST /Items/{id}/Generate` endpoint returns HTTP 202 immediately.
- `EnsureDraining()` — Starts a single background worker if one isn't already running. Uses `Interlocked.CompareExchange` for thread safety.
- Race condition protection — After the drain loop exits, it re-checks the queue and restarts if new items arrived during the `finally` block.
- Skip existing — The drain loop checks for `.generated.srt` files before processing, so re-queuing after a restart is safe (already-done items are skipped instantly).
- Persisted to disk — The queue is saved to `queue.json` in the plugin data folder (`/config/data/WhisperSubs/queue.json`) on every enqueue/dequeue. On startup, `RestoreQueue()` reloads pending items before the library scan begins.
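The persistence behavior can be sketched like this. This is a Python stand-in for the C# service; the file shape (a flat JSON list of item IDs) is an assumption:

```python
import json
import pathlib

class PersistedQueue:
    """Sketch of a queue that survives restarts by writing JSON to disk."""

    def __init__(self, path):
        self.path = pathlib.Path(path)
        self.items = []
        if self.path.exists():          # RestoreQueue() analog
            self.items = json.loads(self.path.read_text())

    def _save(self):
        self.path.write_text(json.dumps(self.items))

    def enqueue(self, item_id):
        self.items.append(item_id)
        self._save()                    # persisted on every enqueue...

    def dequeue(self):
        item = self.items.pop(0)
        self._save()                    # ...and on every dequeue
        return item
```

A restart simply constructs a new instance against the same path and picks up where the old one left off.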
SubtitleGenerationTask runs on startup and daily at 2 AM (configurable in Jellyfin UI). It:
- Queries all enabled libraries for `Movie` and `Episode` items without subtitles.
- Checks for existing `.generated.srt` files (restart resilience).
- Between each auto-generation item, drains any priority queue items (manual requests take precedence).
All require Jellyfin admin auth (`Authorization: MediaBrowser Token="<token>"`).
| Method | Path | Returns | Notes |
|---|---|---|---|
| GET | `/Plugins/WhisperSubs/Libraries` | `LibraryInfo[]` | All virtual folders |
| GET | `/Plugins/WhisperSubs/Libraries/{id}/Items?startIndex=0&limit=50` | `PagedItemResult` | Movies/Episodes with subtitle status |
| POST | `/Plugins/WhisperSubs/Items/{id}/Generate?language=auto` | 202 Accepted | Enqueues, returns immediately |
| GET | `/Plugins/WhisperSubs/Items/{id}/Status?language=auto` | `SubtitleStatus` | Checks for `.generated.srt` on disk |
| GET | `/Plugins/WhisperSubs/Items/{id}/AudioLanguages` | `string[]` | FFprobe-detected languages |
| GET | `/Plugins/WhisperSubs/Queue` | `{isProcessing, currentItem, remaining, processed}` | Live queue status |
| GET | `/Plugins/WhisperSubs/Models` | `ModelInfo[]` | `.bin` files in the model directory |
| POST | `/Plugins/WhisperSubs/RunTask` | 200 | Triggers the scheduled task |
```
dotnet build --configuration Release
# Output: bin/Release/net9.0/WhisperSubs.dll
```

Copy the DLL to the Jellyfin plugin directory and restart:

```
cp bin/Release/net9.0/WhisperSubs.dll \
  /path/to/jellyfin/config/plugins/WhisperSubs_<version>/WhisperSubs.dll
# Restart Jellyfin
```

The GitHub Actions workflow (`.github/workflows/build-release.yml`) triggers on push to `main`:
- Builds the DLL
- Packages it into a versioned ZIP
- Creates a GitHub Release
- Updates `manifest.json` with the checksum
- Deploys to GitHub Pages (serves the plugin repository manifest)
Version is read from `<Version>` in `WhisperSubs.csproj`. Bump it there before pushing.
Note: The manifest.json in the source tree is NOT authoritative — CI generates a fresh one with the correct version, checksum, and sourceUrl and deploys it to GitHub Pages. The checked-in copy is stale and only exists for reference.
Web/configPage.html is embedded as a resource (EmbeddedResourcePath in Plugin.cs).
- Jellyfin custom elements — Dropdowns with static options (Subtitle Provider, Default Language) use `is="emby-select"` for native Jellyfin styling. Dropdowns populated dynamically via JS (Detected Models, Library selector) also use `is="emby-select"` — the options are added after the `pageshow` event fires via API calls.
- `data-require` — The page declares `data-require="emby-input,emby-button,emby-select,emby-checkbox"` to ensure Jellyfin loads these components before rendering.
- No framework — Pure vanilla JS. The `WhisperSubsConfig` object namespace holds all logic.
- Auth — API calls use `ApiClient.accessToken()` via the `getAuthHeader()` helper.
- Config load/save — Uses `ApiClient.getPluginConfiguration()` / `ApiClient.updatePluginConfiguration()` with the plugin GUID.
Open the browser console and look for lines prefixed with `WhisperSubs:`. All `ajaxGet` calls log the URL, response status, and parsed data.
WhisperProvider.FindWhisperExecutable() tries candidates in order:
- The configured `WhisperBinaryPath` (if set)
- `whisper-cli` (PATH)
- `main` (PATH)
- `whisper` (PATH)
Each candidate is tested with --help. The first one that exits with code 0 or 1 is used.
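The probing order can be sketched in Python. The candidate names mirror the list above; the real logic is C# in `WhisperProvider`, so this is illustrative:

```python
import shutil
import subprocess

def find_whisper_executable(configured_path=None):
    """Return the first candidate that responds to --help with exit 0 or 1."""
    candidates = [configured_path, "whisper-cli", "main", "whisper"]
    for cand in candidates:
        if not cand:
            continue
        # Absolute/relative paths are used as-is; bare names go through PATH
        path = cand if "/" in cand else shutil.which(cand)
        if not path:
            continue
        try:
            rc = subprocess.run([path, "--help"],
                                capture_output=True, timeout=10).returncode
        except OSError:
            continue  # not executable / wrong architecture etc.
        if rc in (0, 1):  # whisper-cli exits 1 on --help in some builds
            return path
    return None
```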
The whisper-cli binary must be built for the same environment as the Jellyfin container. Jellyfin 10.11.x uses Debian Trixie/Sid. Building on the host and mounting won't work if glibc versions differ.
Build inside the running container or a matching Docker image.
```
# CPU-only build (any Debian):
apt-get install -y git cmake g++ make
git clone --depth 1 --branch v1.8.4 https://github.com/ggml-org/whisper.cpp.git /tmp/whisper
cd /tmp/whisper
cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release -j$(nproc)
# Binary: build/bin/whisper-cli
```

```
# Vulkan (GPU) build — requires the glslc SPIR-V compiler:
apt-get install -y git cmake g++ make pkg-config libvulkan-dev glslc
# On Debian Bookworm: glslc is in the `shaderc` package — install `shaderc` if `glslc` is not found
# On Debian Trixie: the `glslc` package exists directly
git clone --depth 1 --branch v1.8.4 https://github.com/ggml-org/whisper.cpp.git /tmp/whisper
cd /tmp/whisper
cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DGGML_VULKAN=ON
cmake --build build --config Release -j$(nproc)
# Binary: build/bin/whisper-cli
# Verify: ldd build/bin/whisper-cli | grep vulkan
```

Key flags:
- `-DBUILD_SHARED_LIBS=OFF` — Statically links the whisper/ggml libraries into the binary. Without this, you get `libwhisper.so.1: cannot open shared object file` at runtime.
- `-DGGML_VULKAN=ON` — Intel/AMD GPU acceleration via Vulkan. Requires `libvulkan-dev` and `glslc` (SPIR-V compiler) at build time, `libvulkan1` and `mesa-vulkan-drivers` (or `intel-media-va-driver`) at runtime.
- `-DGGML_CUDA=ON` — NVIDIA GPU acceleration. Requires the CUDA toolkit.
- Common build failure: `Could NOT find Vulkan (missing: glslc)` — the `glslang-tools`/`glslang-dev` packages do NOT provide `glslc`. You need the `glslc` or `shaderc` package specifically.
The whisper binary and models MUST be on persistent storage that survives reboots. Do NOT use tmpfs paths like /opt on diskless systems (e.g., Unraid where /opt is on the root RAM disk).
Store in an appdata directory and bind-mount into the container:
```
volumes:
  - /path/to/persistent/whisper:/opt/whisper:ro
devices:
  - /dev/dri  # Intel/AMD GPU render nodes
environment:
  - VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/intel_icd.json  # Required for Vulkan in containers
```

The `VK_ICD_FILENAMES` env var is critical — without it, the Vulkan loader may fail to find the Intel ICD inside the container even with `mesa-vulkan-drivers` installed. Set it to:
- Intel: `/usr/share/vulkan/icd.d/intel_icd.json`
- AMD: `/usr/share/vulkan/icd.d/radeon_icd.json`
The GPU wrapper script (whisper-cli-gpu) is self-healing: it checks for the Vulkan ICD file on each invocation and runs apt-get install if missing. This survives container recreates without requiring entrypoint modifications. The one-time install adds ~10s to the first transcription after a fresh container.
Verify GPU detection:
```
docker exec jellyfin /opt/whisper/whisper-cli \
  -m /opt/whisper/models/ggml-base.bin -f /dev/null 2>&1 | grep -i vulkan
# Should show: "ggml_vulkan: Found N Vulkan devices"
# And: "whisper_backend_init_gpu: using Vulkan0 backend"
# If it says "no GPU found", check VK_ICD_FILENAMES
```

Tested with a 2h15m film (8107s audio), large-v3 model, 5-beam search.
| Config | Wall time | Real-time factor | CPU usage |
|---|---|---|---|
| CPU, 4 threads (default) | ~7h+ (est.) | ~3.2x | ~400% |
| CPU, 16 threads (i5-14500) | 1h48m | 0.80x | ~1270% |
Per-segment breakdown (16 threads):
- Encode: 13,010ms per 30s segment (278 segments) — 56% of total time
- Batch decode: 23ms per run — fast
- Total: 6,477,485ms
GPU offloading is critical — the encode step dominates and is highly parallelizable on GPU. With Vulkan on Intel UHD 770, expect 2-4x overall speedup for full transcription.
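The 16-thread row can be cross-checked from the totals above; the real-time factor is wall-clock time divided by audio duration:

```python
# Numbers taken from the benchmark above.
audio_s = 8107             # 2h15m film
wall_ms = 6_477_485        # reported total wall time, 16 threads

rtf = (wall_ms / 1000) / audio_s
assert round(rtf, 2) == 0.80        # matches the 0.80x row in the table

# Encode share: 13,010 ms per segment x 278 segments
encode_ms = 13_010 * 278
share = encode_ms / wall_ms
assert 0.55 < share < 0.57          # ~56% of total time, as stated
```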
GPU disabled for language detection (by design): DetectLanguageAsync passes --no-gpu because each call spawns a fresh whisper process per chunk, and GPU init overhead (model load + shader compilation) exceeds the detection work itself (~21s/chunk with GPU vs ~15s/chunk CPU-only). Transcription still uses GPU where available. The deeper issue — per-chunk process spawning — remains; long-term fix is a persistent whisper-server process that stays loaded (see GitHub issue).
- Hallucination on non-speech audio: During music, credits, or silence, large-v3 generates nonsense (e.g., "Suscríbete al canal!"). The `--suppress-non-speech` (`-sns`) flag helps but doesn't eliminate it.
- Language detection false positives: At `probability >= 0.3`, concert/music audio can be misidentified as foreign language (e.g., an Aerosmith concert detected as Japanese with p=0.316). Consider raising the threshold for non-dialogue content.
- Hallucination signatures: Common in Spanish: "La Iglesia de Jesucristo de los Santos de los Últimos Días", "Suscríbete al canal", "Subtítulos por". These appear in credits and silent segments.
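If post-filtering ever becomes necessary, a simple signature check could drop known-bad cues. This is a hypothetical helper, not something the plugin currently does:

```python
# Known hallucination strings observed in Spanish output (from the list above).
HALLUCINATION_SIGNATURES = (
    "Suscríbete al canal",
    "Subtítulos por",
    "La Iglesia de Jesucristo de los Santos de los Últimos Días",
)

def is_hallucinated(cue_text: str) -> bool:
    """True if a subtitle cue matches a known hallucination signature."""
    return any(sig in cue_text for sig in HALLUCINATION_SIGNATURES)
```

A filter like this would run over parsed SRT cues before saving, dropping matching entries.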
Output files follow the pattern:
`<media_filename>.<lang>.generated.srt`
Examples:
- `Movie.es.generated.srt`
- `Show S01E01.en.generated.srt`
The .generated.srt suffix distinguishes AI-generated subtitles from manually added ones. Jellyfin auto-discovers these files when placed alongside the media.
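The convention as a helper sketch (`subtitle_path` is a hypothetical name; the plugin builds these names in C#):

```python
from pathlib import Path

def subtitle_path(media_path: str, lang: str, forced: bool = False) -> Path:
    """Build the sibling subtitle path for a media file."""
    p = Path(media_path)
    suffix = f".{lang}.forced.generated.srt" if forced else f".{lang}.generated.srt"
    return p.with_name(p.stem + suffix)  # keeps the media's directory
```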
The model path in the plugin config doesn't match the actual file location inside the container. Check the bind-mount and verify the path exists inside the container:

```
docker exec jellyfin ls -lh /opt/whisper/models/
```

The binary isn't in PATH and the configured path is wrong, or the binary crashes on `--help`. Test it manually:

```
docker exec jellyfin /opt/whisper/whisper-cli --help
```

The binary was built with shared libraries. Rebuild with `-DBUILD_SHARED_LIBS=OFF`.
Set VK_ICD_FILENAMES environment variable in the container. See GPU passthrough above.
The queue persists to disk, so pending items are restored on Jellyfin restart. If items still appear missing, the scheduled task will re-scan and pick them up automatically.
If not using GPU acceleration, whisper.cpp uses all available CPU cores. Consider:
- Building with Vulkan/CUDA support to offload to GPU
- Using a smaller model (`ggml-base.bin` or `ggml-large-v3-turbo.bin`)
- Scheduling transcription during off-peak hours via the scheduled task settings
If dynamically populated dropdowns appear empty, check the browser console for WhisperSubs: log lines. The API calls may be failing due to auth issues. Hard-refresh the page (Ctrl+Shift+R).
If transcription is cancelled or Jellyfin restarts mid-processing:
- WhisperProvider kills the whisper process and returns whatever partial SRT content was written to disk.
- SubtitleManager saves the partial SRT as `<filename>.<lang>.generated.srt`.
- On the next run, `SubtitleManager.GenerateSubtitleAsync()` detects the existing file, parses the last timestamp via `WhisperProvider.ParseLastSrtTimestamp()`, and compares it against the media duration (via FFprobe).
- If the SRT is within 30 seconds of the media end, it's considered complete and skipped.
- If partial, FFmpeg extracts audio starting from the resume offset (`-ss`), whisper transcribes the remainder, and the new SRT entries are offset-adjusted and appended to the existing file.
Key helpers in WhisperProvider:
- `ParseLastSrtTimestamp(srtContent)` — returns last end timestamp in seconds
- `OffsetSrt(srtContent, offsetSeconds, startIndex)` — shifts all timestamps and renumbers entries
- `CountSrtEntries(srtContent)` — counts `-->` lines
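Python sketches of the first and third helpers, with behavior inferred from the descriptions above (`OffsetSrt` is omitted for brevity; the real implementations are C# methods in `WhisperProvider`):

```python
import re

# SRT timing line: "HH:MM:SS,mmm --> HH:MM:SS,mmm" (accepts ',' or '.')
TS = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+) --> (\d+):(\d+):(\d+)[,.](\d+)")

def parse_last_srt_timestamp(srt: str) -> float:
    """Return the last cue's end time in seconds, or 0.0 if no cues."""
    last = 0.0
    for m in TS.finditer(srt):
        h, mi, s, ms = (int(x) for x in m.groups()[4:])  # end-time fields
        last = h * 3600 + mi * 60 + s + ms / 1000
    return last

def count_srt_entries(srt: str) -> int:
    """Count cues by counting '-->' timing separators."""
    return srt.count("-->")
```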
```
dotnet build --configuration Release
scp bin/Release/net9.0/WhisperSubs.dll \
  <host>:/path/to/jellyfin/config/plugins/WhisperSubs_<version>/WhisperSubs.dll
# Restart Jellyfin to load the new DLL
```

The host path for `/config` depends on the Docker volume mapping. Find it with:

```
docker inspect jellyfin --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}'
```

- NEVER restart Jellyfin without asking the user first. Jellyfin restarts interrupt active playback and kill the in-memory transcription queue. Always confirm before running `docker restart jellyfin`.
- Unraid tmpfs: Do NOT store whisper binaries or models in `/opt` on Unraid — it's a RAM disk that wipes on reboot. Use `/mnt/user/appdata/whisper` and bind-mount into the container.
- Static linking is mandatory: Always build whisper.cpp with `-DBUILD_SHARED_LIBS=OFF`. Dynamic builds fail with `libwhisper.so.1: cannot open shared object file` inside the Jellyfin container.
- Orphaned docker-proxy: If Jellyfin crashes, the docker-proxy process may hold port 8096. On Unraid, run `rc.docker restart` to clean up. On other systems, restart the Docker daemon.
- Memory limits: Transcription (especially with large models) can consume 5-10 GB RAM. Set `mem_limit` in docker-compose to prevent OOM kills affecting other services.
- Plugin directory moves on version change: Jellyfin may rename the plugin folder (e.g. `WhisperSubs_1.0.4.2` → `WhisperSubs`). Always check the actual path with `docker exec jellyfin find /config/plugins -name "WhisperSubs*" -type d` before deploying.
- Queue persists to disk as `queue.json` in the plugin data folder (`/config/data/WhisperSubs/queue.json`). Updated on every enqueue/dequeue. On startup, `RestoreQueue()` reloads all entries and drains them before the library scan begins.
- Global `TranscriptionLock` (`SemaphoreSlim(1,1)`) prevents concurrent whisper processes. Both the drain loop and the scheduled task must acquire it. Without this, two whisper processes run simultaneously and can OOM the container (11.4 GB / 12 GB observed).
- Per-language error isolation: If whisper fails on one language (e.g. `en`), the error is caught and logged but does not abort remaining languages (e.g. the `es` SRT is still saved). Only `OperationCanceledException` propagates up.
- whisper.cpp writes SRT only at completion — not incrementally. Mid-process kills produce no partial file. The resume feature only helps when whisper finishes writing a file that covers part of the media (rare edge case).
- Killed items are not auto-retried — they fall out of the queue. The scheduled task's library scan will eventually re-process them. Manually re-queue if urgent.
- FFprobe extracts `language` tags from audio streams. Most HDO/WEB-DL files have proper `spa`/`eng` tags.
- Normalization: 30+ ISO 639-2 → 639-1 mappings in `SubtitleManager.NormalizeLanguageCode()`.
- Dedup: if a file has 4 audio streams (`spa, spa, eng, eng` — e.g. DD+ and DD variants), only `es` and `en` are generated.
- Fallback: files with no language tags (older rips, some PlutoTV content) get whisper auto-detection — one SRT with language `auto`.
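The dedup and fallback steps together, as a sketch. The `target_languages` name and abbreviated mapping are illustrative; the plugin implements this in C#:

```python
# Abbreviated normalization map (the real table has 30+ entries).
ISO_MAP = {"spa": "es", "eng": "en"}

def target_languages(stream_tags):
    """Normalize stream tags, drop duplicates (preserving stream order),
    and fall back to 'auto' when no usable tags exist."""
    seen, out = set(), []
    for tag in stream_tags:
        lang = ISO_MAP.get(tag.lower(), tag.lower()) or "auto"
        if lang not in seen:
            seen.add(lang)
            out.append(lang)
    return out or ["auto"]   # untagged file -> one SRT via auto-detection
```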
- The `.csproj` targets `net9.0` and references `Jellyfin.Model` and `Jellyfin.Controller` 10.11.8.
- The config page HTML is an embedded resource — changes require rebuilding the DLL.
- `Plugin.Instance` is a static singleton set in the constructor. All components access config via `Plugin.Instance.Configuration`.
- The `ISubtitleProvider` interface is designed for extensibility (Parakeet, custom commands), but only `WhisperProvider` is currently implemented.
- Language normalization covers 30 ISO 639-2 → 639-1 mappings. Add new ones to `SubtitleManager.NormalizeLanguageCode()`.
- The Generate endpoint returns HTTP 202 immediately — transcription runs in a background queue. Manual requests get priority over scheduled-task items.
- The config page UI uses Jellyfin's `emby-*` custom elements. Dynamic dropdowns (models, libraries) must use `is="emby-select"` and populate options only after the `pageshow` event fires. Do not call `loadLibraries()` twice — it causes a race condition that wipes the dropdown.
Generated by LynxPrompt CLI