This package is useful in a Node.js environment (Node, Electron, etc.), where it provides access to the C++ implementation of whisper.cpp, which should be the fastest way to run it.
To install it, you need cmake v4+ (and Visual Studio on Windows) already available on your machine; it will be used to build the addon for your environment.
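You can check the cmake prerequisite from Node before installing. The sketch below runs `cmake --version` and compares the major version against the 4 stated above. `parseCmakeVersion` and `hasRequiredCmake` are hypothetical helpers, not part of whisper.cpp.node, and the parsing assumes cmake's usual `cmake version X.Y.Z` first line.

```javascript
const { execFileSync } = require("child_process");

// Extract major/minor/patch from output such as "cmake version 4.0.2".
// Returns null if the text does not look like cmake's version banner.
function parseCmakeVersion(output) {
  const match = /cmake version (\d+)\.(\d+)(?:\.(\d+))?/i.exec(output);
  if (!match) return null;
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3] ?? 0),
  };
}

// Hypothetical check: true when cmake is on PATH and its major version >= minMajor.
function hasRequiredCmake(minMajor = 4) {
  let output;
  try {
    output = execFileSync("cmake", ["--version"], { encoding: "utf8" });
  } catch {
    return false; // cmake is not installed or not on PATH
  }
  const version = parseCmakeVersion(output);
  return version !== null && version.major >= minMajor;
}
```

For example, `if (!hasRequiredCmake()) console.error("cmake 4+ is required");` could run before the install command.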
npm install whisper.cpp.node
npx whisper.cpp.node install
This will download the whisper.cpp repository and build the addon. There are a few commands that can be used:
- "install" - downloads the latest released tag of the whisper.cpp repository at the time of publishing or does nothing if the repo is already downloaded;
- "install latest" - downloads the latest master;
- "reinstall" - downloads the latest released tag of the whisper.cpp repository at the time of publishing even if the repo is already downloaded;
- "reinstall latest" - will produce the same result as "install latest";
- "rebuild" - will not download anything but will simply try to rebuild the addon (if for example you change the version of node).
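If you drive the setup from a script (for example in CI), a thin wrapper can reject anything other than the command forms listed above before invoking `npx`. This is a sketch; `installArgs` and `runInstall` are hypothetical names, and only the commands themselves come from this README.

```javascript
const { spawnSync } = require("child_process");

// The command forms documented above; anything else is rejected early.
const SUPPORTED = new Set([
  "install",
  "install latest",
  "reinstall",
  "reinstall latest",
  "rebuild",
]);

// Build the argument list for `npx whisper.cpp.node <command> [latest]`.
function installArgs(command) {
  const normalized = command.trim().split(/\s+/).join(" ");
  if (!SUPPORTED.has(normalized)) {
    throw new Error(`Unsupported command: "${command}"`);
  }
  return ["whisper.cpp.node", ...normalized.split(" ")];
}

// Run the command with inherited stdio; throws if it does not exit cleanly.
function runInstall(command) {
  const result = spawnSync("npx", installArgs(command), {
    stdio: "inherit",
    shell: process.platform === "win32", // npx is a .cmd shim on Windows
  });
  if (result.status !== 0) {
    throw new Error(`npx whisper.cpp.node ${command} failed (exit ${result.status})`);
  }
}
```

Usage: `runInstall("install")`, or `runInstall("rebuild")` after switching Node versions.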
The addon is then available as the package's main export, for use like:
const whisper = require("whisper.cpp.node");
// CommonJS has no top-level await, so run the call inside an async function.
(async () => {
  const transcription = await whisper({
    language: 'en',
    model: './models/ggml-base.en.bin',
    fname_inp: './your/file/here'
  });
  console.log(transcription);
})().catch(console.error);
Check the Supported Parameters section for more parameter information.
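Checking that the model and audio files exist before calling the addon may give clearer errors than whatever the native code reports. In this sketch, `transcribeFile` is a hypothetical helper that receives the main export as an argument (so it can be stubbed in tests) and forwards the same three parameters used above.

```javascript
const fs = require("fs");

// Hypothetical helper: fail early with a clear message if a path is wrong,
// then call the package's main export with the parameters shown above.
async function transcribeFile(whisperFn, { model, audio, language = "en" }) {
  for (const [label, file] of [["model", model], ["audio", audio]]) {
    if (!file || !fs.existsSync(file)) {
      throw new Error(`${label} file not found: ${file}`);
    }
  }
  return whisperFn({ language, model, fname_inp: audio });
}
```

Usage: `transcribeFile(require("whisper.cpp.node"), { model: "./models/ggml-base.en.bin", audio: "./your/file/here" })`.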
This package does not download any models for inference. Ready-to-use models are available in the Hugging Face repo of whisper.cpp.
It also will not work if you bundle it for the browser; use the whisper.cpp package instead, which provides the WASM version.
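Since models must be fetched separately, one option is to download them with Node's built-in `fetch` (Node 18+). The sketch assumes the Hugging Face repo `ggerganov/whisper.cpp` and its `resolve/main/ggml-<name>.bin` layout, which whisper.cpp's own download script uses at the time of writing; verify the URL if a download fails. `modelUrl` and `downloadModel` are hypothetical helpers.

```javascript
const fs = require("fs");
const path = require("path");
const { Readable } = require("stream");
const { pipeline } = require("stream/promises");

// Assumed model location; check the Hugging Face repo if this changes.
const MODEL_BASE_URL = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main";

// "base.en" -> ".../ggml-base.en.bin"
function modelUrl(name) {
  return `${MODEL_BASE_URL}/ggml-${name}.bin`;
}

// Stream the model to disk instead of buffering the whole file in memory.
async function downloadModel(name, dest) {
  const response = await fetch(modelUrl(name));
  if (!response.ok || !response.body) {
    throw new Error(`Download failed: ${response.status} ${response.statusText}`);
  }
  fs.mkdirSync(path.dirname(dest), { recursive: true });
  await pipeline(Readable.fromWeb(response.body), fs.createWriteStream(dest));
}
```

Usage: `downloadModel("base.en", "./models/ggml-base.en.bin")` would produce the model file referenced in the example above.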
Below is the original README of the addon, which describes its usage in more detail.
This is a demo addon, built with cmake-js, that runs Whisper model inference in Node and Electron environments.
It can be used as a reference for using the whisper.cpp project in other node projects.
This addon now supports Voice Activity Detection (VAD) for improved transcription performance.
npm install
Make sure you are in the project root directory, then compile with cmake-js:
npx cmake-js compile -T addon.node -B Release
For Electron and other cmake-js options, see the cmake-js documentation; only a few configuration changes are needed.
For example, to specify a custom cmake path:
npx cmake-js compile -c 'xxx/cmake' -T addon.node -B Release
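The compile command can also be assembled from a script, for example to switch between a plain Node build and an Electron build. This sketch uses cmake-js's `-T`, `-B` and `-c` options shown above plus its `--runtime`/`--runtime-version` options; `cmakeJsArgs` and `compileAddon` are hypothetical helpers, and any Electron version you pass is your own choice.

```javascript
const { spawnSync } = require("child_process");

// Build argv for `npx cmake-js compile ...` as used in this README.
// When electronVersion is set, cmake-js builds against Electron instead of Node.
function cmakeJsArgs({ cmakePath, electronVersion } = {}) {
  const args = ["cmake-js", "compile", "-T", "addon.node", "-B", "Release"];
  if (cmakePath) args.push("-c", cmakePath);
  if (electronVersion) {
    args.push("--runtime", "electron", "--runtime-version", electronVersion);
  }
  return args;
}

// Run the build from the current directory (the project root).
function compileAddon(options) {
  const result = spawnSync("npx", cmakeJsArgs(options), {
    stdio: "inherit",
    // npx is a .cmd shim on Windows; note paths with spaces would need quoting here.
    shell: process.platform === "win32",
  });
  if (result.status !== 0) throw new Error("cmake-js compile failed");
}
```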
cd examples/addon.node
node index.js --language='language' --model='model-path' --fname_inp='file-path'
Run the VAD example with performance comparison:
node vad-example.js
VAD can significantly improve transcription performance by only processing speech segments, which is especially beneficial for audio files with long periods of silence.
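To see the effect on your own audio, you can time the same file with VAD off and then on, which is roughly what `vad-example.js` reports (its exact code is not reproduced here). `compareVad` is a hypothetical helper; it takes a promisified transcription function, such as `whisperAsync` in the VAD usage example, so it can also be exercised with a stub.

```javascript
const { performance } = require("perf_hooks");

// Time one call of an async transcription function, in milliseconds.
async function timeCall(fn, params) {
  const start = performance.now();
  const result = await fn(params);
  return { ms: performance.now() - start, result };
}

// Run the same file with VAD disabled, then enabled, and report both timings.
async function compareVad(whisperAsync, baseParams, vadModelPath) {
  const plain = await timeCall(whisperAsync, { ...baseParams, vad: false });
  const withVad = await timeCall(whisperAsync, {
    ...baseParams,
    vad: true,
    vad_model: vadModelPath,
  });
  return { plainMs: plain.ms, vadMs: withVad.ms, speedup: plain.ms / withVad.ms };
}
```

The first call may include one-off costs (such as reading the model from a cold disk cache), so for a fairer comparison consider running each configuration more than once.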
Before using VAD, download a VAD model:
# From the whisper.cpp root directory
./models/download-vad-model.sh silero-v6.2.0
All VAD parameters are optional and have sensible defaults:
- vad: Enable VAD (default: false)
- vad_model: Path to VAD model file (required when VAD is enabled)
- vad_threshold: Speech detection threshold, 0.0-1.0 (default: 0.5)
- vad_min_speech_duration_ms: Minimum speech duration in ms (default: 250)
- vad_min_silence_duration_ms: Minimum silence duration in ms (default: 100)
- vad_max_speech_duration_s: Maximum speech duration in seconds (default: FLT_MAX)
- vad_speech_pad_ms: Speech padding in ms (default: 30)
- vad_samples_overlap: Sample overlap, 0.0-1.0 (default: 0.1)
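To build intuition for how the threshold, duration and padding parameters interact, here is a simplified frame-based segmenter. It is not whisper.cpp's or Silero's actual implementation: it assumes one speech probability per fixed-length frame, ignores `vad_max_speech_duration_s` and `vad_samples_overlap`, and `speechSegments` is a hypothetical name.

```javascript
// Simplified VAD post-processing: per-frame probabilities -> [startMs, endMs] segments.
function speechSegments(probs, frameMs, {
  threshold = 0.5,    // like vad_threshold
  minSpeechMs = 250,  // like vad_min_speech_duration_ms
  minSilenceMs = 100, // like vad_min_silence_duration_ms
  padMs = 30,         // like vad_speech_pad_ms
} = {}) {
  // 1. Collect runs of consecutive frames at or above the threshold.
  const runs = [];
  let start = -1;
  probs.forEach((p, i) => {
    if (p >= threshold && start === -1) start = i;
    if (p < threshold && start !== -1) {
      runs.push([start * frameMs, i * frameMs]);
      start = -1;
    }
  });
  if (start !== -1) runs.push([start * frameMs, probs.length * frameMs]);

  // 2. Bridge silences shorter than minSilenceMs.
  const merged = [];
  for (const run of runs) {
    const last = merged[merged.length - 1];
    if (last && run[0] - last[1] < minSilenceMs) last[1] = run[1];
    else merged.push([...run]);
  }

  // 3. Drop segments shorter than minSpeechMs, then pad and clamp to the audio length.
  const totalMs = probs.length * frameMs;
  const padded = merged
    .filter(([s, e]) => e - s >= minSpeechMs)
    .map(([s, e]) => [Math.max(0, s - padMs), Math.min(totalMs, e + padMs)]);

  // 4. Padding can make neighbours overlap; merge those.
  const out = [];
  for (const seg of padded) {
    const last = out[out.length - 1];
    if (last && seg[0] <= last[1]) last[1] = Math.max(last[1], seg[1]);
    else out.push(seg);
  }
  return out;
}
```

Only the returned segments would need to go through the Whisper model, which is why long silent stretches make VAD pay off.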
const path = require("path");
const { whisper } = require(path.join(__dirname, "../../build/Release/addon.node"));
const { promisify } = require("util");
const whisperAsync = promisify(whisper);
// With VAD enabled
const vadParams = {
  language: "en",
  model: path.join(__dirname, "../../models/ggml-base.en.bin"),
  fname_inp: path.join(__dirname, "../../samples/jfk.wav"),
  vad: true,
  vad_model: path.join(__dirname, "../../models/ggml-silero-v6.2.0.bin"),
  vad_threshold: 0.5,
  progress_callback: (progress) => console.log(`Progress: ${progress}%`)
};
whisperAsync(vadParams)
  .then(result => console.log(result))
  .catch(err => console.error(err));
Both traditional whisper.cpp parameters and new VAD parameters are supported:
- language: Language code (e.g., "en", "es", "fr")
- model: Path to whisper model file
- fname_inp: Path to input audio file
- use_gpu: Enable GPU acceleration (default: true)
- flash_attn: Enable flash attention (default: false)
- no_prints: Disable console output (default: false)
- no_timestamps: Disable timestamps (default: false)
- detect_language: Auto-detect language (default: false)
- audio_ctx: Audio context size (default: 0)
- max_len: Maximum segment length (default: 0)
- max_context: Maximum context size (default: -1)
- prompt: Initial prompt for decoder
- comma_in_time: Use comma in timestamps (default: true)
- print_progress: Print progress info (default: false)
- progress_callback: Progress callback function
- VAD parameters (see the VAD section above)
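A small helper can merge the documented VAD defaults into a parameter object and catch obvious mistakes, such as enabling `vad` without `vad_model`, before the call reaches native code. `withVadDefaults` is hypothetical; the addon applies its own defaults anyway, so this mainly buys earlier, clearer errors. Requiring `model` and `fname_inp` is the helper's own assumption, and `vad_max_speech_duration_s` is left unset so the addon's FLT_MAX default applies.

```javascript
// VAD defaults copied from the parameter list above.
const VAD_DEFAULTS = {
  vad: false,
  vad_threshold: 0.5,
  vad_min_speech_duration_ms: 250,
  vad_min_silence_duration_ms: 100,
  vad_speech_pad_ms: 30,
  vad_samples_overlap: 0.1,
};

// Merge defaults under the caller's params and validate the combination.
function withVadDefaults(params) {
  const merged = { ...VAD_DEFAULTS, ...params };
  if (!merged.model || !merged.fname_inp) {
    throw new Error("model and fname_inp are required");
  }
  if (merged.vad && !merged.vad_model) {
    throw new Error("vad_model is required when vad is enabled");
  }
  for (const key of ["vad_threshold", "vad_samples_overlap"]) {
    const v = merged[key];
    if (typeof v !== "number" || v < 0 || v > 1) {
      throw new RangeError(`${key} must be a number in [0, 1], got ${v}`);
    }
  }
  return merged;
}
```

Usage: `whisperAsync(withVadDefaults(vadParams))` in place of the direct call in the VAD example.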