84 changes: 84 additions & 0 deletions samples/js/image_generation/README.md
@@ -0,0 +1,84 @@
# Text to Image JavaScript Generation Pipeline

This example showcases inference of text-to-image diffusion models such as Stable Diffusion 1.5, 2.1, FLUX, and LCM. The application deliberately has few configuration options, to encourage readers to explore and modify the source code, for example by changing the inference device to GPU. The sample features `Text2ImagePipeline` from `openvino-genai-node` and uses a text prompt as its input.

Sample files:
- [`text2image.js`](./text2image.js) demonstrates basic usage of the text-to-image pipeline with a step callback and saves the result as a BMP file using `bmp-js`

Users can modify the sample code to experiment with the following generation parameters:

- Change the width or height of the generated image
- Generate multiple images per prompt (`num_images_per_prompt`)
- Adjust the number of inference steps (`num_inference_steps`)
- Tune the [guidance scale](https://huggingface.co/spaces/stabilityai/stable-diffusion/discussions/9) (read [more details](https://arxiv.org/abs/2207.12598))
- (SD 1.x, 2.x, SD3, SDXL) Add a negative prompt when the guidance scale is > 1
- (SDXL, SD3, FLUX) Specify additional positive prompts such as `prompt_2`
- Add a per-step callback to monitor progress or stop generation early
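
The options above can be combined in a single `generate` call. A minimal sketch, assuming the parameter names used elsewhere in this sample; `negative_prompt` is hypothetical here and only meaningful for the model families noted above:

```javascript
// A sketch of the generation options accepted by Text2ImagePipeline.generate();
// parameter names follow the sample code, and negative_prompt is only honored
// by model families that support it (SD 1.x/2.x, SD3, SDXL) with guidance_scale > 1.
const generationOptions = {
  width: 768,                  // output width in pixels
  height: 512,                 // output height in pixels
  num_images_per_prompt: 2,    // batch of images for one prompt
  num_inference_steps: 25,     // more steps trade speed for quality
  guidance_scale: 7.5,         // values > 1 enable classifier-free guidance
  negative_prompt: "blurry, low quality",
};
// const images = await pipeline.generate(prompt, generationOptions);
```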

## Download and convert the model

The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.

Installing [../../export-requirements.txt](../../export-requirements.txt) is not required for deployment if the model has already been exported.

```sh
pip install --upgrade-strategy eager -r ../../export-requirements.txt
```

Then, run the export with Optimum CLI:

```sh
optimum-cli export openvino --model dreamlike-art/dreamlike-anime-1.0 --task stable-diffusion --weight-format fp16 dreamlike_anime_1_0_ov/FP16
```

## Run

From the `samples/js` directory, install dependencies (if not already done):

```bash
npm install
```

If you use the master branch, you may need to [build openvino-genai-node from source](../../../src/js/README.md#build-bindings) first.

Run the sample:

```bash
node image_generation/text2image.js dreamlike_anime_1_0_ov/FP16 "cyberpunk cityscape like Tokyo New York with tall buildings at dusk golden hour cinematic lighting"
```

The result is saved as `image.bmp` in the current directory.

### Optional: change device

The device is hardcoded to `CPU` in the sample. To use `GPU`, edit `text2image.js` and change:

```js
const device = "CPU"; // GPU can be used as well
```
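
To avoid editing the file for every run, the device could instead be read from an environment variable. A sketch, where `OV_DEVICE` is a made-up variable name, not an OpenVINO convention:

```javascript
// A sketch, not part of the sample: pick the inference device from an
// environment variable instead of hardcoding it. OV_DEVICE is a
// hypothetical variable name chosen for this example.
const device = process.env.OV_DEVICE ?? "CPU"; // e.g. OV_DEVICE=GPU node image_generation/text2image.js ...
```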

### Examples

Prompt: `cyberpunk cityscape like Tokyo New York with tall buildings at dusk golden hour cinematic lighting`

![](./../../cpp/image_generation/512x512.bmp)

Refer to the [Supported Models](https://openvinotoolkit.github.io/openvino.genai/docs/supported-models/#image-generation-models) for the list of supported models.

## Run with a step callback

The sample registers a callback that prints progress to stdout on each denoising step. The callback can also stop generation early by returning `true`:

```js
function callback(step, numSteps) {
process.stdout.write(`Step ${step + 1}/${numSteps}\r`);
return false; // return true to stop early
}

const imageTensor = await pipeline.generate(prompt, {
width: 512,
height: 512,
num_inference_steps: 20,
callback,
});
```
80 changes: 80 additions & 0 deletions samples/js/image_generation/text2image.js
@@ -0,0 +1,80 @@
// Copyright (C) 2023-2026 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

import { writeFile } from "node:fs/promises";
import { basename } from "node:path";
import bmp from "bmp-js";
import yargs from "yargs/yargs";
import { hideBin } from "yargs/helpers";
import { Text2ImagePipeline } from "openvino-genai-node";

function toABGR(tensor) {
const [_, height, width, channels] = tensor.getShape();
if (channels !== 3) {
throw new Error(`Expected RGB image tensor, got ${channels} channels.`);
}

const rgb = tensor.data instanceof Uint8Array ? tensor.data : Uint8Array.from(tensor.data);
const rgba = Buffer.allocUnsafe(width * height * 4);

for (let src = 0, dst = 0; src < rgb.length; src += 3, dst += 4) {
rgba[dst] = 255; // A
rgba[dst + 1] = rgb[src + 2]; // B
rgba[dst + 2] = rgb[src + 1]; // G
rgba[dst + 3] = rgb[src]; // R
}

return { height, width, rgba };
}

async function main() {
const argv = await yargs(hideBin(process.argv))
.scriptName(basename(process.argv[1]))
.command(
"$0 <model_dir> <prompt>",
"Run Text2Image pipeline and save generated image as BMP file",
(yargsBuilder) =>
yargsBuilder
.positional("model_dir", {
type: "string",
describe: "Path to the converted image generation model directory",
demandOption: true,
})
.positional("prompt", {
type: "string",
describe: "Prompt to generate images from",
demandOption: true,
})
)
.strict()
.help()
.parse();

const device = "CPU"; // GPU can be used as well
const pipeline = await Text2ImagePipeline(argv.model_dir, device);

function callback(step, numSteps) {
process.stdout.write(`Step ${step + 1}/${numSteps}\r`);
return false;
}

const imageTensor = await pipeline.generate(
argv.prompt,
{
width: 512,
height: 512,
num_inference_steps: 20,
num_images_per_prompt: 1,
callback,
},
);

const { height, width, rgba } = toABGR(imageTensor);
const bmpData = bmp.encode({ width, height, data: rgba });
await writeFile("image.bmp", bmpData.data);
}

main().catch((error) => {
console.error(error);
process.exitCode = 1;
});
8 changes: 8 additions & 0 deletions samples/js/package-lock.json


3 changes: 2 additions & 1 deletion samples/js/package.json
@@ -4,9 +4,10 @@
"license": "Apache-2.0",
"type": "module",
"devDependencies": {
"bmp-js": "^0.1.0",
"node-wav": "^0.0.2",
"openvino-node": "^2026.1.0",
"openvino-genai-node": "^2026.1.0",
"openvino-node": "^2026.1.0",
"yargs": "^18.0.0",
"zod": "^4.1.13"
},
6 changes: 5 additions & 1 deletion site/docs/bindings/node-js.md
@@ -9,7 +9,7 @@ description: Node.js bindings provide JavaScript/TypeScript API.
OpenVINO GenAI provides Node.js bindings that enable you to use generative AI pipelines in JavaScript and TypeScript applications.

:::warning API Coverage
Node.js bindings currently provide a subset of the full OpenVINO GenAI API available in C++ and Python. The focus is on core text generation (`LLMPipeline`), vision language models (`VLMPipeline`), text embedding (`TextEmbeddingPipeline`), text reranking (`TextRerankPipeline`), speech recognition (`WhisperPipeline`), and speech generation (`Text2SpeechPipeline`) functionality.
Node.js bindings currently provide a subset of the full OpenVINO GenAI API available in C++ and Python. The focus is on core text generation (`LLMPipeline`), vision language models (`VLMPipeline`), text embedding (`TextEmbeddingPipeline`), text reranking (`TextRerankPipeline`), speech recognition (`WhisperPipeline`), speech generation (`Text2SpeechPipeline`), and image generation (`Text2ImagePipeline`) functionality.
:::

## Supported Pipelines and Features
@@ -36,6 +36,10 @@ Node.js bindings currently support:
- `Text2SpeechPipeline`: Speech generation from text
- Optional speaker embedding support
- Batch generation
- `Text2ImagePipeline`: Image generation from text prompts using diffusion models
- Configurable width, height, inference steps, and guidance scale
- Batch image generation via `num_images_per_prompt`
- Callback support for monitoring or stopping generation early
- `Tokenizer`: Fast tokenization / detokenization and chat prompt formatting
- Encode strings into token id and attention mask tensors
- Decode token sequences
@@ -0,0 +1,26 @@
import CodeBlock from '@theme/CodeBlock';

<CodeBlock language="javascript" showLineNumbers>
{`import { writeFile } from 'node:fs/promises';
import bmp from 'bmp-js';
import { Text2ImagePipeline } from 'openvino-genai-node';

const pipeline = await Text2ImagePipeline(modelPath, '${props.device || 'CPU'}');
const image = await pipeline.generate(prompt);

// Save the first generated image as BMP
const [, height, width] = image.getShape();
const rgb = Uint8Array.from(image.data);
const abgr = Buffer.allocUnsafe(width * height * 4);

for (let src = 0, dst = 0; src < rgb.length; src += 3, dst += 4) {
abgr[dst] = 255; // A
abgr[dst + 1] = rgb[src + 2]; // B
abgr[dst + 2] = rgb[src + 1]; // G
abgr[dst + 3] = rgb[src]; // R
}

const bmpData = bmp.encode({ width, height, data: abgr });
await writeFile('image.bmp', bmpData.data);
`}
</CodeBlock>
@@ -1,5 +1,6 @@
import Text2ImageCPP from './_text2image_cpp.mdx';
import Text2ImagePython from './_text2image_python.mdx';
import Text2ImageJS from './_text2image_js.mdx';

import Image2ImageCPP from './_image2image_cpp.mdx';
import Image2ImagePython from './_image2image_python.mdx';
@@ -44,6 +45,16 @@ OpenVINO GenAI supports the following diffusion model pipelines:
</TabItem>
</Tabs>
</TabItemCpp>
<TabItemJS>
<Tabs groupId="device">
<TabItem label="CPU" value="cpu">
<Text2ImageJS device="CPU" />
</TabItem>
<TabItem label="GPU" value="gpu">
<Text2ImageJS device="GPU" />
</TabItem>
</Tabs>
</TabItemJS>
</LanguageTabs>

:::tip
@@ -3,7 +3,7 @@ import GenerationConfigurationWorkflow from '@site/docs/use-cases/_shared/_gener
## Additional Usage Options

:::tip
Check out [Python](https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/image_generation) and [C++](https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/cpp/image_generation) image generation samples.
Check out [Python](https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/image_generation), [C++](https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/cpp/image_generation), and [JavaScript](https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/js/image_generation) image generation samples.
:::

### Use Different Generation Parameters
@@ -60,6 +60,22 @@ You can adjust several parameters to control the image generation process, inclu
}
```
</TabItemCpp>
<TabItemJS>
```javascript
import { Text2ImagePipeline } from 'openvino-genai-node';

const pipeline = await Text2ImagePipeline(modelPath, 'CPU');
const result = await pipeline.generate(prompt, {
// highlight-start
width: 512,
height: 512,
num_images_per_prompt: 1,
num_inference_steps: 30,
guidance_scale: 7.5,
// highlight-end
});
```
</TabItemJS>
</LanguageTabs>

:::info Understanding Image Generation Parameters
@@ -1,10 +1,11 @@
import Button from '@site/src/components/Button';
import { LanguageTabs, TabItemCpp, TabItemPython } from '@site/src/components/LanguageTabs';
import { LanguageTabs, TabItemCpp, TabItemPython, TabItemJS } from '@site/src/components/LanguageTabs';

import UseCaseCard from './UseCaseCard';

import CodeExampleCpp from '@site/docs/use-cases/image-generation/_sections/_run_model/_text2image_cpp.mdx';
import CodeExamplePython from '@site/docs/use-cases/image-generation/_sections/_run_model/_text2image_python.mdx';
import CodeExampleJS from '@site/docs/use-cases/image-generation/_sections/_run_model/_text2image_js.mdx';

export const ImageGeneration = () => (
<UseCaseCard>
@@ -30,6 +31,9 @@ export const ImageGeneration = () => (
<TabItemCpp>
<CodeExampleCpp />
</TabItemCpp>
<TabItemJS>
<CodeExampleJS />
</TabItemJS>
</LanguageTabs>
</UseCaseCard.Code>
<UseCaseCard.Actions>
3 changes: 3 additions & 0 deletions src/js/eslint.config.cjs
@@ -90,6 +90,9 @@ module.exports = defineConfig([
"begin_suppress_tokens",
"suppress_tokens",
"initial_prompt",
"num_inference_steps",
"num_images_per_prompt",
"guidance_scale",
],
},
],
2 changes: 2 additions & 0 deletions src/js/include/addon.hpp
@@ -12,12 +12,14 @@ struct AddonData {
Napi::FunctionReference vlm_pipeline;
Napi::FunctionReference text_rerank_pipeline;
Napi::FunctionReference whisper_pipeline;
Napi::FunctionReference text2image_pipeline;
Napi::FunctionReference tokenizer;
Napi::FunctionReference perf_metrics;
Napi::FunctionReference vlm_perf_metrics;
Napi::FunctionReference whisper_perf_metrics;
Napi::FunctionReference text2speech_pipeline;
Napi::FunctionReference text2speech_perf_metrics;
Napi::FunctionReference text2image_perf_metrics;
Napi::FunctionReference chat_history;
Napi::FunctionReference reasoning_parser;
Napi::FunctionReference deepseek_r1_reasoning_parser;