Properly handle per request seed for multinomial sampling #3721
mzegla wants to merge 12 commits into openvinotoolkit:master from
Conversation
Pull request overview
This PR fixes RNG seeding for multinomial sampling by moving from a single sampler-level RNG seed to per-request RNG engines (important for continuous batching), so GenerationConfig.rng_seed is honored per request and reproducibility works as expected.
Changes:
- Refactor the C++ sampler to keep per-request sampling state (including `std::mt19937`) in a request context map and pass the RNG explicitly through the multinomial sampling/validation paths.
- Stop seeding the sampler globally during continuous batching pipeline initialization.
- Add Python tests covering same-seed reproducibility and different-seed variability; add RNG seed CLI support to the multinomial text-generation samples.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/python_tests/test_llm_pipeline.py | Adds RNG seed reproducibility/difference tests across pipeline types. |
| tests/python_tests/test_continuous_batching.py | Adds continuous batching add_request/step tests for RNG seed behavior. |
| src/cpp/src/sampling/sampler.hpp | Introduces per-request sampler context (RNG/logit processor/stop strings) and updates method signatures. |
| src/cpp/src/sampling/sampler.cpp | Uses per-request contexts during sampling; routes RNG through multinomial sampling and validation. |
| src/cpp/src/continuous_batching/pipeline_impl.cpp | Removes global sampler seeding at CB pipeline init. |
| samples/python/text_generation/multinomial_causal_lm.py | Adds optional --rng_seed argument and validation. |
| samples/cpp/text_generation/multinomial_causal_lm.cpp | Adds optional RNG seed CLI arg parsing and sets config.rng_seed. |
```diff
 int main(int argc, char* argv[]) try {
-    if (3 != argc) {
-        throw std::runtime_error(std::string{"Usage: "} + argv[0] + " <MODEL_DIR> '<PROMPT>'");
+    if (argc < 3 || argc > 4) {
```
We do not expose other config properties as arguments. Let's revert the seed argument and keep the samples simple.
The expectation is that with the multinomial algorithm, responses should differ slightly each time. I guess @mzegla is trying to showcase the difference that way, but the correct behavior would be to have a random default seed.
If we have a fixed static seed, the point of multinomial sampling is lost.
I propose dropping this parameter from the sample and setting a random seed by default.
Makes sense to me. @as-suvorov , @apaniukov - what do you think?
@dtrawins Do you mean setting a random seed in GenerationConfig?
@as-suvorov yes, the seed should be random by default, so it would be best to set a random value in the constructor. That would achieve the correct behavior of the multinomial algorithm while still giving the option of deterministic responses via a static seed value in the user request.
Potentially that might impact unit tests, but this is needed. An open question is whether that should be done in the same PR or a separate one. I think it can be this one, as the title states "Properly handle per request seed for multinomial sampling" ;-)
llama.cpp and vllm use non-deterministic generation by default; getting a deterministic sequence requires setting the seed explicitly.
How shall we proceed?
It's fine by me. @Wovchena @apaniukov Do you have any objections?
Is alignment with transformers and llama.cpp the only reason? A counterargument would be not to suddenly change a GenAI default that people may already be used to.
@likholat, can you describe the behavior of random generators for image generation?
From my perspective it's more of a misconfiguration that is now being fixed.
I would say the general expectation when using random sampling is to get different outputs for exactly the same prompt by default (#3584), and deterministic behavior only on demand, when a specific seed is set.
Pushed a few commits with the default random seed change. Let me know what the final decision is.
If there are objections, I will revert it and keep only the fix for correct seed usage.
```python
# Speculative decoding and prompt lookup require extra GenerationConfig params;
# exclude them from generic RNG-seed tests.
_NON_ASSISTANT_PIPELINE_TYPES = tuple(
    pt
    for pt in ALL_PIPELINE_TYPES
    if pt not in (PipelineType.SPECULATIVE_DECODING, PipelineType.PROMPT_LOOKUP_DECODING)
)
```
The PR description doesn’t appear to follow the repository PR template (missing the checklist items like tests/docs/ticket coverage). Please update the PR description to include the template sections and mark the checklist items appropriately so reviewers can verify completion (see .github/pull_request_template.md).
```cpp
RequestSamplerContext(size_t seed, LogitProcessor&& lp)
    : rng_engine(seed), logit_processor(std::move(lp)) {}
```
RequestSamplerContext seeds std::mt19937 with a size_t. std::mt19937 only uses a 32-bit seed, so high bits of rng_seed are silently truncated (e.g., seeds that differ by 2^32 become identical), which can surprise users and reduce entropy for RANDOM_SEED-derived values. Consider seeding via std::seed_seq (splitting the size_t into 32-bit parts) or switching to std::mt19937_64 if you want rng_seed to be fully respected.
```python
# Cross-comparison is skipped: without a fixed rng_seed each run produces
# non-deterministic output, so C++, Python, and JS results will differ.
```
This test now runs the C++/Python/JS multinomial samples but no longer asserts anything about their outputs, which significantly reduces the value of the sample test. Since deterministic behavior is still supported via GenerationConfig.rng_seed, consider setting a fixed rng_seed in the samples (or otherwise controlling it in the test) and restoring cross-comparison (or at least asserting basic expectations like non-empty output) so the test continues to catch regressions.
```cpp
// std::random_device::entropy() == 0 means it is a PRNG, not a true RNG.
static const bool is_rd_prng = std::random_device{}.entropy() == 0;
if (is_rd_prng) {
    return static_cast<size_t>(std::chrono::steady_clock::now().time_since_epoch().count());
}
```
resolve_rng_seed(): when std::random_device::entropy()==0, the fallback seeds from steady_clock ticks, which can collide for multiple requests created in the same tick (e.g., batched prompts / continuous batching), undermining the promise of a distinct non-deterministic seed per request. Consider mixing in an atomic counter / request_id (or still consuming std::random_device() even if entropy()==0) and/or switching to system_clock as stated in the comment so seeds are less likely to repeat across runs.
Accept a per-request seed for multinomial sampling instead of initializing a default seed at pipeline initialization (continuous batching).
Addressing: #3584