
Properly handle per request seed for multinomial sampling #3721

Open

mzegla wants to merge 12 commits into openvinotoolkit:master from mzegla:random_seed_fix

Conversation

Collaborator

@mzegla mzegla commented Apr 16, 2026

Accept a per-request seed for multinomial sampling instead of initializing a default seed at pipeline initialization (continuous batching).
Addressing: #3584

Copilot AI review requested due to automatic review settings April 16, 2026 09:55
@github-actions bot added labels Apr 16, 2026: category: continuous batching, category: LLM, category: sampling, category: LLM samples, category: GGUF
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes RNG seeding for multinomial sampling by moving from a single sampler-level RNG seed to per-request RNG engines (important for continuous batching), so GenerationConfig.rng_seed is honored per request and reproducibility works as expected.

Changes:

  • Refactor C++ sampler to keep per-request sampling state (including std::mt19937) in a request context map and pass RNG explicitly to multinomial sampling/validation paths.
  • Stop seeding the sampler globally during continuous batching pipeline initialization.
  • Add Python tests covering same-seed reproducibility and different-seed variability; add RNG seed CLI support to multinomial text-generation samples.
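The per-request RNG design described above can be sketched as follows. This is an illustrative sketch only, not the PR's actual code; names like `RequestContext`, `add_request`, and `sample_multinomial` are hypothetical stand-ins for the request context map and sampling paths the change introduces.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <random>
#include <vector>

// Illustrative sketch: each request owns its own std::mt19937 so that
// GenerationConfig.rng_seed is honored per request even when many requests
// are interleaved by continuous batching.
struct RequestContext {
    std::mt19937 rng_engine;
    explicit RequestContext(std::uint32_t seed) : rng_engine(seed) {}
};

class Sampler {
    std::map<std::uint64_t, RequestContext> m_contexts;
public:
    void add_request(std::uint64_t request_id, std::uint32_t rng_seed) {
        m_contexts.emplace(request_id, RequestContext(rng_seed));
    }
    // Multinomial sampling draws from the request's own engine, so two
    // requests configured with the same seed reproduce the same tokens.
    std::size_t sample_multinomial(std::uint64_t request_id,
                                   const std::vector<float>& probs) {
        std::discrete_distribution<std::size_t> dist(probs.begin(), probs.end());
        return dist(m_contexts.at(request_id).rng_engine);
    }
    void finish_request(std::uint64_t request_id) { m_contexts.erase(request_id); }
};
```

Because the engine lives in the per-request context rather than on the sampler, seeding one request cannot perturb the token stream of another running in the same batch.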

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Summary per file:

  • tests/python_tests/test_llm_pipeline.py — Adds RNG seed reproducibility/difference tests across pipeline types.
  • tests/python_tests/test_continuous_batching.py — Adds continuous batching add_request/step tests for RNG seed behavior.
  • src/cpp/src/sampling/sampler.hpp — Introduces per-request sampler context (RNG/logit processor/stop strings) and updates method signatures.
  • src/cpp/src/sampling/sampler.cpp — Uses per-request contexts during sampling; routes RNG through multinomial sampling and validation.
  • src/cpp/src/continuous_batching/pipeline_impl.cpp — Removes global sampler seeding at CB pipeline init.
  • samples/python/text_generation/multinomial_causal_lm.py — Adds optional --rng_seed argument and validation.
  • samples/cpp/text_generation/multinomial_causal_lm.cpp — Adds optional RNG seed CLI arg parsing and sets config.rng_seed.

Comment thread samples/cpp/text_generation/multinomial_causal_lm.cpp Outdated
Comment thread tests/python_tests/test_llm_pipeline.py Outdated
Comment thread tests/python_tests/test_llm_pipeline.py Outdated
Comment thread tests/python_tests/test_continuous_batching.py Outdated
Comment thread tests/python_tests/test_continuous_batching.py Outdated
Comment thread src/cpp/src/sampling/sampler.hpp Outdated
Comment thread src/cpp/src/sampling/sampler.cpp Outdated
 int main(int argc, char* argv[]) try {
-    if (3 != argc) {
-        throw std::runtime_error(std::string{"Usage: "} + argv[0] + " <MODEL_DIR> '<PROMPT>'");
+    if (argc < 3 || argc > 4) {
Collaborator

We do not expose other config properties as arguments. Let's revert the seed argument and keep the samples simple.

Contributor

The expectation is that with the multinomial algorithm, responses should differ slightly each time. I guess @mzegla is trying to showcase the difference that way, but the correct behavior would be a random default seed.
With a fixed static seed, the point of multinomial sampling is lost.
I propose dropping this parameter from the sample but setting a random seed by default.

Collaborator Author

Makes sense to me. @as-suvorov , @apaniukov - what do you think?

Collaborator

@dtrawins Do you mean to set random seed to GenerationConfig?

Contributor

@as-suvorov yes, the seed should be random by default. So it would be best to set a random value in the constructor by default. That would achieve the correct behavior of the multinomial algorithm and still give the option of deterministic responses via a static seed value in the user request.
Potentially that might impact unit tests, but this change is needed. An open question is whether it should be done in the same PR or a separate one. I think it can be the same one, as the title states "Properly handle per request seed for multinomial sampling" ;-)

Collaborator Author

llama.cpp and vLLM use non-deterministic generation by default; getting a deterministic sequence requires setting the seed explicitly.
How shall we proceed?

Collaborator

It's fine by me. @Wovchena @apaniukov Do you have any objections?

Collaborator

Is alignment with transformers and llama.cpp the only reason? A counter-argument would be not to suddenly change a GenAI default that people may already rely on.

@likholat, can you describe the behavior of random generators for image generation?

Collaborator Author

From my perspective it's more of a misconfiguration that is now being fixed.
I would say the general expectation when using random sampling is to get different outputs for exactly the same prompt by default (#3584), and deterministic behavior only on demand, when a specific seed is set.

Collaborator Author

Pushed a few commits with the default random seed changed. Let me know what the final decision is.
If there are objections, I will revert it and keep only the fix for correct seed usage.

Copilot AI review requested due to automatic review settings April 16, 2026 11:41
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment thread src/cpp/src/sampling/sampler.cpp Outdated
Comment thread src/cpp/src/sampling/sampler.cpp Outdated
Copilot AI review requested due to automatic review settings April 16, 2026 13:35
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment on lines +1148 to +1154
# Speculative decoding and prompt lookup require extra GenerationConfig params;
# exclude them from generic RNG-seed tests.
_NON_ASSISTANT_PIPELINE_TYPES = tuple(
    pt
    for pt in ALL_PIPELINE_TYPES
    if pt not in (PipelineType.SPECULATIVE_DECODING, PipelineType.PROMPT_LOOKUP_DECODING)
)
Copilot AI Apr 16, 2026

The PR description doesn’t appear to follow the repository PR template (missing the checklist items like tests/docs/ticket coverage). Please update the PR description to include the template sections and mark the checklist items appropriately so reviewers can verify completion (see .github/pull_request_template.md).

Copilot AI review requested due to automatic review settings April 20, 2026 08:53
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment thread src/cpp/src/sampling/sampler.hpp Outdated
Comment thread src/cpp/src/sampling/sampler.hpp Outdated
Comment thread src/cpp/src/sampling/sampler.cpp
Comment thread tests/python_tests/test_llm_pipeline.py
@github-actions bot added labels Apr 20, 2026: category: visual language, category: whisper, category: Python API, category: CPP API, category: Structured Output
Copilot AI review requested due to automatic review settings April 20, 2026 14:55
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.

Comment on lines +84 to +85
RequestSamplerContext(size_t seed, LogitProcessor&& lp)
    : rng_engine(seed), logit_processor(std::move(lp)) {}
Copilot AI Apr 20, 2026
RequestSamplerContext seeds std::mt19937 with a size_t. std::mt19937 only uses a 32-bit seed, so high bits of rng_seed are silently truncated (e.g., seeds that differ by 2^32 become identical), which can surprise users and reduce entropy for RANDOM_SEED-derived values. Consider seeding via std::seed_seq (splitting the size_t into 32-bit parts) or switching to std::mt19937_64 if you want rng_seed to be fully respected.

Comment on lines +43 to +44
# Cross-comparison is skipped: without a fixed rng_seed each run produces
# non-deterministic output, so C++, Python, and JS results will differ.
Copilot AI Apr 20, 2026

This test now runs the C++/Python/JS multinomial samples but no longer asserts anything about their outputs, which significantly reduces the value of the sample test. Since deterministic behavior is still supported via GenerationConfig.rng_seed, consider setting a fixed rng_seed in the samples (or otherwise controlling it in the test) and restoring cross-comparison (or at least asserting basic expectations like non-empty output) so the test continues to catch regressions.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +19 to +23
// std::random_device::entropy() == 0 means it is a PRNG, not a true RNG.
static const bool is_rd_prng = std::random_device{}.entropy() == 0;
if (is_rd_prng) {
    return static_cast<size_t>(std::chrono::steady_clock::now().time_since_epoch().count());
}
Copilot AI Apr 20, 2026

resolve_rng_seed(): when std::random_device::entropy()==0, the fallback seeds from steady_clock ticks, which can collide for multiple requests created in the same tick (e.g., batched prompts / continuous batching), undermining the promise of a distinct non-deterministic seed per request. Consider mixing in an atomic counter / request_id (or still consuming std::random_device() even if entropy()==0) and/or switching to system_clock as stated in the comment so seeds are less likely to repeat across runs.


Labels

category: continuous batching, category: CPP API, category: GGUF, category: LLM samples, category: LLM, category: Python API, category: sampling, category: Structured Output, category: visual language, category: whisper

6 participants