Conversation
Pull request overview
This PR introduces explicit generation stop/finish reasons across C++ and Python APIs (including a new tool-call stop path) to better align GenAI behavior with OpenAI-compatible API expectations.
Changes:
- Added `GenerationFinishReason` plumbing end-to-end (pipelines populate per-sequence `finish_reasons`; Python bindings/stubs expose them).
- Added `StreamingStatus::TOOL_CALL_STOP` to represent parser-triggered stopping during streaming.
- Updated streaming loops to react to `TOOL_CALL_STOP` and propagate stop semantics.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| tests/python_tests/test_parsers.py | Adds a new incremental parser test scenario (tool-call extraction + reasoning extraction). |
| src/python/py_streamers.cpp | Exposes StreamingStatus.TOOL_CALL_STOP to Python. |
| src/python/py_openvino_genai.cpp | Exposes finish_reasons on DecodedResults / EncodedResults to Python. |
| src/python/openvino_genai/py_openvino_genai.pyi | Updates Python stubs for new fields/statuses and GenerationHandle.stop signature. |
| src/cpp/src/whisper/whisper.cpp | Updates Whisper streaming stop handling and populates finish_reasons. |
| src/cpp/src/whisper/pipeline_static.cpp | Updates Whisper streaming stop handling and populates finish_reasons. |
| src/cpp/src/visual_language/pipeline.cpp | Propagates finish_reasons from encoded → decoded results. |
| src/cpp/src/visual_language/continuous_batching_adapter.hpp | Propagates finish_reasons through VLM continuous batching adapter. |
| src/cpp/src/text_streamer.cpp | Adds parser-driven tool-call stop handling in TextParserStreamer. |
| src/cpp/src/speculative_decoding/stateful/stateful_pipeline_base.cpp | Propagates finish_reasons into decoded outputs. |
| src/cpp/src/speculative_decoding/stateful/fast_draft_strategy.cpp | Initializes finish_reasons in results. |
| src/cpp/src/speculative_decoding/stateful/eagle3_strategy.cpp | Initializes finish_reasons in results. |
| src/cpp/src/speculative_decoding/continuous_batching/fast_draft_strategy.hpp | Populates per-sequence m_finish_reasons with fallback to stream reason on external stop. |
| src/cpp/src/prompt_lookup/prompt_lookup_impl.cpp | Populates per-sequence m_finish_reasons with fallback to stream reason on external stop. |
| src/cpp/src/lm_encoding.cpp | Propagates TOOL_CALL_STOP into handle->stop(TOOL_CALL) and collects finish_reasons. |
| src/cpp/src/llm/pipeline_static.cpp | Propagates TOOL_CALL_STOP into handle->stop(TOOL_CALL) and collects finish_reasons. |
| src/cpp/src/llm/pipeline_stateful.cpp | Propagates finish_reasons into decoded outputs. |
| src/cpp/src/llm/pipeline_continuous_batching_adapter.hpp | Aggregates/moves finish_reasons through the adapter. |
| src/cpp/src/generation_stream.hpp | Stores a finish reason on GenerationStream::stop(...). |
| src/cpp/src/generation_handle.cpp | Extends GenerationHandleImpl::stop(...) to accept a finish reason. |
| src/cpp/src/continuous_batching/pipeline_impl.cpp | Populates per-sequence m_finish_reasons with fallback to stream reason on external stop. |
| src/cpp/src/continuous_batching/pipeline_base.cpp | Propagates TOOL_CALL_STOP into handle->stop(TOOL_CALL) and propagates finish_reasons through result conversion. |
| src/cpp/include/openvino/genai/streamer_base.hpp | Adds StreamingStatus::TOOL_CALL_STOP to public C++ API. |
| src/cpp/include/openvino/genai/parsers.hpp | Adds IncrementalParser::get_status() to support stop signaling. |
| src/cpp/include/openvino/genai/llm_pipeline.hpp | Adds finish_reasons to EncodedResults / DecodedResults. |
| src/cpp/include/openvino/genai/generation_handle.hpp | Adds GenerationFinishReason::TOOL_CALL and per-sequence finish reason vectors; extends stop(...) API. |
Force-pushed from 5d44ac1 to 2ca80b0.
```cpp
.def_readwrite("m_generation_ids", &EncodedGenerationResult::m_generation_ids)
.def_readwrite("m_scores", &EncodedGenerationResult::m_scores)
.def_readonly("finish_reasons", &EncodedGenerationResult::m_finish_reasons)
.def_readonly("perf_metrics", &EncodedGenerationResult::perf_metrics)
```
EncodedGenerationResult exposes the new finish-reason vector under the python attribute name finish_reasons, while the rest of the struct fields are exposed as m_request_id / m_generation_ids / m_scores. This inconsistency makes the API harder to discover and breaks the naming pattern users rely on for these handle result structs. Consider renaming the binding to m_finish_reasons (and updating the .pyi accordingly), or exposing both names as aliases for backward/forward compatibility.
This question lies outside of this PR and should be discussed separately. In master we already have an inconsistency: some fields are exposed with the `m_` prefix and some without. I made `finish_reasons` consistent with `perf_metrics` and `extended_perf_metrics`. We should address this separately.
Force-pushed from b67c181 to 07cad33.
```python
prompts = [
    "What is the capital of France? Just answer without explanation.",
    "Why the Sun is Yellow",
]
res = pipe.generate(prompts, max_new_tokens=50)

assert len(res.texts) == len(prompts)
assert res.finish_reasons == [GenerationFinishReason.STOP, GenerationFinishReason.LENGTH]
```
test_batched_generate_returns_finish_reason_for_each_sequence asserts a specific STOP/LENGTH combination for two prompts, but without controlling EOS/stop conditions this is likely to be non-deterministic across model versions/conversion settings (the second prompt may finish with EOS before hitting max_new_tokens). Make the test deterministic by explicitly configuring stop conditions (e.g., use ignore_eos=True plus a stop string that only the first prompt is expected to emit) or relax the assertion to only validate that finish_reasons has one entry per prompt and values are in the expected set.
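A minimal self-contained sketch of the relaxed assertion suggested above, using a stand-in enum and hypothetical per-prompt results (the real `openvino_genai.GenerationFinishReason` values may differ):

```python
from enum import Enum

# Hypothetical stand-in for openvino_genai.GenerationFinishReason.
class GenerationFinishReason(Enum):
    NONE = 0
    STOP = 1
    LENGTH = 2
    TOOL_CALL = 3

# Hypothetical outputs, as pipe.generate(prompts, ...) might return them.
prompts = ["prompt one", "prompt two"]
finish_reasons = [GenerationFinishReason.STOP, GenerationFinishReason.LENGTH]

# Relaxed check: one reason per prompt, each from the expected terminal set,
# instead of pinning an exact STOP/LENGTH combination that depends on the model.
allowed = {GenerationFinishReason.STOP, GenerationFinishReason.LENGTH}
assert len(finish_reasons) == len(prompts)
assert all(reason in allowed for reason in finish_reasons)
```

This keeps the test meaningful (every sequence must report a terminal reason) without depending on whether a given model emits EOS before `max_new_tokens`.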
```
@@ -104,6 +120,7 @@
        return result_dicts;
    })
    .def_readonly("perf_metrics", &DecodedResults::perf_metrics)
    .def_readonly("finish_reasons", &DecodedResults::finish_reasons)
    .def_readonly("extended_perf_metrics", &DecodedResults::extended_perf_metrics)
```
The DecodedResults pybind docstring still documents only texts/scores/metrics, but the binding now also exposes finish_reasons. Please update decoded_results_docstring so the Python API docs reflect the new field.
```cpp
py::class_<EncodedResults>(m, "EncodedResults", encoded_results_docstring)
    .def_readonly("tokens", &EncodedResults::tokens)
    .def_readonly("scores", &EncodedResults::scores)
    .def_readonly("perf_metrics", &EncodedResults::perf_metrics)
    .def_readonly("finish_reasons", &EncodedResults::finish_reasons)
    .def_readonly("extended_perf_metrics", &EncodedResults::extended_perf_metrics);
```
The EncodedResults pybind docstring still documents only tokens/scores/metrics, but the binding now also exposes finish_reasons. Please update encoded_results_docstring so the Python API docs reflect the new field.
```cpp
template <>
Napi::Value cpp_to_js<ov::genai::GenerationFinishReason, Napi::Value>(
    const Napi::Env& env,
    const ov::genai::GenerationFinishReason& value) {
    return Napi::Number::New(env, static_cast<int>(value));
}
```
cpp_to_js<GenerationFinishReason> currently returns static_cast<int>(value) without validating the enum value or explicitly documenting the numeric mapping. Nearby enums (e.g., StopCriteria) use an explicit switch + throw on unknown values to keep the JS ABI stable. Consider doing the same here so future enum changes don’t silently produce mismatched numbers in JS.
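A language-agnostic sketch of the explicit-map-plus-throw pattern the comment suggests (the C++ binding would use a `switch` with a default that throws; enum values here are hypothetical stand-ins for `ov::genai::GenerationFinishReason`):

```python
from enum import Enum

# Hypothetical stand-in for ov::genai::GenerationFinishReason.
class GenerationFinishReason(Enum):
    NONE = 0
    STOP = 1
    LENGTH = 2
    TOOL_CALL = 3

# Explicit, documented numeric mapping for the JS side. A new enum member
# added in C++ without a matching entry here fails loudly instead of
# silently shifting numbers in JS.
_JS_VALUES = {
    GenerationFinishReason.NONE: 0,
    GenerationFinishReason.STOP: 1,
    GenerationFinishReason.LENGTH: 2,
    GenerationFinishReason.TOOL_CALL: 3,
}

def to_js_number(reason: GenerationFinishReason) -> int:
    try:
        return _JS_VALUES[reason]
    except KeyError:
        raise ValueError(f"Unknown GenerationFinishReason: {reason!r}")
```

The point is the hard failure on unknown members: the numeric JS ABI stays stable because every mapping is stated explicitly rather than derived from a raw cast.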
```cpp
private:
    class IncrementalParserImpl;
    std::unique_ptr<IncrementalParserImpl> m_impl;
```
The interface class just got a user-inaccessible member. Why?
Description
CVS-181410
Is connected to openvinotoolkit/model_server#3927
Checklist: