Add finish reason #3670

Open
pavel-esir wants to merge 24 commits into openvinotoolkit:master from pavel-esir:add_stop_reason

@pavel-esir (Contributor) commented Apr 8, 2026

Description

  • Adds the finish (stop) reason to generation results, adds one more reason, TOOL_CALL_STOP, and allows stop to be called from a parser. With this change, GenAI behaves consistently with OpenAI-compatible API expectations.
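For illustration only: one way an OpenAI-compatible server layer might translate finish reasons into the finish_reason strings of the OpenAI Chat Completions API ("stop", "length", "tool_calls"). The Python enum is a hypothetical stand-in for GenerationFinishReason; its member names follow this PR (NONE, STOP, LENGTH, TOOL_CALL), but the numeric values, the meanings in the comments, and the to_openai_finish_reason helper are assumptions, not code from GenAI or model_server.

```python
from enum import Enum


class GenerationFinishReason(Enum):
    # Hypothetical mirror of the C++ enum; member names follow this PR,
    # numeric values and meanings below are assumptions for illustration.
    NONE = 0       # assumed: sequence is still running
    STOP = 1       # assumed: EOS or a stop string/token was hit
    LENGTH = 2     # assumed: max_new_tokens reached
    TOOL_CALL = 3  # assumed: parser requested a stop after a tool call


# Assumed mapping onto OpenAI Chat Completions finish_reason strings.
_OPENAI_FINISH_REASON = {
    GenerationFinishReason.STOP: "stop",
    GenerationFinishReason.LENGTH: "length",
    GenerationFinishReason.TOOL_CALL: "tool_calls",
}


def to_openai_finish_reason(reason):
    """Return the OpenAI-style finish_reason string, or None while still running."""
    if reason is GenerationFinishReason.NONE:
        return None
    if reason not in _OPENAI_FINISH_REASON:
        raise ValueError(f"unmapped finish reason: {reason!r}")
    return _OPENAI_FINISH_REASON[reason]


print([to_openai_finish_reason(r) for r in GenerationFinishReason])
# [None, 'stop', 'length', 'tool_calls']
```

Raising on an unmapped member means a future enum value fails loudly in the server layer instead of being reported with a wrong string.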

CVS-181410

Connected to openvinotoolkit/model_server#3927

Checklist:

  • This PR follows GenAI Contributing guidelines.
  • Tests have been updated or added to cover the new code.
  • This PR fully addresses the ticket.
  • I have made corresponding changes to the documentation.

Copilot AI review requested due to automatic review settings April 8, 2026 11:57
@github-actions github-actions Bot added category: visual language Visual language pipeline category: continuous batching Continuous batching category: LLM LLM pipeline (stateful, static) category: whisper Whisper pipeline category: speculative decoding Speculative decoding category: Python API Python API for GenAI category: CPP API Changes in GenAI C++ public headers no-match-files category: prompt lookup Prompt look-up decoding category: GGUF GGUF file reader category: text streamer labels Apr 8, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces explicit generation stop/finish reasons across C++ and Python APIs (including a new tool-call stop path) to better align GenAI behavior with OpenAI-compatible API expectations.

Changes:

  • Added GenerationFinishReason plumbing end-to-end (pipelines populate per-sequence finish_reasons; Python bindings/stubs expose them).
  • Added StreamingStatus::TOOL_CALL_STOP to represent parser-triggered stopping during streaming.
  • Updated streaming loops to react to TOOL_CALL_STOP and propagate stop semantics.
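A minimal Python model of the streaming-loop change described above, assuming that a TOOL_CALL_STOP status from the streamer becomes a stop with a TOOL_CALL finish reason (as the per-file summary says for lm_encoding.cpp and pipeline_static.cpp). FakeHandle, run_stream, the "<eos>" token, and the STOP reason assigned to a plain user stop are hypothetical; the real GenerationHandle and C++ loops differ, and StreamingStatus.CANCEL is omitted for brevity.

```python
from enum import Enum


class StreamingStatus(Enum):
    # Hypothetical mirror; the real enum also has CANCEL, omitted here.
    RUNNING = 0
    STOP = 1
    TOOL_CALL_STOP = 2  # new in this PR: parser-triggered stop


class FinishReason(Enum):
    NONE = 0
    STOP = 1
    LENGTH = 2
    TOOL_CALL = 3


class FakeHandle:
    """Stand-in for a generation handle whose stop() records a finish reason."""

    def __init__(self):
        self.finish_reason = FinishReason.NONE

    def stop(self, reason=FinishReason.STOP):
        self.finish_reason = reason


def run_stream(tokens, streamer, handle, max_new_tokens, eos="<eos>"):
    """Feed tokens to the streamer and stop the handle with the matching reason."""
    emitted = []
    for tok in tokens:
        if tok == eos:
            handle.stop(FinishReason.STOP)
            break
        emitted.append(tok)
        status = streamer(tok)
        if status is StreamingStatus.TOOL_CALL_STOP:
            handle.stop(FinishReason.TOOL_CALL)  # parser saw a complete tool call
            break
        if status is StreamingStatus.STOP:
            handle.stop(FinishReason.STOP)  # assumed reason for a user-requested stop
            break
        if len(emitted) == max_new_tokens:
            handle.stop(FinishReason.LENGTH)
            break
    return emitted


def tool_call_streamer(tok):
    # Hypothetical parser-backed streamer: stop once the tool call is closed.
    if tok == "</tool_call>":
        return StreamingStatus.TOOL_CALL_STOP
    return StreamingStatus.RUNNING


handle = FakeHandle()
out = run_stream(["<tool_call>", "{...}", "</tool_call>", "trailing"], tool_call_streamer, handle, 16)
```

Here out holds the three tool-call tokens and handle.finish_reason is FinishReason.TOOL_CALL; the trailing token is never generated.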

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| tests/python_tests/test_parsers.py | Adds a new incremental parser test scenario (tool-call extraction + reasoning extraction). |
| src/python/py_streamers.cpp | Exposes StreamingStatus.TOOL_CALL_STOP to Python. |
| src/python/py_openvino_genai.cpp | Exposes finish_reasons on DecodedResults / EncodedResults to Python. |
| src/python/openvino_genai/py_openvino_genai.pyi | Updates Python stubs for new fields/statuses and GenerationHandle.stop signature. |
| src/cpp/src/whisper/whisper.cpp | Updates Whisper streaming stop handling and populates finish_reasons. |
| src/cpp/src/whisper/pipeline_static.cpp | Updates Whisper streaming stop handling and populates finish_reasons. |
| src/cpp/src/visual_language/pipeline.cpp | Propagates finish_reasons from encoded → decoded results. |
| src/cpp/src/visual_language/continuous_batching_adapter.hpp | Propagates finish_reasons through the VLM continuous batching adapter. |
| src/cpp/src/text_streamer.cpp | Adds parser-driven tool-call stop handling in TextParserStreamer. |
| src/cpp/src/speculative_decoding/stateful/stateful_pipeline_base.cpp | Propagates finish_reasons into decoded outputs. |
| src/cpp/src/speculative_decoding/stateful/fast_draft_strategy.cpp | Initializes finish_reasons in results. |
| src/cpp/src/speculative_decoding/stateful/eagle3_strategy.cpp | Initializes finish_reasons in results. |
| src/cpp/src/speculative_decoding/continuous_batching/fast_draft_strategy.hpp | Populates per-sequence m_finish_reasons with fallback to stream reason on external stop. |
| src/cpp/src/prompt_lookup/prompt_lookup_impl.cpp | Populates per-sequence m_finish_reasons with fallback to stream reason on external stop. |
| src/cpp/src/lm_encoding.cpp | Propagates TOOL_CALL_STOP into handle->stop(TOOL_CALL) and collects finish_reasons. |
| src/cpp/src/llm/pipeline_static.cpp | Propagates TOOL_CALL_STOP into handle->stop(TOOL_CALL) and collects finish_reasons. |
| src/cpp/src/llm/pipeline_stateful.cpp | Propagates finish_reasons into decoded outputs. |
| src/cpp/src/llm/pipeline_continuous_batching_adapter.hpp | Aggregates/moves finish_reasons through the adapter. |
| src/cpp/src/generation_stream.hpp | Stores a finish reason on GenerationStream::stop(...). |
| src/cpp/src/generation_handle.cpp | Extends GenerationHandleImpl::stop(...) to accept a finish reason. |
| src/cpp/src/continuous_batching/pipeline_impl.cpp | Populates per-sequence m_finish_reasons with fallback to stream reason on external stop. |
| src/cpp/src/continuous_batching/pipeline_base.cpp | Propagates TOOL_CALL_STOP into handle->stop(TOOL_CALL) and propagates finish_reasons through result conversion. |
| src/cpp/include/openvino/genai/streamer_base.hpp | Adds StreamingStatus::TOOL_CALL_STOP to the public C++ API. |
| src/cpp/include/openvino/genai/parsers.hpp | Adds IncrementalParser::get_status() to support stop signaling. |
| src/cpp/include/openvino/genai/llm_pipeline.hpp | Adds finish_reasons to EncodedResults / DecodedResults. |
| src/cpp/include/openvino/genai/generation_handle.hpp | Adds GenerationFinishReason::TOOL_CALL and per-sequence finish reason vectors; extends the stop(...) API. |
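Several rows above mention "per-sequence m_finish_reasons with fallback to stream reason on external stop". A minimal Python sketch of how that rule could read, assuming NONE marks a sequence that had not finished on its own when stop() was called on its stream. The enum mirror and collect_finish_reasons are hypothetical, not the C++ implementation.

```python
from enum import Enum


class GenerationFinishReason(Enum):
    # Hypothetical mirror of the C++ enum; values are illustrative.
    NONE = 0
    STOP = 1
    LENGTH = 2
    TOOL_CALL = 3


def collect_finish_reasons(sequence_reasons, stream_stop_reason):
    """Per-sequence finish reasons with a fallback for externally stopped streams.

    A sequence that finished on its own (EOS, stop string, max_new_tokens) keeps
    its own reason. A sequence still marked NONE was cut short by an external
    stop() call, so it inherits the reason recorded on the stream.
    """
    return [
        stream_stop_reason if reason is GenerationFinishReason.NONE else reason
        for reason in sequence_reasons
    ]


# Two sequences in one request: the first already hit EOS, the second was
# still running when the parser triggered a tool-call stop.
reasons = collect_finish_reasons(
    [GenerationFinishReason.STOP, GenerationFinishReason.NONE],
    GenerationFinishReason.TOOL_CALL,
)
print(reasons)
```

Keeping each sequence's own reason when it exists is what lets a batched or multi-sequence result report STOP for one sequence and TOOL_CALL for another.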

Comment thread tests/python_tests/test_parsers.py Outdated
Comment thread tests/python_tests/test_parsers.py
Comment thread tests/python_tests/test_parsers.py Outdated
Comment thread src/cpp/src/text_streamer.cpp Outdated
Comment thread src/cpp/src/text_streamer.cpp Outdated
Comment thread src/cpp/src/whisper/whisper.cpp Outdated
Comment thread src/cpp/src/whisper/pipeline_static.cpp Outdated
Comment thread src/cpp/src/whisper/pipeline_static.cpp
Comment thread src/cpp/src/whisper/whisper.cpp Outdated
Comment thread tests/python_tests/test_parsers.py
Comment thread tests/python_tests/test_parsers.py Outdated
Comment thread tests/python_tests/test_parsers.py Outdated
Copilot AI review requested due to automatic review settings April 10, 2026 11:35
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 7 comments.

Comment thread src/cpp/src/text_streamer.cpp
Comment thread src/cpp/src/text_streamer.cpp Outdated
Comment thread src/cpp/src/generation_stream.hpp
Comment thread src/cpp/include/openvino/genai/generation_handle.hpp Outdated
Comment thread src/python/openvino_genai/py_openvino_genai.pyi Outdated
Comment thread tests/python_tests/test_parsers.py
Comment thread tests/python_tests/test_parsers.py Outdated
Comment thread samples/python/text_generation/compound_grammar_generation.py Outdated
Comment thread tests/python_tests/test_llm_pipeline_static.py Outdated
Copilot AI review requested due to automatic review settings April 15, 2026 12:52
@github-actions github-actions Bot added the category: JS API GenAI JS API label Apr 15, 2026
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 3 comments.

Comment thread tests/python_tests/test_parsers.py
Comment thread tests/python_tests/test_parsers.py
Comment thread samples/python/text_generation/README.md
Copilot AI review requested due to automatic review settings April 16, 2026 09:49
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 3 comments.

Comment thread samples/python/text_generation/README.md
Comment thread src/cpp/src/text_streamer.cpp Outdated
Comment thread src/js/src/helper.cpp
Comment thread samples/python/text_generation/compound_grammar_generation.py Outdated
Comment thread tests/python_tests/test_parsers.py
Copilot AI review requested due to automatic review settings April 20, 2026 11:44
@pavel-esir pavel-esir requested a review from Wovchena April 20, 2026 11:52
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 1 comment.

Comment on lines 255 to 258
.def_readwrite("m_generation_ids", &EncodedGenerationResult::m_generation_ids)
.def_readwrite("m_scores", &EncodedGenerationResult::m_scores)
.def_readonly("finish_reasons", &EncodedGenerationResult::m_finish_reasons)
.def_readonly("perf_metrics", &EncodedGenerationResult::perf_metrics)

Copilot AI Apr 20, 2026

EncodedGenerationResult exposes the new finish-reason vector under the python attribute name finish_reasons, while the rest of the struct fields are exposed as m_request_id / m_generation_ids / m_scores. This inconsistency makes the API harder to discover and breaks the naming pattern users rely on for these handle result structs. Consider renaming the binding to m_finish_reasons (and updating the .pyi accordingly), or exposing both names as aliases for backward/forward compatibility.

Copilot uses AI. Check for mistakes.
Contributor Author

This question is outside the scope of this PR. master already has this inconsistency: some fields are exposed with the m_ prefix and some without. I exposed finish_reasons the same way as perf_metrics and extended_perf_metrics. We should address the naming separately.

Copilot AI review requested due to automatic review settings April 21, 2026 10:44
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 4 comments.

Comment on lines +798 to +805
prompts = [
"What is the capital of France? Just answer without explanation.",
"Why the Sun is Yellow",
]
res = pipe.generate(prompts, max_new_tokens=50)

assert len(res.texts) == len(prompts)
assert res.finish_reasons == [GenerationFinishReason.STOP, GenerationFinishReason.LENGTH]

Copilot AI Apr 21, 2026

test_batched_generate_returns_finish_reason_for_each_sequence asserts a specific STOP/LENGTH combination for two prompts, but without controlling EOS/stop conditions this is likely to be non-deterministic across model versions/conversion settings (the second prompt may finish with EOS before hitting max_new_tokens). Make the test deterministic by explicitly configuring stop conditions (e.g., use ignore_eos=True plus a stop string that only the first prompt is expected to emit) or relax the assertion to only validate that finish_reasons has one entry per prompt and values are in the expected set.
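A sketch of the relaxed assertion suggested above, assuming only STOP and LENGTH are acceptable outcomes for a plain batched generate call. The local enum stands in for the GenerationFinishReason the test uses so the snippet runs without openvino_genai; check_batched_finish_reasons is a hypothetical helper name.

```python
from enum import Enum


class GenerationFinishReason(Enum):
    # Local stand-in so the sketch runs without openvino_genai installed.
    NONE = 0
    STOP = 1
    LENGTH = 2
    TOOL_CALL = 3


def check_batched_finish_reasons(finish_reasons, prompts):
    """Relaxed check: one finished reason per prompt, STOP/LENGTH in any order."""
    # Exactly one finish reason per prompt in the batch.
    assert len(finish_reasons) == len(prompts), (finish_reasons, prompts)
    # Every sequence finished, either by EOS/stop condition or by max_new_tokens.
    allowed = {GenerationFinishReason.STOP, GenerationFinishReason.LENGTH}
    assert all(r in allowed for r in finish_reasons), finish_reasons


prompts = ["What is the capital of France?", "Why the Sun is Yellow"]
check_batched_finish_reasons([GenerationFinishReason.STOP, GenerationFinishReason.LENGTH], prompts)
check_batched_finish_reasons([GenerationFinishReason.STOP, GenerationFinishReason.STOP], prompts)
```

In the actual test, res.finish_reasons from pipe.generate(prompts, max_new_tokens=50) would presumably be passed in place of the literal lists, with the enum imported from openvino_genai instead of defined locally. This trades coverage of the exact STOP/LENGTH split for stability across model versions.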

Comment on lines 111 to 124
@@ -104,6 +120,7 @@
return result_dicts;
})
.def_readonly("perf_metrics", &DecodedResults::perf_metrics)
.def_readonly("finish_reasons", &DecodedResults::finish_reasons)
.def_readonly("extended_perf_metrics", &DecodedResults::extended_perf_metrics)

Copilot AI Apr 21, 2026

The DecodedResults pybind docstring still documents only texts/scores/metrics, but the binding now also exposes finish_reasons. Please update decoded_results_docstring so the Python API docs reflect the new field.

Comment on lines 138 to 143
py::class_<EncodedResults>(m, "EncodedResults", encoded_results_docstring)
.def_readonly("tokens", &EncodedResults::tokens)
.def_readonly("scores", &EncodedResults::scores)
.def_readonly("perf_metrics", &EncodedResults::perf_metrics)
.def_readonly("finish_reasons", &EncodedResults::finish_reasons)
.def_readonly("extended_perf_metrics", &EncodedResults::extended_perf_metrics);

Copilot AI Apr 21, 2026

The EncodedResults pybind docstring still documents only tokens/scores/metrics, but the binding now also exposes finish_reasons. Please update encoded_results_docstring so the Python API docs reflect the new field.

Comment thread src/js/src/helper.cpp
Comment on lines +1041 to +1046
template <>
Napi::Value cpp_to_js<ov::genai::GenerationFinishReason, Napi::Value>(
const Napi::Env& env,
const ov::genai::GenerationFinishReason& value) {
return Napi::Number::New(env, static_cast<int>(value));
}

Copilot AI Apr 21, 2026

cpp_to_js<GenerationFinishReason> currently returns static_cast<int>(value) without validating the enum value or explicitly documenting the numeric mapping. Nearby enums (e.g., StopCriteria) use an explicit switch + throw on unknown values to keep the JS ABI stable. Consider doing the same here so future enum changes don’t silently produce mismatched numbers in JS.
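The reviewer's suggestion is an explicit C++ switch that throws on unknown values, as done for StopCriteria. To keep the examples here in one language, the same pattern is sketched in Python rather than as N-API code: every enum member gets a spelled-out wire number, and anything unmapped fails loudly. The enum values and the finish_reason_to_js name are hypothetical.

```python
from enum import Enum


class GenerationFinishReason(Enum):
    # Hypothetical mirror of the C++ enum; values are illustrative.
    NONE = 0
    STOP = 1
    LENGTH = 2
    TOOL_CALL = 3


# Explicit wire values exposed to JS. They are spelled out rather than derived
# from the enum's own values, so reordering or inserting enum members cannot
# silently change what JS code receives.
_JS_FINISH_REASON = {
    GenerationFinishReason.NONE: 0,
    GenerationFinishReason.STOP: 1,
    GenerationFinishReason.LENGTH: 2,
    GenerationFinishReason.TOOL_CALL: 3,
}


def finish_reason_to_js(reason):
    """Return the stable JS number for reason; fail loudly on anything unmapped."""
    if reason not in _JS_FINISH_REASON:
        raise ValueError(f"Unsupported GenerationFinishReason: {reason!r}")
    return _JS_FINISH_REASON[reason]
```

Compared with a bare cast, a newly added enum member without a mapping entry raises an error instead of leaking an unreviewed number to JS callers.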

@pavel-esir pavel-esir requested a review from mzegla April 21, 2026 11:08

private:
class IncrementalParserImpl;
std::unique_ptr<IncrementalParserImpl> m_impl;
Collaborator

The interface class just gained a user-inaccessible member. Why?

Labels

category: continuous batching Continuous batching category: CPP API Changes in GenAI C++ public headers category: GGUF GGUF file reader category: JS API GenAI JS API category: LLM samples GenAI LLM samples category: LLM LLM pipeline (stateful, static) category: prompt lookup Prompt look-up decoding category: Python API Python API for GenAI category: speculative decoding Speculative decoding category: text streamer category: visual language Visual language pipeline category: whisper Whisper pipeline no-match-files

5 participants