Conversation
Pull request overview
This PR introduces explicit generation stop/finish reasons across C++ and Python APIs (including a new tool-call stop path) to better align GenAI behavior with OpenAI-compatible API expectations.
Changes:
- Added `GenerationFinishReason` plumbing end-to-end (pipelines populate per-sequence `finish_reasons`; Python bindings/stubs expose them).
- Added `StreamingStatus::TOOL_CALL_STOP` to represent parser-triggered stopping during streaming.
- Updated streaming loops to react to `TOOL_CALL_STOP` and propagate stop semantics.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| tests/python_tests/test_parsers.py | Adds a new incremental parser test scenario (tool-call extraction + reasoning extraction). |
| src/python/py_streamers.cpp | Exposes StreamingStatus.TOOL_CALL_STOP to Python. |
| src/python/py_openvino_genai.cpp | Exposes finish_reasons on DecodedResults / EncodedResults to Python. |
| src/python/openvino_genai/py_openvino_genai.pyi | Updates Python stubs for new fields/statuses and GenerationHandle.stop signature. |
| src/cpp/src/whisper/whisper.cpp | Updates Whisper streaming stop handling and populates finish_reasons. |
| src/cpp/src/whisper/pipeline_static.cpp | Updates Whisper streaming stop handling and populates finish_reasons. |
| src/cpp/src/visual_language/pipeline.cpp | Propagates finish_reasons from encoded → decoded results. |
| src/cpp/src/visual_language/continuous_batching_adapter.hpp | Propagates finish_reasons through VLM continuous batching adapter. |
| src/cpp/src/text_streamer.cpp | Adds parser-driven tool-call stop handling in TextParserStreamer. |
| src/cpp/src/speculative_decoding/stateful/stateful_pipeline_base.cpp | Propagates finish_reasons into decoded outputs. |
| src/cpp/src/speculative_decoding/stateful/fast_draft_strategy.cpp | Initializes finish_reasons in results. |
| src/cpp/src/speculative_decoding/stateful/eagle3_strategy.cpp | Initializes finish_reasons in results. |
| src/cpp/src/speculative_decoding/continuous_batching/fast_draft_strategy.hpp | Populates per-sequence m_finish_reasons with fallback to stream reason on external stop. |
| src/cpp/src/prompt_lookup/prompt_lookup_impl.cpp | Populates per-sequence m_finish_reasons with fallback to stream reason on external stop. |
| src/cpp/src/lm_encoding.cpp | Propagates TOOL_CALL_STOP into handle->stop(TOOL_CALL) and collects finish_reasons. |
| src/cpp/src/llm/pipeline_static.cpp | Propagates TOOL_CALL_STOP into handle->stop(TOOL_CALL) and collects finish_reasons. |
| src/cpp/src/llm/pipeline_stateful.cpp | Propagates finish_reasons into decoded outputs. |
| src/cpp/src/llm/pipeline_continuous_batching_adapter.hpp | Aggregates/moves finish_reasons through the adapter. |
| src/cpp/src/generation_stream.hpp | Stores a finish reason on GenerationStream::stop(...). |
| src/cpp/src/generation_handle.cpp | Extends GenerationHandleImpl::stop(...) to accept a finish reason. |
| src/cpp/src/continuous_batching/pipeline_impl.cpp | Populates per-sequence m_finish_reasons with fallback to stream reason on external stop. |
| src/cpp/src/continuous_batching/pipeline_base.cpp | Propagates TOOL_CALL_STOP into handle->stop(TOOL_CALL) and propagates finish_reasons through result conversion. |
| src/cpp/include/openvino/genai/streamer_base.hpp | Adds StreamingStatus::TOOL_CALL_STOP to public C++ API. |
| src/cpp/include/openvino/genai/parsers.hpp | Adds IncrementalParser::get_status() to support stop signaling. |
| src/cpp/include/openvino/genai/llm_pipeline.hpp | Adds finish_reasons to EncodedResults / DecodedResults. |
| src/cpp/include/openvino/genai/generation_handle.hpp | Adds GenerationFinishReason::TOOL_CALL and per-sequence finish reason vectors; extends stop(...) API. |
Force-pushed from 5d44ac1 to 2ca80b0.
```cpp
.def_readwrite("m_generation_ids", &EncodedGenerationResult::m_generation_ids)
.def_readwrite("m_scores", &EncodedGenerationResult::m_scores)
.def_readonly("finish_reasons", &EncodedGenerationResult::m_finish_reasons)
.def_readonly("perf_metrics", &EncodedGenerationResult::perf_metrics)
```
EncodedGenerationResult exposes the new finish-reason vector under the python attribute name finish_reasons, while the rest of the struct fields are exposed as m_request_id / m_generation_ids / m_scores. This inconsistency makes the API harder to discover and breaks the naming pattern users rely on for these handle result structs. Consider renaming the binding to m_finish_reasons (and updating the .pyi accordingly), or exposing both names as aliases for backward/forward compatibility.
This question lies outside of this PR and should be discussed separately. In master we already have an inconsistency: some fields are exposed with the `m_` prefix and some without. I made `finish_reasons` consistent with `perf_metrics` and `extended_perf_metrics`. We should address this separately.
Force-pushed from b67c181 to 07cad33.
```python
prompts = [
    "What is the capital of France? Just answer without explanation.",
    "Why the Sun is Yellow",
]
res = pipe.generate(prompts, max_new_tokens=50)

assert len(res.texts) == len(prompts)
assert res.finish_reasons == [GenerationFinishReason.STOP, GenerationFinishReason.LENGTH]
```
test_batched_generate_returns_finish_reason_for_each_sequence asserts a specific STOP/LENGTH combination for two prompts, but without controlling EOS/stop conditions this is likely to be non-deterministic across model versions/conversion settings (the second prompt may finish with EOS before hitting max_new_tokens). Make the test deterministic by explicitly configuring stop conditions (e.g., use ignore_eos=True plus a stop string that only the first prompt is expected to emit) or relax the assertion to only validate that finish_reasons has one entry per prompt and values are in the expected set.
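A minimal self-contained sketch of the relaxed assertion suggested above, using a stand-in enum and hypothetical per-prompt results (the real `openvino_genai.GenerationFinishReason` values may differ):

```python
from enum import Enum

# Hypothetical stand-in for openvino_genai.GenerationFinishReason.
class GenerationFinishReason(Enum):
    NONE = 0
    STOP = 1
    LENGTH = 2
    TOOL_CALL = 3

# Hypothetical outputs, as pipe.generate(prompts, ...) might return them.
prompts = ["prompt one", "prompt two"]
finish_reasons = [GenerationFinishReason.STOP, GenerationFinishReason.LENGTH]

# Relaxed check: one reason per prompt, each from the expected terminal set,
# instead of pinning an exact STOP/LENGTH combination that depends on the model.
allowed = {GenerationFinishReason.STOP, GenerationFinishReason.LENGTH}
assert len(finish_reasons) == len(prompts)
assert all(reason in allowed for reason in finish_reasons)
```

This keeps the test meaningful (every sequence must report a terminal reason) without depending on whether a given model emits EOS before `max_new_tokens`.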
```
@@ -104,6 +120,7 @@
        return result_dicts;
    })
    .def_readonly("perf_metrics", &DecodedResults::perf_metrics)
    .def_readonly("finish_reasons", &DecodedResults::finish_reasons)
    .def_readonly("extended_perf_metrics", &DecodedResults::extended_perf_metrics)
```
The DecodedResults pybind docstring still documents only texts/scores/metrics, but the binding now also exposes finish_reasons. Please update decoded_results_docstring so the Python API docs reflect the new field.
```cpp
py::class_<EncodedResults>(m, "EncodedResults", encoded_results_docstring)
    .def_readonly("tokens", &EncodedResults::tokens)
    .def_readonly("scores", &EncodedResults::scores)
    .def_readonly("perf_metrics", &EncodedResults::perf_metrics)
    .def_readonly("finish_reasons", &EncodedResults::finish_reasons)
    .def_readonly("extended_perf_metrics", &EncodedResults::extended_perf_metrics);
```
The EncodedResults pybind docstring still documents only tokens/scores/metrics, but the binding now also exposes finish_reasons. Please update encoded_results_docstring so the Python API docs reflect the new field.
```cpp
template <>
Napi::Value cpp_to_js<ov::genai::GenerationFinishReason, Napi::Value>(
    const Napi::Env& env,
    const ov::genai::GenerationFinishReason& value) {
    return Napi::Number::New(env, static_cast<int>(value));
}
```
cpp_to_js<GenerationFinishReason> currently returns static_cast<int>(value) without validating the enum value or explicitly documenting the numeric mapping. Nearby enums (e.g., StopCriteria) use an explicit switch + throw on unknown values to keep the JS ABI stable. Consider doing the same here so future enum changes don’t silently produce mismatched numbers in JS.
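A language-agnostic sketch of the explicit-map-plus-throw pattern the comment suggests (the C++ binding would use a `switch` with a default that throws; enum values here are hypothetical stand-ins for `ov::genai::GenerationFinishReason`):

```python
from enum import Enum

# Hypothetical stand-in for ov::genai::GenerationFinishReason.
class GenerationFinishReason(Enum):
    NONE = 0
    STOP = 1
    LENGTH = 2
    TOOL_CALL = 3

# Explicit, documented numeric mapping for the JS side. A new enum member
# added in C++ without a matching entry here fails loudly instead of
# silently shifting numbers in JS.
_JS_VALUES = {
    GenerationFinishReason.NONE: 0,
    GenerationFinishReason.STOP: 1,
    GenerationFinishReason.LENGTH: 2,
    GenerationFinishReason.TOOL_CALL: 3,
}

def to_js_number(reason: GenerationFinishReason) -> int:
    try:
        return _JS_VALUES[reason]
    except KeyError:
        raise ValueError(f"Unknown GenerationFinishReason: {reason!r}")
```

The point is the hard failure on unknown members: the numeric JS ABI stays stable because every mapping is stated explicitly rather than derived from a raw cast.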
```cpp
private:
    class IncrementalParserImpl;
    std::unique_ptr<IncrementalParserImpl> m_impl;
```
The interface class just got a user-inaccessible member. Why?
Description
CVS-181410
Is connected to openvinotoolkit/model_server#3927
Checklist: