[llm_bench] Add hooks for transformers v5 #3696
sbalandi wants to merge 5 commits into openvinotoolkit:master
Conversation
Pull request overview
This PR updates llm_bench’s Transformers-generation hooks to cover Transformers v5 while reorganizing existing v4.x hook implementations and extending CI to exercise WWB tests against newer Transformers versions.
Changes:
- Replaced v4.x hook files named `*_v43/v45/v51/v52/v55` with version-explicit `*_v4_43/.../v4_57` modules, and added new v5 hook modules (`hook_sample_v5.py`, `hook_sample_v5_3.py`).
- Updated llm_bench hook dispatch logic to select hook implementations based on `transformers.__version__`.
- Updated Linux CI workflow to run WWB tests with Transformers 5.2.0 and 5.5.0.
Reviewed changes
Copilot reviewed 25 out of 27 changed files in this pull request and generated 8 comments.
Summary per file:
| File | Description |
|---|---|
| tools/llm_bench/llm_bench_utils/llm_hook_sample/hook_sample_v55.py | Removed legacy v4.55 sampling hook module (old naming). |
| tools/llm_bench/llm_bench_utils/llm_hook_sample/hook_sample_v52.py | Removed legacy v4.52 sampling hook module (old naming). |
| tools/llm_bench/llm_bench_utils/llm_hook_sample/hook_sample_v51.py | Removed legacy v4.51 sampling hook module (old naming). |
| tools/llm_bench/llm_bench_utils/llm_hook_sample/hook_sample_v45.py | Removed legacy v4.45 sampling hook module (old naming). |
| tools/llm_bench/llm_bench_utils/llm_hook_sample/hook_sample_v43.py | Removed legacy v4.43 sampling hook module (old naming). |
| tools/llm_bench/llm_bench_utils/llm_hook_sample/hook_sample_v4_57.py | Added v4.57 sampling hook module under version-explicit naming. |
| tools/llm_bench/llm_bench_utils/llm_hook_sample/hook_sample_v4_55.py | Added v4.55 sampling hook module under version-explicit naming. |
| tools/llm_bench/llm_bench_utils/llm_hook_sample/hook_sample_v4_52.py | Added v4.52 sampling hook module under version-explicit naming. |
| tools/llm_bench/llm_bench_utils/llm_hook_sample/hook_sample_v4_51.py | Added v4.51 sampling hook module under version-explicit naming. |
| tools/llm_bench/llm_bench_utils/llm_hook_sample/hook_sample_v4_45.py | Added v4.45 sampling hook module under version-explicit naming. |
| tools/llm_bench/llm_bench_utils/llm_hook_sample/hook_sample_v4_43.py | Added v4.43 sampling hook module under version-explicit naming. |
| tools/llm_bench/llm_bench_utils/llm_hook_sample/hook_sample_v5.py | Added Transformers v5.0–v5.2 sampling hook module. |
| tools/llm_bench/llm_bench_utils/llm_hook_sample/hook_sample_v5_3.py | Added Transformers v5.3+ sampling hook module. |
| tools/llm_bench/llm_bench_utils/llm_hook_beam_search/hook_beam_search_v55.py | Removed legacy v4.55 beam-search hook module (old naming). |
| tools/llm_bench/llm_bench_utils/llm_hook_beam_search/hook_beam_search_v52.py | Removed legacy v4.52 beam-search hook module (old naming). |
| tools/llm_bench/llm_bench_utils/llm_hook_beam_search/hook_beam_search_v51.py | Removed legacy v4.51 beam-search hook module (old naming). |
| tools/llm_bench/llm_bench_utils/llm_hook_beam_search/hook_beam_search_v40.py | Removed legacy v4.40 beam-search hook module (old naming). |
| tools/llm_bench/llm_bench_utils/llm_hook_beam_search/hook_beam_search_v4_57.py | Added v4.57 beam-search hook module under version-explicit naming. |
| tools/llm_bench/llm_bench_utils/llm_hook_beam_search/hook_beam_search_v4_55.py | Added v4.55 beam-search hook module under version-explicit naming. |
| tools/llm_bench/llm_bench_utils/llm_hook_beam_search/hook_beam_search_v4_52.py | Added v4.52 beam-search hook module under version-explicit naming. |
| tools/llm_bench/llm_bench_utils/llm_hook_beam_search/hook_beam_search_v4_51.py | Added v4.51 beam-search hook module under version-explicit naming. |
| tools/llm_bench/llm_bench_utils/llm_hook_beam_search/hook_beam_search_v4_40.py | Added v4.40 beam-search hook module under version-explicit naming. |
| tools/llm_bench/llm_bench_utils/hook_greedy_search.py | Updated version-based hook selection to include Transformers v5 hooks. |
| tools/llm_bench/llm_bench_utils/hook_beam_search.py | Updated version-based beam-search hook selection to include Transformers v5 hooks. |
| .github/workflows/linux.yml | Added WWB CI runs with Transformers 5.2.0 and 5.5.0. |
```diff
-    type(model)._sample = hook_sample_v57.new_sample
+    type(model)._sample = hook_sample_v5_3.new_sample
 if trans_version >= version.parse("5.0"):
```
Both `if trans_version >= version.parse("5.3.0")` and `if trans_version >= version.parse("5.0")` will run for Transformers >= 5.3, so `type(model)._sample` gets overwritten and the v5.3+ hook is effectively never used. Change the second condition to `elif` (or otherwise ensure mutually exclusive branching) so Transformers >= 5.3 uses `hook_sample_v5_3.new_sample` as intended.
```diff
-if trans_version >= version.parse("5.0"):
+elif trans_version >= version.parse("5.0"):
```
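The mutually exclusive dispatch the comment asks for can be sketched as a plain `if/elif` chain over parsed versions. This is illustrative only: the module names mirror the PR, but the dispatch table below is a simplification (the real file has more branches):

```python
from packaging import version


def select_sample_hook(trans_version_str: str) -> str:
    """Return the sampling-hook module name for a transformers version string."""
    trans_version = version.parse(trans_version_str)
    # Highest version first; the elif chain guarantees exactly one branch runs.
    if trans_version >= version.parse("5.3.0"):
        return "hook_sample_v5_3"
    elif trans_version >= version.parse("5.0"):
        return "hook_sample_v5"
    elif trans_version >= version.parse("4.57.0"):
        return "hook_sample_v4_57"
    else:
        # Stand-in for the remaining v4.x branches.
        return "hook_sample_v4_43"


print(select_sample_hook("5.5.0"))  # → hook_sample_v5_3
```

With an `if` instead of the first `elif`, "5.5.0" would fall through and be rebound to the v5.0 hook, which is exactly the bug being flagged.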
```python
elif trans_version >= version.parse("4.57.0"):
    import tools.llm_bench.llm_bench_utils.llm_hook_sample.hook_sample_v4_57 as hook_sample_v4_57

    type(model)._sample = hook_sample_v4_57.new_sample
elif trans_version >= version.parse("4.55.0"):
```
These branches import hook modules via tools.llm_bench..., while the rest of llm_bench imports use the llm_bench_utils... package (and the README examples run python benchmark.py from tools/llm_bench). Using the tools. prefix can raise ModuleNotFoundError depending on the working directory/PYTHONPATH. Please align these imports with the existing llm_bench_utils.llm_hook_sample... style for consistent runtime behavior.
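One defensive way to keep the hooks importable under both invocation styles is a small resolver that tries the in-tree package path first and falls back to the repo-root path. This is a sketch, not the PR's code; `load_hook_module` is a hypothetical helper name:

```python
import importlib


def load_hook_module(relative_name: str):
    """Resolve a hook module whether or not the repo root is on sys.path.

    Tries the in-tree package path (works when running from tools/llm_bench)
    before the repo-root path (works when the repo root is on PYTHONPATH).
    """
    for prefix in ("llm_bench_utils.", "tools.llm_bench.llm_bench_utils."):
        try:
            return importlib.import_module(prefix + relative_name)
        except ModuleNotFoundError:
            continue
    raise ModuleNotFoundError(relative_name)
```

Simply keeping the existing `llm_bench_utils....` import style, as the comment suggests, is the lower-effort fix.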
```diff
 if version.parse(transformers.__version__) >= version.parse("5.3.0"):
     import tools.llm_bench.llm_bench_utils.llm_hook_beam_search.hook_beam_search_v5_3 as hook_beam_search_v5_3

-    new_beam_search = hook_beam_search_v57.new_beam_search_v57
+    new_beam_search = hook_beam_search_v5_3.new_beam_search
```
hook_beam_search.py now imports version-specific implementations via tools.llm_bench.... Most of llm_bench is imported/run as the llm_bench_utils... package (e.g., from tools/llm_bench/benchmark.py), so this can break execution when the repo root isn’t on PYTHONPATH. Prefer the existing llm_bench_utils.llm_hook_beam_search... import style to keep the hook usable in the standard invocation flow.
```python
while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
    if prefill_consumed:
        model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
        with self._optimize_model_for_decode():
            outputs = model_forward(**model_inputs, return_dict=True)
```
This Transformers v5 sampling hook is described as collecting latency, but it never records per-iteration total time or forward/inference time into hook_greedy.tm_list / hook_greedy.tm_infer_list (unlike the v4.x hooks). As a result, llm_bench latency metrics will be empty/incorrect for Transformers v5. Please add timing collection around the forward call and the loop body, consistent with the existing v4.x hook implementations.
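The missing instrumentation can be sketched as below, modeled on the v4.x hooks: `tm_list` holds whole-iteration latencies, `tm_infer_list` forward-only latencies. This is a hedged illustration, not the hook's actual code; `hook_greedy` here is a stand-in for `llm_bench_utils.hook_greedy_search`:

```python
import time
import types

# Stand-in for llm_bench_utils.hook_greedy_search (assumption for this sketch).
hook_greedy = types.SimpleNamespace(tm_list=[], tm_infer_list=[])


def timed_decode_step(model_forward, model_inputs):
    """Run one decode iteration, recording total and forward-pass time."""
    t_step = time.perf_counter()
    t_infer = time.perf_counter()
    outputs = model_forward(**model_inputs)  # the forward pass being measured
    hook_greedy.tm_infer_list.append(time.perf_counter() - t_infer)
    # ...token sampling / model_kwargs update would happen here...
    hook_greedy.tm_list.append(time.perf_counter() - t_step)
    return outputs


# Dummy forward standing in for the model call:
timed_decode_step(lambda **kwargs: {"logits": None}, {})
```

In the real hook the two timers would wrap the `model_forward(...)` call and the full loop body, respectively, exactly as the v4.x implementations do.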
```python
model_inputs = self.prepare_inputs_for_generation(
    input_ids, next_sequence_length=next_sequence_length, **model_kwargs
)
with self._optimize_model_for_decode():
    outputs = model_forward(**model_inputs, return_dict=True)
```
This Transformers v5.3+ sampling hook doesn’t append any timings to hook_greedy.tm_list / hook_greedy.tm_infer_list, even though the hook’s purpose is latency collection. That will produce empty/incorrect benchmark latency outputs for Transformers v5.3+. Please add timing collection around the decode forward call and the full iteration, consistent with the v4.x hooks.
```python
if self.config._attn_implementation == "flash_attention_2" and getattr(
    model_kwargs.get("past_key_values"), "is_compileable", False
):
    if generation_config.compile_config is None:
        generation_config.compile_config = CompileConfig(fullgraph=False)
```
CompileConfig is referenced when generation_config.compile_config is None, but it is not imported in this module. If auto-compile is enabled, this will raise a NameError. Import CompileConfig from the appropriate Transformers module (matching the upstream version you copied from) or avoid constructing it here.
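A sketch of the missing import. In recent transformers v4.x, `CompileConfig` lives in `transformers.generation.configuration_utils`; treat the exact location as an assumption to verify against the upstream version the hook was copied from, and the fallback class below is purely illustrative:

```python
try:
    # Location assumed from transformers v4.x; verify against the copied version.
    from transformers.generation.configuration_utils import CompileConfig
except ImportError:
    # Minimal stand-in with the one field the hook uses, so the sketch runs
    # even without transformers installed.
    from dataclasses import dataclass

    @dataclass
    class CompileConfig:
        fullgraph: bool = True


cfg = CompileConfig(fullgraph=False)
```

Alternatively, as the comment notes, the hook could avoid constructing `CompileConfig` and leave `generation_config.compile_config` untouched.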
```diff
 - name: 'WWB tests'
-  cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"'
+  cmd: |
+    python -m pip install transformers==5.2.0
+    python -m pip show transformers
+    python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"
```
There are now two matrix entries with the identical name WWB tests, which will show up as duplicate job names in CI and make failures harder to triage. Consider renaming them to include the Transformers version (e.g., WWB tests (transformers 5.2.0) / WWB tests (transformers 5.5.0)).
```diff
@@ -618,7 +618,17 @@ jobs:
   run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).RAG.test }}
   timeout: 30
 - name: 'WWB tests'
```
PR description still contains placeholders (e.g., CVS-###, Fixes #(issue)) and the checklist is not filled out. Please update the PR description to match the repository PR template/checklist before merging so reviewers can validate scope, tests, and docs updates.
```diff
+    import tools.llm_bench.llm_bench_utils.llm_hook_sample.hook_sample_v4_57 as hook_sample_v4_57
+
+    type(model)._sample = hook_sample_v4_57.new_sample
 elif trans_version >= version.parse("4.55.0"):
-    import llm_bench_utils.llm_hook_sample.hook_sample_v55 as hook_sample_v55
-    model._sample = hook_sample_v55.new_sample.__get__(model, model.__class__)
+    import tools.llm_bench.llm_bench_utils.llm_hook_sample.hook_sample_v4_55 as hook_sample_v4_55
+
+    model._sample = hook_sample_v4_55.new_sample.__get__(model, model.__class__)
 elif trans_version >= version.parse('4.52.0'):
-    import llm_bench_utils.llm_hook_sample.hook_sample_v52 as hook_sample_v52
-    model._sample = hook_sample_v52.new_sample.__get__(model, model.__class__)
+    import tools.llm_bench.llm_bench_utils.llm_hook_sample.hook_sample_v4_52 as hook_sample_v4_52
+
+    model._sample = hook_sample_v4_52.new_sample.__get__(model, model.__class__)
 elif trans_version >= version.parse('4.51.0'):
-    import llm_bench_utils.llm_hook_sample.hook_sample_v51 as hook_sample_v51
-    model._sample = hook_sample_v51.new_sample.__get__(model, model.__class__)
+    import tools.llm_bench.llm_bench_utils.llm_hook_sample.hook_sample_v4_51 as hook_sample_v4_51
+
+    model._sample = hook_sample_v4_51.new_sample.__get__(model, model.__class__)
 elif trans_version >= version.parse('4.45.0'):
-    import llm_bench_utils.llm_hook_sample.hook_sample_v45 as hook_sample_v45
-    model._sample = hook_sample_v45.new_sample.__get__(model, model.__class__)
+    import tools.llm_bench.llm_bench_utils.llm_hook_sample.hook_sample_v4_45 as hook_sample_v4_45
+
+    model._sample = hook_sample_v4_45.new_sample.__get__(model, model.__class__)
 elif trans_version >= version.parse('4.43.0'):
-    import llm_bench_utils.llm_hook_sample.hook_sample_v43 as hook_sample_v43
-    model._sample = hook_sample_v43.new_sample.__get__(model, model.__class__)
+    import tools.llm_bench.llm_bench_utils.llm_hook_sample.hook_sample_v4_43 as hook_sample_v4_43
+
+    model._sample = hook_sample_v4_43.new_sample.__get__(model, model.__class__)
```
These imports use the fully-qualified module path starting with tools.llm_bench..., but tools/llm_bench/README.md instructs running the tool as python benchmark.py from inside tools/llm_bench/. In that execution mode, the repo root (and therefore the tools package) is not on sys.path, so these imports will fail. Prefer importing from llm_bench_utils... consistently (as is done elsewhere in this file) or explicitly ensure the repo root is on PYTHONPATH before using tools.* imports.
```
transformers[sentencepiece]>=4.35.2,<5.4.0
sentence-transformers>=2.2.2,<=5.3.0
```
This file still pins transformers[sentencepiece] to <5.4.0, but CI is now explicitly testing Transformers 5.5 for WWB. Please align the declared dependency range with the versions you intend to support/test (either relax the upper bound or drop the 5.5 CI run).
```python
while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
    if prefill_consumed:
        next_sequence_length = 1 if model_kwargs["use_cache"] else None
        model_inputs = self.prepare_inputs_for_generation(
            input_ids, next_sequence_length=next_sequence_length, **model_kwargs
        )
        with self._optimize_model_for_decode():
            outputs = model_forward(**model_inputs, return_dict=True)
    prefill_consumed = True
    model_kwargs = self._update_model_kwargs_for_generation(
        outputs,
        model_kwargs,
        is_encoder_decoder=self.config.is_encoder_decoder,
    )
    if synced_gpus and this_peer_finished:
        continue

    # Copy is needed to avoid keeping a hanging ref to outputs.logits which may be very large for first iteration
    # (the clone itself is always small)
    next_token_logits = outputs.logits[:, -1, :].to(copy=True, dtype=torch.float32, device=input_ids.device)
```
Same as hook_sample_v5: this v5.3+ hook imports time and llm_bench_utils.hook_greedy_search but does not record any timings. If llm_bench relies on _sample timing for do_sample runs, please add timing collection (total step + forward-pass time) consistent with the v4.x hook implementations.
```diff
 if trans_version >= version.parse("5.3.0"):
     import llm_bench_utils.llm_hook_sample.hook_sample_v5_3 as hook_sample_v5_3

-    type(model)._sample = hook_sample_v57.new_sample
+    type(model)._sample = hook_sample_v5_3.new_sample
 if trans_version >= version.parse("5.0"):
     import llm_bench_utils.llm_hook_sample.hook_sample_v5 as hook_sample_v5

     type(model)._sample = hook_sample_v5.new_sample
 elif trans_version >= version.parse("4.57.0"):
```
For Transformers >= 5.3.0 this code sets _sample twice because the second condition is an if, not an elif. As written, 5.3+ will first bind hook_sample_v5_3.new_sample and then immediately overwrite it with hook_sample_v5.new_sample, so the v5.3-specific hook is never used. Convert this to a single if/elif/... chain (or match) so only one implementation is selected per version.
```diff
 if version.parse(transformers.__version__) >= version.parse("5.3.0"):
     import tools.llm_bench.llm_bench_utils.llm_hook_beam_search.hook_beam_search_v5_3 as hook_beam_search_v5_3

-    new_beam_search = hook_beam_search_v57.new_beam_search_v57
+    new_beam_search = hook_beam_search_v5_3.new_beam_search
 elif version.parse(transformers.__version__) >= version.parse("5.0"):
     import tools.llm_bench.llm_bench_utils.llm_hook_beam_search.hook_beam_search_v5 as hook_beam_search_v5

     new_beam_search = hook_beam_search_v5.new_beam_search
 elif version.parse(transformers.__version__) >= version.parse("4.57.0"):
     import tools.llm_bench.llm_bench_utils.llm_hook_beam_search.hook_beam_search_v4_57 as hook_beam_search_v4_57

     new_beam_search = hook_beam_search_v4_57.new_beam_search_v57
 elif version.parse(transformers.__version__) >= version.parse("4.55.0"):
-    import llm_bench_utils.llm_hook_beam_search.hook_beam_search_v55 as hook_beam_search_v55
-    new_beam_search = hook_beam_search_v55.new_beam_search_v55
+    import tools.llm_bench.llm_bench_utils.llm_hook_beam_search.hook_beam_search_v4_55 as hook_beam_search_v4_55
+
+    new_beam_search = hook_beam_search_v4_55.new_beam_search_v55
 elif version.parse(transformers.__version__) >= version.parse("4.52.0"):
-    import llm_bench_utils.llm_hook_beam_search.hook_beam_search_v52 as hook_beam_search_v52
-    new_beam_search = hook_beam_search_v52.new_beam_search_v52
+    import tools.llm_bench.llm_bench_utils.llm_hook_beam_search.hook_beam_search_v4_52 as hook_beam_search_v4_52
+
+    new_beam_search = hook_beam_search_v4_52.new_beam_search_v52
 elif version.parse(transformers.__version__) >= version.parse("4.51.0"):
-    import llm_bench_utils.llm_hook_beam_search.hook_beam_search_v51 as hook_beam_search_v51
-    new_beam_search = hook_beam_search_v51.new_beam_search_v51
+    import tools.llm_bench.llm_bench_utils.llm_hook_beam_search.hook_beam_search_v4_51 as hook_beam_search_v4_51
+
+    new_beam_search = hook_beam_search_v4_51.new_beam_search_v51
 else:
-    import llm_bench_utils.llm_hook_beam_search.hook_beam_search_v40 as hook_beam_search_v40
-    new_beam_search = hook_beam_search_v40.new_beam_search_v40
+    import tools.llm_bench.llm_bench_utils.llm_hook_beam_search.hook_beam_search_v4_40 as hook_beam_search_v4_40
+
+    new_beam_search = hook_beam_search_v4_40.new_beam_search_v40
```
All of these imports use tools.llm_bench... module paths. However, tools/llm_bench/README.md documents running the benchmark as python benchmark.py from within tools/llm_bench/, which does not put the repo root on sys.path. In that common invocation, import tools... will raise ModuleNotFoundError. Use llm_bench_utils.llm_hook_beam_search... imports (consistent with the rest of llm_bench) or adjust the entrypoint so the repo root is on PYTHONPATH.
```diff
@@ -10,7 +10,8 @@ torch>=2.1.0,<=2.11
 transformers[sentencepiece]>=4.40.0,<5.4.0
```
This file still constrains transformers[sentencepiece] to <5.4.0, but the PR adds hooks/CI for Transformers v5.5 (see workflow changes) and the new hook module comments claim support beyond 5.4. If the intent is to support 5.5, update the version constraint accordingly; otherwise, CI should not test versions outside the declared range.
```diff
-transformers[sentencepiece]>=4.40.0,<5.4.0
+transformers[sentencepiece]>=4.40.0,<5.6.0
```
```python
while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
    if prefill_consumed:
        model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
        with self._optimize_model_for_decode():
            outputs = model_forward(**model_inputs, return_dict=True)
    prefill_consumed = True
    model_kwargs = self._update_model_kwargs_for_generation(
        outputs,
        model_kwargs,
```
These new _sample hooks claim to add latency collection, but there is no timing instrumentation (no time.perf_counter() usage and no writes to hook_greedy.tm_list/tm_infer_list). This looks like a regression compared to the previous v4.x hook implementations where per-step and infer timings were recorded. If sampling latency is still needed for llm_bench metrics, add the same timing updates inside the generation loop and around the model forward call.
```diff
 --extra-index-url https://download.pytorch.org/whl/cpu
 diffusers==0.37.1
-https://github.com/huggingface/optimum-intel/archive/2c48d6430c265ac259c1b264f3e2c4025cdd7b76.tar.gz#egg=optimum-intel
+optimum-intel @ https://github.com/huggingface/optimum-intel.git@transformers-v5
```
Using an unpinned VCS dependency (optimum-intel @ ...@transformers-v5) makes CI/test installs non-reproducible and is a supply-chain risk because the branch can change over time. Prefer pinning to an immutable commit SHA or a released tag/version.
```diff
-optimum-intel @ https://github.com/huggingface/optimum-intel.git@transformers-v5
+optimum-intel==1.25.2
```
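If a released version cannot be used because the v5 support only exists on that branch, a middle ground is pinning the VCS reference to an immutable commit. Sketch only: the SHA below is the one from the archive URL this line replaced, not necessarily the current tip of `transformers-v5`:

```
optimum-intel @ git+https://github.com/huggingface/optimum-intel.git@2c48d6430c265ac259c1b264f3e2c4025cdd7b76
```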
```diff
 # https://github.com/huggingface/optimum-intel/archive/2c48d6430c265ac259c1b264f3e2c4025cdd7b76.tar.gz#egg=optimum-intel
 optimum-intel @ https://github.com/huggingface/optimum-intel.git@transformers-v5
```
Using an unpinned VCS dependency (optimum-intel @ ...@transformers-v5) makes installs non-reproducible and is a supply-chain risk because the branch can change over time. Prefer pinning to an immutable commit SHA or a released tag/version.
```diff
 # https://github.com/huggingface/optimum-intel/archive/2c48d6430c265ac259c1b264f3e2c4025cdd7b76.tar.gz#egg=optimum-intel
-optimum-intel @ https://github.com/huggingface/optimum-intel.git@transformers-v5
+optimum-intel @ https://github.com/huggingface/optimum-intel/archive/2c48d6430c265ac259c1b264f3e2c4025cdd7b76.tar.gz#egg=optimum-intel
```
```
# optimum-intel[nncf]>=1.19.0,<=1.27.0
optimum-intel @ https://github.com/huggingface/optimum-intel.git@transformers-v5
```
Using an unpinned VCS dependency (optimum-intel @ ...@transformers-v5) makes installs non-reproducible and is a supply-chain risk because the branch can change over time. Prefer pinning to an immutable commit SHA or a released tag/version (as is already done elsewhere in this workflow for optimum-intel).
```diff
 - name: 'WWB tests'
-  cmd: 'python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"'
+  cmd: |
+    python -m pip install transformers==5.2.0 git+https://github.com/huggingface/optimum-intel.git@transformers-v5
+    python -m pip show transformers
+    python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"
   run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).WWB.test }}
   timeout: 120
+- name: 'WWB tests'
+  cmd: |
+    python -m pip install transformers==5.5.0 git+https://github.com/huggingface/optimum-intel.git@transformers-v5
+    python -m pip show transformers
+    python -m pytest -v ./tools/who_what_benchmark/tests -m "not nanollava"
+  run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).WWB.test }}
```
The matrix now has two entries both named 'WWB tests', which makes CI logs ambiguous. Consider naming them distinctly (e.g., include the transformers version). Also, this installs transformers==5.5.0 while the WWB requirements file currently constrains transformers to <5.4.0; please align the declared constraints with what CI is testing (or vice versa) to avoid confusion about supported versions.
```python
# Transformers version: v5.3.0, v5.4.0, v5.5.0, v5.5.1, v5.5.2, v5.5.3
# https://github.com/huggingface/transformers/blob/v5.3.0/src/transformers/generation/utils.py#L3068
# Add the function of collecting latency
def new_sample(
    self,
    input_ids: torch.LongTensor,
    logits_processor: LogitsProcessorList,
    stopping_criteria: StoppingCriteriaList,
    generation_config: GenerationConfig,
    synced_gpus: bool = False,
    streamer: Optional["BaseStreamer"] = None,
    **model_kwargs,
) -> GenerateNonBeamOutput | torch.LongTensor:
    r"""
    Generates sequences of token ids for models with a language modeling head using **multinomial sampling** and
    can be used for text-decoder, text-to-text, speech-to-text, and vision-to-text models.

    Parameters:
        input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
            The sequence used as a prompt for the generation.
        logits_processor (`LogitsProcessorList`):
            An instance of [`LogitsProcessorList`]. List of instances of class derived from [`LogitsProcessor`]
            used to modify the prediction scores of the language modeling head applied at each generation step.
        stopping_criteria (`StoppingCriteriaList`):
            An instance of [`StoppingCriteriaList`]. List of instances of class derived from [`StoppingCriteria`]
            used to tell if the generation loop should stop.
        generation_config ([`~generation.GenerationConfig`]):
            The generation configuration to be used as parametrization of the decoding method.
        synced_gpus (`bool`):
            Whether to continue running the while loop until max_length (needed to avoid deadlocking with
            `FullyShardedDataParallel` and DeepSpeed ZeRO Stage 3).
        streamer (`BaseStreamer`, *optional*):
            Streamer object that will be used to stream the generated sequences. Generated tokens are passed
            through `streamer.put(token_ids)` and the streamer is responsible for any further processing.
        model_kwargs:
            Additional model specific kwargs will be forwarded to the `forward` function of the model. If model is
            an encoder-decoder model the kwargs should include `encoder_outputs`.

    Return:
        [`~generation.GenerateDecoderOnlyOutput`], [`~generation.GenerateEncoderDecoderOutput`] or `torch.LongTensor`:
        A `torch.LongTensor` containing the generated tokens (default behaviour) or a
        [`~generation.GenerateDecoderOnlyOutput`] if `model.config.is_encoder_decoder=False` and
        `return_dict_in_generate=True` or a [`~generation.GenerateEncoderDecoderOutput`] if
        `model.config.is_encoder_decoder=True`.
    """
    # init values
    pad_token_id = generation_config._pad_token_tensor
    output_attentions = generation_config.output_attentions
    output_hidden_states = generation_config.output_hidden_states
    output_scores = generation_config.output_scores
    output_logits = generation_config.output_logits
    return_dict_in_generate = generation_config.return_dict_in_generate
    has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
    do_sample = generation_config.do_sample

    # init attention / hidden states / scores tuples
    scores = () if (return_dict_in_generate and output_scores) else None
    raw_logits = () if (return_dict_in_generate and output_logits) else None
    decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
    cross_attentions = () if (return_dict_in_generate and output_attentions) else None
    decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None

    # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
    if return_dict_in_generate and self.config.is_encoder_decoder:
        encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
        encoder_hidden_states = model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None

    # keep track of which sequences are already finished
    batch_size = input_ids.shape[0]
    this_peer_finished = False
    unfinished_sequences = torch.ones(batch_size, dtype=torch.long, device=input_ids.device)

    model_forward = (
        self.get_compiled_call(generation_config.compile_config)
        if self._valid_auto_compile_criteria(model_kwargs, generation_config)
        else self.__call__
    )

    prefill_consumed = False
    outputs = self._prefill(
        input_ids,
        generation_config,
        model_kwargs,
        is_first_iteration=not generation_config.is_assistant,
    )

    while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
        if prefill_consumed:
            next_sequence_length = 1 if model_kwargs["use_cache"] else None
            model_inputs = self.prepare_inputs_for_generation(
                input_ids, next_sequence_length=next_sequence_length, **model_kwargs
            )
            with self._optimize_model_for_decode():
                outputs = model_forward(**model_inputs, return_dict=True)
        prefill_consumed = True
```
This hook is documented as adding latency collection, but unlike the existing v4.x hooks it never records per-iteration timing (e.g., updating hook_greedy.tm_infer_list / hook_greedy.tm_list). As a result, llm_bench will not report token/inference latency for transformers v5.3+ paths. Please add the same timing instrumentation used in hook_sample_v4_57.py around the model forward pass and the full loop iteration.
```diff
 # optimum is in dependency list of optimum-intel
-optimum-intel[nncf]>=1.25.0,<=1.27.0
+# optimum-intel[nncf]>=1.25.0,<=1.27.0
+optimum-intel @ https://github.com/huggingface/optimum-intel.git@transformers-v5
```
Using an unpinned VCS dependency (optimum-intel @ ...@transformers-v5) makes installs non-reproducible and is a supply-chain risk because the branch can change over time. Prefer pinning to an immutable commit SHA or a released tag/version.
```diff
-optimum-intel @ https://github.com/huggingface/optimum-intel.git@transformers-v5
+optimum-intel==1.27.0
```
```python
model_forward = (
    self.get_compiled_call(generation_config.compile_config)
    if self._valid_auto_compile_criteria(model_kwargs, generation_config)
    else self.__call__
)

prefill_consumed = False
outputs = self._prefill(
    input_ids,
    generation_config,
    model_kwargs,
    is_first_iteration=not generation_config.is_assistant,
)

while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
    if prefill_consumed:
        next_sequence_length = 1 if model_kwargs["use_cache"] else None
        model_inputs = self.prepare_inputs_for_generation(
            input_ids, next_sequence_length=next_sequence_length, **model_kwargs
        )
        with self._optimize_model_for_decode():
            outputs = model_forward(**model_inputs, return_dict=True)
    prefill_consumed = True
    model_kwargs = self._update_model_kwargs_for_generation(
        outputs,
        model_kwargs,
        is_encoder_decoder=self.config.is_encoder_decoder,
    )
    if synced_gpus and this_peer_finished:
        continue
```
The v5.3+ sampling hook is supposed to collect per-step latency (as done in v4.x hooks), but this implementation never appends to hook_greedy.tm_infer_list / hook_greedy.tm_list and therefore breaks latency reporting for do_sample in llm_bench. Add the same timing instrumentation around the decode forward pass and per-iteration loop used in the v4.x hook implementations.
```yaml
- name: Install dependencies
  uses: ./src/.github/actions/install_wheel
  with:
    packages: openvino;openvino_tokenizers[transformers]
    requirements_files: "${{ env.SRC_DIR }}/tests/python_tests/requirements.txt"
    local_wheel_dir: ${{ env.INSTALL_DIR }}/wheels
  run: |
    python -m pip install -r ${{ env.SRC_DIR }}/tests/python_tests/requirements.txt
    python -m pip install openvino openvino_tokenizers[transformers] --extra-index-url=https://storage.openvinotoolkit.org/simple/wheels/nightly
```
This step defines both uses: and run: in the same step, which is invalid GitHub Actions YAML and will fail workflow parsing. Convert this step to either a composite action step (uses + with) or a shell step (run) but not both.
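One way to make the step valid is to split it into two steps, since GitHub Actions allows `uses:` or `run:` per step but not both. Sketch only; the step names are assumptions, and if only one installation path is actually intended the other step should be dropped instead:

```yaml
- name: Install dependencies (wheels)
  uses: ./src/.github/actions/install_wheel
  with:
    packages: openvino;openvino_tokenizers[transformers]
    requirements_files: "${{ env.SRC_DIR }}/tests/python_tests/requirements.txt"
    local_wheel_dir: ${{ env.INSTALL_DIR }}/wheels
- name: Install dependencies (pip)
  run: |
    python -m pip install -r ${{ env.SRC_DIR }}/tests/python_tests/requirements.txt
    python -m pip install openvino openvino_tokenizers[transformers] --extra-index-url=https://storage.openvinotoolkit.org/simple/wheels/nightly
```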
```diff
   timeout: 60
 - name: 'VLM (qwen3-vl)'
   cmd: |
+    python -m pip install transformers==4.57.0
```
There was a problem hiding this comment.
The qwen3-vl job installs transformers==4.57.0 twice (first alone, then again with the optimum-intel git dependency). Drop the redundant install to reduce time and avoid potential resolver churn.
```diff
-python -m pip install transformers==4.57.0
```
```python
@pytest.mark.transformers_lower_v5
@pytest.mark.parametrize("model_descr", get_whisper_models_list(tiny_only=True))
@pytest.mark.parametrize("sample_from_dataset", [{"language": "en", "sample_id": 1}], indirect=True)
@pytest.mark.xfail(condition=(sys.platform == "darwin"), reason="Ticket - 173169")
```
@pytest.mark.transformers_lower_v5 is newly introduced but is not registered in tests/python_tests/pytest.ini markers list. Add it there (or reuse an existing marker) to avoid unknown-marker warnings/errors in CI (especially if --strict-markers is enabled).
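For reference, a marker can also be registered programmatically from `conftest.py` instead of `pytest.ini` (the description string below is illustrative):

```python
# conftest.py
def pytest_configure(config):
    # Registering the marker keeps --strict-markers (and unknown-marker
    # warnings) happy without touching pytest.ini.
    config.addinivalue_line(
        "markers",
        "transformers_lower_v5: test is only valid on transformers < 5.0",
    )
```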
```diff
 if hf_generation_config.num_return_sequences is None:
     num_return_sequences = 1

 for idx, hf_encoded_out in enumerate(hf_encoded_outputs.sequences):
-    prompt_idx = idx // hf_generation_config.num_return_sequences
+    prompt_idx = idx // num_return_sequences
     prompt_len = 0 if generation_configs.echo else input_ids[prompt_idx].numel()
```
num_return_sequences is only set when hf_generation_config.num_return_sequences is None, but it’s used unconditionally in the loop below. When num_return_sequences is not None, this will raise UnboundLocalError. Initialize it to hf_generation_config.num_return_sequences or 1 before the loop.
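A minimal sketch of the fix, with a hypothetical helper name:

```python
def resolve_num_return_sequences(configured):
    """Always yield a usable divisor: fall back to 1 when the generation
    config leaves num_return_sequences unset (None)."""
    return configured if configured is not None else 1
```

The loop can then divide by `resolve_num_return_sequences(hf_generation_config.num_return_sequences)` regardless of whether the field was set.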
```diff
 if trans_version >= version.parse("5.3.0"):
     import llm_bench_utils.llm_hook_sample.hook_sample_v5_3 as hook_sample_v5_3

-    type(model)._sample = hook_sample_v57.new_sample
+    type(model)._sample = hook_sample_v5_3.new_sample
 if trans_version >= version.parse("5.0"):
     import llm_bench_utils.llm_hook_sample.hook_sample_v5 as hook_sample_v5

     type(model)._sample = hook_sample_v5.new_sample
 elif trans_version >= version.parse("4.57.0"):
```
The version checks are not mutually exclusive: for transformers >= 5.3, the first if sets _sample to v5_3, but the following if trans_version >= 5.0 then overwrites it with v5. This makes the v5.3 hook unreachable. Change the second if to elif (or otherwise ensure a single branch runs).
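The intended dispatch can be expressed as a single if/elif chain; the sketch below substitutes a stdlib-only tuple comparison for `packaging.version.parse` and returns hook names purely for illustration:

```python
def select_sample_hook(transformers_version: str) -> str:
    """Pick the most specific hook for a transformers version string."""
    parts = [int(p) for p in transformers_version.split(".")[:3]]
    v = tuple(parts + [0] * (3 - len(parts)))  # pad "5.0" -> (5, 0, 0)
    if v >= (5, 3, 0):
        return "hook_sample_v5_3"
    elif v >= (5, 0, 0):  # elif: never overwrite the more specific v5.3 hook
        return "hook_sample_v5"
    elif v >= (4, 57, 0):
        return "hook_sample_v4_57"
    return "unsupported"
```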
```python
if is_transformers_version("<", "5.0"):
    processor.audio_tokenizer = None
else:
    processor.pop("audio_tokenizer", None)
```
processor is a transformers ProcessorMixin instance, not a dict, so processor.pop("audio_tokenizer", None) will raise AttributeError. If you need to remove this field for transformers>=5, use delattr(processor, "audio_tokenizer") (guarded with hasattr) or set it to None consistently.
```diff
-processor.pop("audio_tokenizer", None)
+if hasattr(processor, "audio_tokenizer"):
+    delattr(processor, "audio_tokenizer")
```
```python
if Version(transformers_version) < "5.0.0":
    VISUAL_VIDEO_TEXT_MODELS += (("optimum-intel-internal-testing/tiny-random-llava-next-video", "visual-video-text"),)
```
Version(transformers_version) < "5.0.0" compares a Version object to a string, which raises TypeError in packaging. Compare against Version("5.0.0") (or use packaging.version.parse) so the conditional works at runtime.
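A quick demonstration of the failure mode and the fix (assuming `packaging` is installed, which transformers already requires; the version string is an illustrative value):

```python
from packaging.version import Version

transformers_version = "4.57.0"  # illustrative value

# Broken: comparing a Version to a plain string raises TypeError.
try:
    Version(transformers_version) < "5.0.0"
except TypeError:
    pass  # this is what happens at runtime today

# Fixed: compare two Version objects.
is_pre_v5 = Version(transformers_version) < Version("5.0.0")
```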
```python
def get_hf_cli_command() -> str:
    # huggingface_hub < 1.0 supports huggingface-cli, huggingface_hub >= 1.0 use hf
    version = importlib.metadata.version("huggingface_hub")
    major = int(version.split(".", 1)[0])
    if major >= 1:
        return "hf"
    return "huggingface-cli"
```
importlib.metadata isn’t imported here (import importlib alone doesn’t load the metadata submodule), so importlib.metadata.version(...) will raise AttributeError. Use the already-imported metadata (metadata.version(...)) or import importlib.metadata / from importlib import metadata.
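A corrected sketch; the optional `hub_version` parameter is added here only so the helper can be exercised without huggingface_hub installed:

```python
from importlib import metadata  # explicit: `import importlib` alone is not enough


def get_hf_cli_command(hub_version=None) -> str:
    # huggingface_hub < 1.0 ships `huggingface-cli`; >= 1.0 ships `hf`
    if hub_version is None:
        hub_version = metadata.version("huggingface_hub")
    major = int(hub_version.split(".", 1)[0])
    return "hf" if major >= 1 else "huggingface-cli"
```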
```python
if (
    is_transformers_version(">=", "5.0")
    and generation_config is not None
    and generation_config.get("num_beam_groups", 1) > 1
```
extra_generate_kwargs() treats generation_config as a dict (generation_config.get(...)), but current call sites pass a transformers.GenerationConfig instance. This will raise AttributeError. Either accept HFGenerationConfig and use getattr(generation_config, "num_beam_groups", 1), or pass generation_config.to_dict() from the caller.
```diff
-if (
-    is_transformers_version(">=", "5.0")
-    and generation_config is not None
-    and generation_config.get("num_beam_groups", 1) > 1
+num_beam_groups = 1
+if generation_config is not None:
+    if isinstance(generation_config, dict):
+        num_beam_groups = generation_config.get("num_beam_groups", 1)
+    else:
+        num_beam_groups = getattr(generation_config, "num_beam_groups", 1)
+if (
+    is_transformers_version(">=", "5.0")
+    and generation_config is not None
+    and num_beam_groups > 1
```
```diff
 generate_outputs = opt_model.generate(
     input_ids=input_ids,
     attention_mask=attention_mask,
     generation_config=hf_generation_config,
     tokenizer=hf_tokenizer,
-    **extra_generate_kwargs(),
+    **extra_generate_kwargs(hf_generation_config),
 )
```
extra_generate_kwargs(hf_generation_config) passes a transformers.GenerationConfig object, but extra_generate_kwargs currently expects a dict-like object (uses .get). This will raise at runtime; either pass hf_generation_config.to_dict() here or change extra_generate_kwargs to handle GenerationConfig.
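One way to make `extra_generate_kwargs` accept both shapes; the helper name is hypothetical, and a `GenerationConfig`-like object is simulated with a plain class in the example:

```python
def num_beam_groups_of(generation_config) -> int:
    """Read num_beam_groups from a dict or a GenerationConfig-like object."""
    if generation_config is None:
        return 1
    if isinstance(generation_config, dict):
        return generation_config.get("num_beam_groups", 1)
    return getattr(generation_config, "num_beam_groups", 1)
```

`extra_generate_kwargs` could then test `num_beam_groups_of(generation_config) > 1` without caring which type the caller passed.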
```yaml
- name: Install GenAI Wheels
  if: ${{ matrix.test.run_condition }}
  uses: ./src/.github/actions/install_wheel
  with:
    packages: "openvino;openvino_tokenizers[transformers];openvino_genai;whowhatbench"
    requirements_files: "${{ env.SRC_DIR }}/tests/python_tests/requirements.txt"
    local_wheel_dir: ${{ env.INSTALL_DIR }}/wheels
  run: |
    python -m pip install ${{ env.INSTALL_DIR }}/wheels/whowhatbench-*.whl --extra-index-url https://download.pytorch.org/whl/cpu
    python -m pip install -r ${{ env.SRC_DIR }}/tests/python_tests/requirements.txt
    python -m pip install openvino openvino_tokenizers[transformers] openvino_genai[testing] --extra-index-url=https://storage.openvinotoolkit.org/simple/wheels/nightly
```
PR description still contains placeholders (CVS-###, Fixes #(issue)) and the checklist is entirely unchecked. Please update the description to match the PR template (real ticket/issue links, completed checklist, and a brief summary of test coverage for the transformers v5 hooks).
```python
assert False
llm_pipe = openvino_genai.VLMPipeline(model_path, device.upper(), **ov_config)
```
assert False will unconditionally abort VLM pipeline creation, so llm_bench cannot benchmark GenAI VLMs. Please remove this debug assertion (or gate it behind an explicit debug flag) so VLMPipeline(...) can be constructed.
```diff
 if trans_version >= version.parse("5.3.0"):
     import llm_bench_utils.llm_hook_sample.hook_sample_v5_3 as hook_sample_v5_3

-    type(model)._sample = hook_sample_v57.new_sample
+    type(model)._sample = hook_sample_v5_3.new_sample
 if trans_version >= version.parse("5.0"):
```
The version branching uses two independent if statements for >=5.3.0 and >=5.0, so on transformers 5.3+ the 5.3 hook is immediately overwritten by the 5.0 hook. Change the second check to elif (or reorder the conditions) so the most specific hook stays in effect.
```python
elif trans_version >= version.parse("4.57.0"):
    import tools.llm_bench.llm_bench_utils.llm_hook_sample.hook_sample_v4_57 as hook_sample_v4_57

    type(model)._sample = hook_sample_v4_57.new_sample
```
These imports use tools.llm_bench..., but tools/llm_bench is not a Python package (no tools/llm_bench/__init__.py), so this branch will raise ModuleNotFoundError at runtime. Use the same import style as the v5 branches (e.g., llm_bench_utils....) or make tools.llm_bench importable by adding __init__.py files.
```diff
 import tools.llm_bench.llm_bench_utils.llm_hook_beam_search.hook_beam_search_v5_3 as hook_beam_search_v5_3

-new_beam_search = hook_beam_search_v57.new_beam_search_v57
+new_beam_search = hook_beam_search_v5_3.new_beam_search
 elif version.parse(transformers.__version__) >= version.parse("5.0"):
     import tools.llm_bench.llm_bench_utils.llm_hook_beam_search.hook_beam_search_v5 as hook_beam_search_v5
```
All these imports reference tools.llm_bench..., but tools/llm_bench is not a Python package (missing tools/llm_bench/__init__.py), so importing this module will fail. Please switch these imports back to the importable path (likely llm_bench_utils.llm_hook_beam_search...) or add the necessary __init__.py files to make tools.llm_bench a real package.
```yaml
cmd: |

  tests/python_tests/samples
```
matrix.test.cmd is defined as a block scalar with blank lines, which will embed newlines into the value. Since this is interpolated into python -m pytest ... ${{ matrix.test.cmd }}, the shell will treat it as multiple commands and likely fail. Use a single-line string like cmd: 'tests/python_tests/samples' (consistent with the other sample entries).
```diff
-cmd: |
-  tests/python_tests/samples
+cmd: 'tests/python_tests/samples'
```
```yaml
      timeout: 60
    - name: 'VLM (qwen3-vl)'
      cmd: |
        python -m pip install transformers==4.57.0
```
This job installs transformers==4.57.0 twice (once alone, then again together with optimum-intel). The first install is redundant and increases CI time; keep only the combined install line.
```diff
-python -m pip install transformers==4.57.0
```
```python
# Assisted generation completes the prefill stage in candidate generator so that
# we don't have several `prefill` calls in one generation loop. Skip `_prefill` for assistants
if not generation_config.is_assistant:
    outputs = self._prefill(input_ids, generation_config, model_kwargs)
    prefill_consumed = False
else:
    model_kwargs = self._get_initial_cache_position(input_ids.shape[1], input_ids.device, model_kwargs)
    prefill_consumed = True

while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
    if prefill_consumed:
        model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
        with self._optimize_model_for_decode():
            outputs = model_forward(**model_inputs, return_dict=True)
    prefill_consumed = True
    model_kwargs = self._update_model_kwargs_for_generation(
        outputs,
        model_kwargs,
        is_encoder_decoder=self.config.is_encoder_decoder,
    )
    if synced_gpus and this_peer_finished:
        continue

    # Copy is needed to avoid keeping a hanging ref to outputs.logits which may be very large for first iteration
    # (the clone itself is always small)
    next_token_logits = outputs.logits[:, -1, :].to(copy=True, dtype=torch.float32, device=input_ids.device)

    # pre-process distribution
    next_token_scores = logits_processor(input_ids, next_token_logits)
```
This transformers v5 _sample hook no longer records per-token latency/infer timing into hook_greedy.tm_list / hook_greedy.tm_infer_list (unlike the v4.x hook implementations). As a result, llm_bench metrics that depend on these lists will be empty/incorrect for sampling on transformers v5. Please add the same timing collection used in hook_sample_v4_57.py (or adjust the benchmark to not expect these lists for v5).
```python
prefill_consumed = False
outputs = self._prefill(
    input_ids,
    generation_config,
    model_kwargs,
    is_first_iteration=not generation_config.is_assistant,
)

while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
    if prefill_consumed:
        next_sequence_length = 1 if model_kwargs["use_cache"] else None
        model_inputs = self.prepare_inputs_for_generation(
            input_ids, next_sequence_length=next_sequence_length, **model_kwargs
        )
        with self._optimize_model_for_decode():
            outputs = model_forward(**model_inputs, return_dict=True)
    prefill_consumed = True
    model_kwargs = self._update_model_kwargs_for_generation(
        outputs,
        model_kwargs,
        is_encoder_decoder=self.config.is_encoder_decoder,
    )
    if synced_gpus and this_peer_finished:
        continue

    # Copy is needed to avoid keeping a hanging ref to outputs.logits which may be very large for first iteration
    # (the clone itself is always small)
    next_token_logits = outputs.logits[:, -1, :].to(copy=True, dtype=torch.float32, device=input_ids.device)

    # pre-process distribution
```
Same issue as hook_sample_v5.py: this transformers v5.3+ _sample hook does not append step timings to hook_greedy.tm_list / hook_greedy.tm_infer_list, so llm_bench token latency metrics for sampling will be missing/incorrect on transformers v5.3+.
```yaml
cmd: |

  tests/python_tests/samples
```
The multiline cmd: | value includes a leading blank line, so matrix.test.cmd will contain a newline. When interpolated into the python -m pytest ... ${{ matrix.test.cmd }} command, this can split the shell command and fail the job. Use a single-line string like the other entries (e.g. cmd: 'tests/python_tests/samples').
```diff
-cmd: |
-  tests/python_tests/samples
+cmd: 'tests/python_tests/samples'
```
```text
# optimum-intel[nncf]>=1.25.0,<=1.27.0
# git+https://github.com/huggingface/optimum-intel.git@423b4237df4d51efe077675eb1c531ffe4c82c83#egg=optimum-intel
git+https://github.com/huggingface/optimum-intel.git@refs/pull/1692/head#egg=optimum-intel
packaging>=20.0,<=26.0
```
Using a moving git ref (refs/pull/1692/head) makes builds non-reproducible and can break CI at any time if the PR branch changes or is deleted. Prefer pinning to an immutable commit SHA (or a released version) for optimum-intel.
```text
sentence-transformers>=2.2.2,<=5.3.0
openvino-genai
optimum-intel[nncf]>=1.19.0,<=1.27.0
optimum-intel @ git+https://github.com/huggingface/optimum-intel.git@refs/pull/1692/head
```
This requirements entry pins optimum-intel to a mutable PR ref (refs/pull/1692/head). For stable/reproducible installs (especially in CI), pin to a specific commit SHA or a released version instead.
Description
CVS-###
Fixes #(issue)
Checklist: