llm_bench is a tool for performance estimation of optimum-intel and GenAI pipelines.
Enable llm_bench to report separate performance metrics for different WhisperPipeline stages. Some performance metrics will need to be implemented first.
What to do
- Add `WhisperPipeline`-specific performance metrics (encode inference durations, decode inference durations) to `WhisperRawPerfMetrics`, and the corresponding encode inference duration and decode inference duration metrics to `WhisperPerfMetrics`.
- Add performance metrics for the sampling stage to the common `PerfMetrics`. Take the llama.cpp approach when adding metrics for sampling; if you have other ideas or implementation examples, please describe them so we can discuss. Implement collection of these metrics for `WhisperPipeline` and `LLMPipeline` (paged attention, SDPA, static).
- Add support for reporting the following performance metrics to the llm_bench for Whisper pipeline:
  - For the first token:
    - Tokenization step
    - Feature extraction
    - Encode
    - Decode
    - Sampling
  - For the second token:
  - For other tokens:
  - Detokenization step
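To make the first item concrete, here is a minimal sketch of how raw per-call durations could be aggregated into the summary metrics. All names here are hypothetical illustrations of the proposed fields, not the actual openvino.genai API; the real structs are C++ with Python bindings, and this only shows the aggregation idea (mean and standard deviation over raw samples, as the existing `PerfMetrics` classes do for other durations).

```python
from dataclasses import dataclass, field
from statistics import mean, stdev

# Hypothetical sketch of the proposed raw fields: one duration (ms) per
# inference call, collected by the pipeline during generation.
@dataclass
class WhisperRawPerfMetricsSketch:
    encode_inference_durations: list = field(default_factory=list)
    decode_inference_durations: list = field(default_factory=list)

def mean_std(durations):
    """Aggregate raw durations into (mean, std); std is 0 for a single sample."""
    m = mean(durations)
    s = stdev(durations) if len(durations) > 1 else 0.0
    return m, s

# Example raw data: one encoder run per chunk, one decoder run per token.
raw = WhisperRawPerfMetricsSketch(
    encode_inference_durations=[12.0, 14.0],
    decode_inference_durations=[3.0, 4.0, 5.0],
)
enc_mean, enc_std = mean_std(raw.encode_inference_durations)
dec_mean, dec_std = mean_std(raw.decode_inference_durations)
```

The summary `WhisperPerfMetrics` values would then simply be these (mean, std) pairs computed lazily from the raw lists.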
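For the sampling metrics, the llama.cpp approach referenced above amounts to accumulating total sampling time and the number of sampling calls, then reporting total time, run count, and time per token together. A hedged Python sketch of that accumulator (the class and method names are made up for illustration):

```python
import time

class SamplingTimer:
    """Accumulates total sampling time and call count, llama.cpp-style:
    the report contains total ms, number of runs, and ms per token."""
    def __init__(self):
        self.total_s = 0.0
        self.n_samples = 0

    def timed_sample(self, sample_fn, *args):
        # Wrap a single sampling call and record its wall-clock duration.
        start = time.perf_counter()
        result = sample_fn(*args)
        self.total_s += time.perf_counter() - start
        self.n_samples += 1
        return result

    def report(self):
        per_token_ms = (self.total_s / self.n_samples * 1000) if self.n_samples else 0.0
        return {"total_ms": self.total_s * 1000,
                "runs": self.n_samples,
                "ms_per_token": per_token_ms}

# Usage: time a toy greedy "sampler" (argmax over logits) for three tokens.
timer = SamplingTimer()
for logits in ([0.1, 0.9], [0.7, 0.3], [0.2, 0.8]):
    token = timer.timed_sample(lambda l: max(range(len(l)), key=l.__getitem__), logits)
stats = timer.report()
```

In the actual pipelines this wrapping would happen around the sampler invocation in C++, with the accumulated values exposed through `PerfMetrics`.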
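On the reporting side, the breakdown above could surface in llm_bench output roughly like the sketch below. The stage names and the grouping into first token vs. other tokens follow the list above; the formatting helper and the numbers are purely illustrative assumptions, not the existing llm_bench output format.

```python
# Hypothetical per-stage latency breakdown (ms) for a Whisper run.
first_token_stages = {
    "tokenization": 0.8,
    "feature_extraction": 5.2,
    "encode": 12.0,
    "decode": 3.1,
    "sampling": 0.2,
}
other_token_stages = {"decode": 3.0, "sampling": 0.2, "detokenization": 0.5}

def format_stage_report(title, stages):
    """Render one stage group as an indented, human-readable block."""
    lines = [title] + [f"  {name}: {ms:.1f} ms" for name, ms in stages.items()]
    return "\n".join(lines)

report = "\n".join([
    format_stage_report("First token:", first_token_stages),
    format_stage_report("Other tokens:", other_token_stages),
])
```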
AI notice — Important! We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact.