Expand WhisperPipeline performance metrics and add support in llm_bench #3320

@sbalandi

Description

llm_bench is a tool for performance estimation of optimum-intel and GenAI pipelines.
Enable llm_bench to report separate performance metrics for each WhisperPipeline stage. Some of these performance metrics will need to be implemented first.

What to do

  1. Add WhisperPipeline-specific raw performance metrics (encode inference durations, decode inference durations) to WhisperRawPerfMetrics, and the corresponding aggregated metrics (encode inference duration, decode inference duration) to WhisperPerfMetrics.
  2. Add performance metrics for the sampling stage to the common PerfMetrics.
    Take the llama.cpp approach when adding sampling metrics. If you have other ideas or implementation examples, please describe them and let's discuss them. Implement collection of these metrics for WhisperPipeline and LLMPipeline (paged attention, SDPA, static).
  3. Add support for reporting the following performance metrics to the llm_bench for Whisper pipeline:
  • For the first token:
    • Tokenization step
    • Feature extraction
    • Encode
    • Decode
    • Sampling
  • For the second token:
    • Generate decode
    • Sampling
  • For other tokens:
    • Generate decode
    • Sampling
  • Detokenization step
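For step 1, the new raw metrics could be held as plain per-call duration vectors and reduced to mean/std pairs, mirroring how the existing raw metrics are aggregated. A minimal Python stand-in for illustration (the field names follow the issue text; the aggregation helper is hypothetical, not the actual GenAI API):

```python
from dataclasses import dataclass, field
from statistics import mean, stdev

@dataclass
class WhisperRawPerfMetricsSketch:
    # Proposed additions: per-call inference durations (ms) for the
    # encoder and decoder models inside WhisperPipeline.
    encode_inference_durations: list[float] = field(default_factory=list)
    decode_inference_durations: list[float] = field(default_factory=list)

def mean_std(durations: list[float]) -> tuple[float, float]:
    """Reduce raw durations to the (mean, std) pair that the aggregated
    WhisperPerfMetrics counterparts would expose."""
    if not durations:
        return 0.0, 0.0
    return mean(durations), stdev(durations) if len(durations) > 1 else 0.0
```

Each encoder/decoder inference call appends its wall-clock duration to the matching vector; the aggregated metrics are then derived on demand.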
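For step 2, llama.cpp's approach is to time each sampling call and accumulate a running total and call count, which it then prints in its timing summary. A sketch of that pattern with a hypothetical SamplingTimer wrapper (names are illustrative, not part of any existing API):

```python
import time

class SamplingTimer:
    """Accumulates per-token sampling durations, similar to llama.cpp's
    sampling time / sample count counters."""

    def __init__(self) -> None:
        self.durations_ms: list[float] = []

    def timed_sample(self, sample_fn, logits):
        # Wrap a single sampling call and record its wall-clock duration.
        start = time.perf_counter()
        token = sample_fn(logits)
        self.durations_ms.append((time.perf_counter() - start) * 1e3)
        return token

    @property
    def mean_ms(self) -> float:
        return sum(self.durations_ms) / len(self.durations_ms) if self.durations_ms else 0.0
```

In the pipelines this would sit around the sampler invocation in the generation loop, so the raw vector can feed both the common PerfMetrics and the per-token buckets below.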
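Once per-token durations are collected, splitting them into the first/second/other buckets listed above is a simple reduction. A sketch with illustrative key names (not llm_bench's actual report schema):

```python
def bucket_latencies(decode_ms: list[float], sampling_ms: list[float]) -> dict:
    """Group per-token decode and sampling durations into first-token,
    second-token, and mean-of-remaining-tokens buckets."""
    report = {}
    for name, series in (("decode", decode_ms), ("sampling", sampling_ms)):
        report[f"first_token_{name}_ms"] = series[0] if series else None
        report[f"second_token_{name}_ms"] = series[1] if len(series) > 1 else None
        rest = series[2:]
        report[f"other_tokens_mean_{name}_ms"] = sum(rest) / len(rest) if rest else None
    return report
```

Tokenization, feature extraction, and encode durations would be attached only to the first-token entry, and detokenization reported once per request, matching the breakdown above.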

AI notice — Important! We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact.
