llm_bench is a tool for performance estimation of optimum-intel and GenAI pipelines.
Enable llm_bench to report separate performance metrics for different WhisperPipeline stages. Some performance metrics will need to be implemented first.
What to do
- Add `WhisperPipeline`-specific performance metrics (encode inference durations, decode inference durations) to `WhisperRawPerfMetrics`, and the corresponding encode inference duration and decode inference duration metrics to `WhisperPerfMetrics`.
- Add performance metrics for the sampling stage to the common `PerfMetrics`. Take the llama.cpp approach when adding metrics for sampling; if you have other ideas or implementation examples, please describe them so we can discuss. Implement collection of these metrics for `WhisperPipeline` and `LLMPipeline` (paged attention, SDPA, static).
- Add support for reporting the following performance metrics to the llm_bench for Whisper pipeline:
  - For the first token:
    - Tokenization step
    - Feature extraction
    - Encode
    - Decode
    - Sampling
  - For the second token:
  - For other tokens:
  - Detokenization step
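To make the first item concrete, here is a minimal sketch of how raw per-call durations could be aggregated into the summary metrics. All names here are hypothetical illustrations of the proposed fields, not the actual openvino.genai API; the real structs are C++ with Python bindings, and this only shows the aggregation idea (mean and standard deviation over raw samples, as the existing `PerfMetrics` classes do for other durations).

```python
from dataclasses import dataclass, field
from statistics import mean, stdev

# Hypothetical sketch of the proposed raw fields: one duration (ms) per
# inference call, collected by the pipeline during generation.
@dataclass
class WhisperRawPerfMetricsSketch:
    encode_inference_durations: list = field(default_factory=list)
    decode_inference_durations: list = field(default_factory=list)

def mean_std(durations):
    """Aggregate raw durations into (mean, std); std is 0 for a single sample."""
    m = mean(durations)
    s = stdev(durations) if len(durations) > 1 else 0.0
    return m, s

# Example raw data: one encoder run per chunk, one decoder run per token.
raw = WhisperRawPerfMetricsSketch(
    encode_inference_durations=[12.0, 14.0],
    decode_inference_durations=[3.0, 4.0, 5.0],
)
enc_mean, enc_std = mean_std(raw.encode_inference_durations)
dec_mean, dec_std = mean_std(raw.decode_inference_durations)
```

The summary `WhisperPerfMetrics` values would then simply be these (mean, std) pairs computed lazily from the raw lists.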
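For the sampling metrics, the llama.cpp approach referenced above amounts to accumulating total sampling time and the number of sampling calls, then reporting total time, run count, and time per token together. A hedged Python sketch of that accumulator (the class and method names are made up for illustration):

```python
import time

class SamplingTimer:
    """Accumulates total sampling time and call count, llama.cpp-style:
    the report contains total ms, number of runs, and ms per token."""
    def __init__(self):
        self.total_s = 0.0
        self.n_samples = 0

    def timed_sample(self, sample_fn, *args):
        # Wrap a single sampling call and record its wall-clock duration.
        start = time.perf_counter()
        result = sample_fn(*args)
        self.total_s += time.perf_counter() - start
        self.n_samples += 1
        return result

    def report(self):
        per_token_ms = (self.total_s / self.n_samples * 1000) if self.n_samples else 0.0
        return {"total_ms": self.total_s * 1000,
                "runs": self.n_samples,
                "ms_per_token": per_token_ms}

# Usage: time a toy greedy "sampler" (argmax over logits) for three tokens.
timer = SamplingTimer()
for logits in ([0.1, 0.9], [0.7, 0.3], [0.2, 0.8]):
    token = timer.timed_sample(lambda l: max(range(len(l)), key=l.__getitem__), logits)
stats = timer.report()
```

In the actual pipelines this wrapping would happen around the sampler invocation in C++, with the accumulated values exposed through `PerfMetrics`.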
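On the reporting side, the breakdown above could surface in llm_bench output roughly like the sketch below. The stage names and the grouping into first token vs. other tokens follow the list above; the formatting helper and the numbers are purely illustrative assumptions, not the existing llm_bench output format.

```python
# Hypothetical per-stage latency breakdown (ms) for a Whisper run.
first_token_stages = {
    "tokenization": 0.8,
    "feature_extraction": 5.2,
    "encode": 12.0,
    "decode": 3.1,
    "sampling": 0.2,
}
other_token_stages = {"decode": 3.0, "sampling": 0.2, "detokenization": 0.5}

def format_stage_report(title, stages):
    """Render one stage group as an indented, human-readable block."""
    lines = [title] + [f"  {name}: {ms:.1f} ms" for name, ms in stages.items()]
    return "\n".join(lines)

report = "\n".join([
    format_stage_report("First token:", first_token_stages),
    format_stage_report("Other tokens:", other_token_stages),
])
```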
AI notice — Important! We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact.