Is your feature request related to a problem? Please describe.
Because `create_completion` may yield text chunks that span multiple tokens per yield (e.g. when a multi-byte Unicode character is split across tokens), counting the number of yields does not necessarily equal the number of tokens the model actually generated. To get accurate usage statistics for a streamed completion, one currently has to run the final text through the tokenizer again, even though `create_completion` already tracks the number of tokens generated by the model.
Describe the solution you'd like
When `stream=True` is passed to `create_completion`, the final chunk yielded should include the usage statistics under the `'usage'` key.
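A minimal sketch of what consuming this could look like. The chunk shapes and field names below are hypothetical (modeled on the OpenAI-style dicts `create_completion` already yields), with a fake generator standing in for the real stream:

```python
# Hypothetical sketch of the proposed behavior: intermediate chunks carry text
# only, and the FINAL chunk also carries a 'usage' dict with the token counts
# the model already tracks, so the caller never has to re-tokenize.

def fake_stream():
    # Stand-in for create_completion(..., stream=True); shapes are illustrative.
    yield {"choices": [{"text": "Hel"}]}
    yield {"choices": [{"text": "lo"}]}
    yield {
        "choices": [{"text": "", "finish_reason": "stop"}],
        "usage": {"prompt_tokens": 4, "completion_tokens": 2, "total_tokens": 6},
    }

text, usage = "", None
for chunk in fake_stream():
    text += chunk["choices"][0]["text"]
    usage = chunk.get("usage") or usage  # only the last chunk carries usage

print(text)                         # Hello
print(usage["completion_tokens"])   # 2
```

The caller gets both the streamed text and the exact token counts in a single pass, with no second tokenizer run.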
Describe alternatives you've considered
- Saving the full generated text and running it through the tokenizer again (seems wasteful)
- Counting the number of yields and hoping no chunk spans multiple tokens, e.g. from multi-byte characters (hacky and fragile)
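For illustration, the first workaround looks roughly like the sketch below. The `tokenize` function here is a stand-in for the model's real tokenizer (e.g. the model object's own tokenize method); it is not the library's API:

```python
# Sketch of the current workaround: accumulate the streamed text, then pay for
# a second tokenizer pass just to recover a count the model already computed.

def tokenize(text: str) -> list[int]:
    # Stand-in tokenizer: pretends each whitespace-separated word is one token.
    # A real tokenizer is model-specific and may split words and bytes further.
    return [hash(w) for w in text.split()]

chunks = ["Hello", " streaming", " world"]  # as yielded during streaming
full_text = "".join(chunks)

# len(chunks) happens to be 3 here, but chunk counting is unreliable in general,
# so the text is tokenized a second time:
completion_tokens = len(tokenize(full_text))
print(completion_tokens)  # 3
```

This duplicates work the model already did during generation, which is exactly what a `'usage'` key on the final chunk would avoid.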
Additional context
The OpenAI API recently added similar support to its streaming API via the stream_options parameter: https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options