
fix: enable token usage tracking and configurable stream timeout for Ollama provider#8493

Open
kasjens wants to merge 7 commits into aaif-goose:main from kasjens:main

Conversation

@kasjens kasjens commented Apr 12, 2026

Summary

Three related fixes for the Ollama provider:

  1. Token usage tracking: The provider was unconditionally stripping stream_options: {"include_usage": true} from requests (fix: prevent Ollama provider from hanging on tool-calling requests #7723), preventing Ollama from returning token counts in streaming responses. This is now gated behind OLLAMA_STREAM_USAGE (default: true) so modern Ollama builds get usage tracking while older builds can opt out with OLLAMA_STREAM_USAGE=false. Invalid values are handled safely — a warning is logged and stream_options is disabled.

  2. Fallback usage parsing: Added fallback for Ollama-native token fields (prompt_eval_count, eval_count) in get_usage(), so token tracking works even when Ollama doesn't translate to standard OpenAI field names (prompt_tokens, completion_tokens). OpenAI fields take precedence when both are present. Null OpenAI fields (e.g. "completion_tokens": null) correctly fall through to the Ollama-native fields instead of silently dropping usage.

  3. Configurable stream timeout: The hardcoded 30s per-chunk timeout was too aggressive for slower models (CPU inference, large parameter counts, complex reasoning). The timeout is now configurable via a resolution chain: OLLAMA_STREAM_TIMEOUT > GOOSE_STREAM_TIMEOUT > OLLAMA_TIMEOUT > default (120s). Zero values are treated as invalid and skipped to prevent immediate stall errors on every chunk.
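The timeout resolution chain described in item 3 can be sketched as follows. This is an illustrative helper, not the PR's actual code: it takes the raw values of `OLLAMA_STREAM_TIMEOUT`, `GOOSE_STREAM_TIMEOUT`, and `OLLAMA_TIMEOUT` in precedence order (`None` meaning unset) and skips zero or unparsable entries rather than producing a timeout that would stall on every chunk.

```rust
use std::time::Duration;

const DEFAULT_STREAM_TIMEOUT_SECS: u64 = 120;

// Hypothetical resolver: walk the sources in precedence order and take the
// first value that parses to a positive number of seconds.
fn resolve_stream_timeout(sources: &[Option<&str>]) -> Duration {
    let secs = sources
        .iter()
        .flatten() // skip unset variables
        .filter_map(|raw| raw.trim().parse::<u64>().ok()) // skip unparsable values
        .find(|&s| s > 0) // zero is invalid: fall through to the next source
        .unwrap_or(DEFAULT_STREAM_TIMEOUT_SECS);
    Duration::from_secs(secs)
}

fn main() {
    // No variables set: the 120s default applies.
    assert_eq!(resolve_stream_timeout(&[None, None, None]), Duration::from_secs(120));
    // Zero is rejected and resolution falls through to OLLAMA_TIMEOUT.
    assert_eq!(resolve_stream_timeout(&[Some("0"), None, Some("300")]), Duration::from_secs(300));
    // The highest-precedence valid value wins.
    assert_eq!(resolve_stream_timeout(&[Some("45"), Some("90"), None]), Duration::from_secs(45));
    println!("ok");
}
```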

Testing

  • All 29 related tests pass (zero failures)
  • 6 timeout resolution tests: verify the fallback chain, default behavior, and zero-value rejection
  • 3 usage parsing tests: verify Ollama-native field fallback, OpenAI field precedence, and null OpenAI field fallthrough
  • 2 stream_options gating tests: verify OLLAMA_STREAM_USAGE default-on and opt-out behavior
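The usage-parsing behavior those tests cover can be sketched with field states modeled explicitly. The enum and helper below are illustrative, not the PR's actual types: a JSON usage field can be absent, an explicit null, or a number, and only a number in the OpenAI-style field suppresses the fallback.

```rust
// Possible states of a token-count field in a streaming usage payload.
enum Field {
    Absent,
    Null,
    Count(u64),
}

// Prefer the OpenAI-style field (prompt_tokens / completion_tokens); fall
// through to the Ollama-native counter (prompt_eval_count / eval_count) when
// the OpenAI field is missing or null.
fn effective_count(openai: Field, ollama_native: Field) -> Option<u64> {
    match (openai, ollama_native) {
        (Field::Count(n), _) => Some(n), // OpenAI field takes precedence
        (_, Field::Count(n)) => Some(n), // absent/null: use the native field
        _ => None,                       // no usable count in either
    }
}

fn main() {
    // OpenAI field present: wins over the native counter.
    assert_eq!(effective_count(Field::Count(10), Field::Count(99)), Some(10));
    // "completion_tokens": null falls through to eval_count.
    assert_eq!(effective_count(Field::Null, Field::Count(42)), Some(42));
    // Neither present: no usage reported.
    assert_eq!(effective_count(Field::Absent, Field::Absent), None);
    println!("ok");
}
```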

Related Issues

Relates to #8479
Relates to #8476
Relates to #7723

kasjens added 7 commits April 11, 2026 17:09
…rue) to enable token usage tracking while allowing older Ollama builds to opt out
…ng stream_options instead of silently defaulting to enabled
…back usage parsing for Ollama-native token fields (prompt_eval_count/eval_count)
…ive token counters (prompt_eval_count/eval_count)
Member

@jamadeo jamadeo left a comment


Nice one, thank you @kasjens . Is it possible we can avoid the config parameter by looking at the ollama version?

input_limit.or(model_config.context_limit)
}

fn resolve_ollama_stream_usage() -> bool {
Member


Can we do without a config flag? Maybe https://docs.ollama.com/api-reference/get-version and flag this on a minimum version?

Author


Thanks @jamadeo! I considered that but went with the config flag because:

  1. Ollama isn't always reached directly: users behind proxies or OpenAI-compatible API servers (LiteLLM, LocalAI) may not expose /api/version.
  2. Version is not the same as capability: custom builds and forks may support stream_options without a recognizable version string.
  3. It defaults to enabled, so modern installs work out of the box; only users on older builds need to set OLLAMA_STREAM_USAGE=false.

Happy to add version detection as a best-effort first pass with the config flag as fallback if you'd prefer that approach!
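The "best-effort version detection with the config flag as fallback" idea could look roughly like this. Everything here is a sketch: the helper names are hypothetical, and the minimum version is a placeholder, not a verified cutoff for stream_options support.

```rust
// Parse a semver-ish string like "0.6.2" or "v0.6.2" into a comparable tuple.
fn parse_semver(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.trim().trim_start_matches('v').splitn(3, '.');
    Some((
        parts.next()?.parse().ok()?,
        parts.next()?.parse().ok()?,
        parts.next().unwrap_or("0").parse().ok()?,
    ))
}

// Placeholder cutoff: the real minimum Ollama version would need verifying.
const MIN_VERSION: (u64, u64, u64) = (0, 5, 0);

// Decide whether to send stream_options. `reported_version` is whatever a
// best-effort call to /api/version returned (None if unreachable or garbage);
// `flag` is the OLLAMA_STREAM_USAGE setting used as the fallback.
fn stream_usage_enabled(reported_version: Option<&str>, flag: bool) -> bool {
    match reported_version.and_then(parse_semver) {
        Some(v) => v >= MIN_VERSION, // recognizable version: trust it
        None => flag,                // proxy, fork, or no endpoint: flag decides
    }
}

fn main() {
    assert!(stream_usage_enabled(Some("0.6.2"), false));
    assert!(!stream_usage_enabled(Some("0.1.9"), true));
    // No recognizable version (LiteLLM, LocalAI, custom builds): flag decides.
    assert!(stream_usage_enabled(None, true));
    println!("ok");
}
```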

Member


Makes sense, and seems like most of the time you'd never add it at all

Collaborator

@michaelneale michaelneale left a comment


I think this is worth having in, thanks!

