NGINX by default buffers the response from llama-swap #236

@BobbyL2k

Describe the bug
llama-swap interacts badly with NGINX's default response buffering.

Normally, even without proxy_buffering off; set in the NGINX configuration, NGINX proxies the response without buffering when it is served directly from llama-server (llama.cpp).

However, when the response is proxied through llama-swap back to NGINX, NGINX buffers the SSE stream. This causes stream=true requests to stutter.

There are two possible fixes: either the NGINX deployment sets proxy_buffering off; in its configuration, or llama-swap sends the response header "X-Accel-Buffering"="no" to stop NGINX from buffering.

I propose adding w.Header().Set("X-Accel-Buffering", "no") to ProxyRequest in process.go.

Operating system and version

  • OS: linux
  • GPUs: not applicable

Labels: bug