NGINX by default buffers the response from llama-swap #236
Closed as not planned
Labels
bug (Something isn't working)
Description
Describe the bug
llama-swap interacts badly with NGINX when streaming responses.
When NGINX proxies directly to llama-server (llama.cpp), it passes the response through without buffering, even without proxy_buffering off; in the NGINX configuration.
However, when the response is proxied through llama-swap on its way back to NGINX, NGINX buffers the SSE stream. This causes stream=true requests to stutter.
There are two possible fixes. Either the NGINX deployment sets proxy_buffering off;,
or llama-swap sends the response header X-Accel-Buffering: no to stop NGINX from buffering.
I propose adding w.Header().Set("X-Accel-Buffering", "no") to ProxyRequest in process.go.
Operating system and version
- OS: linux
- GPUs: not applicable