fix(nginx): Fix nginx startup loop by replacing curl health check with wget (closes #9794) #9801

Open
janhavitupe wants to merge 4 commits into onyx-dot-app:main from janhavitupe:fix/nginx-healthcheck-wget

Conversation

@janhavitupe janhavitupe commented Mar 31, 2026

Summary

Fixes an issue where nginx gets stuck in an infinite startup loop due to failed health checks when using curl inside the Alpine-based container.

Related Issue

Closes #9794

Problem

The nginx startup script performs a health check against the API server using curl. In some environments, this consistently returns HTTP 000, causing nginx to retry indefinitely:


Summary by cubic

Replaced API health check curl with wget and robust status parsing to stop the nginx startup loop on Alpine (Docker DNS returning HTTP 000). Closes #9794.

Written for commit 9b360a8. Summary will update on new commits.

@janhavitupe janhavitupe requested a review from a team as a code owner March 31, 2026 14:19
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 1 file

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Contributor

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR fixes an nginx startup loop on Alpine-based containers by replacing curl with BusyBox wget for the API server health check in deployment/data/nginx/run-nginx.sh. The root cause was curl returning HTTP 000 when resolving hosts through Docker's internal DNS resolver on Alpine, so the health-check loop never succeeded.

Key changes:

  • Replaces curl -o /dev/null -s -w "%{http_code}" with wget -S -q -O /dev/null … 2>&1 | awk '/HTTP\//{print $2; exit}'
  • Adds a status_code="${status_code:-000}" fallback for complete connection failures (no HTTP line in output)
  • The exit in the awk program ensures that only the first HTTP/ line is used, which is the response-header line. This handles the two-line stderr output that BusyBox wget emits for non-2xx responses (header line plus error message line) and prevents spurious shell arithmetic errors from a non-numeric $2
  • wget (BusyBox) is available by default on Alpine, so no new dependency is introduced
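
The parsing and fallback described in the key changes above can be sketched in isolation. This is a minimal illustration, not the script's actual code: `parse_status` is a hypothetical helper name, and the sample input lines stand in for real wget output (the second one is placeholder text, not a real wget message).

```sh
# Hypothetical helper: read `wget -S` style output on stdin and print a status code.
parse_status() {
    # Print field 2 of the first line containing "HTTP/", then stop reading.
    code=$(awk '/HTTP\//{print $2; exit}')
    # No HTTP line at all means no response was received: fall back to 000.
    printf '%s\n' "${code:-000}"
}

# Sample inputs stand in for real wget output (assumed, for illustration only).
printf '  HTTP/1.1 200 OK\n' | parse_status       # prints 200
printf 'placeholder failure text\n' | parse_status # prints 000
```

Keeping the fallback next to the parsing step means the later numeric comparison always sees a number, even when the connection fails outright.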

Confidence Score: 5/5

  • Safe to merge — the fix is minimal, correctly scoped, and handles all error paths.
  • All paths are handled: 200 breaks the loop, non-2xx retries, and connection failures fall back to "000" and retry. The awk exit resolves the previously flagged two-line stderr issue. No P0 or P1 issues found.
  • No files require special attention.
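
The three paths listed above (200 breaks the loop, non-2xx retries, connection failure falls back to "000" and retries) can be sketched as a small loop. This is an assumption-laden sketch rather than the contents of run-nginx.sh: `probe`, `wait_for_api`, and `RETRY_DELAY` are hypothetical names, and `probe` replaces the real wget and awk pipeline with canned results so the loop can run without a network.

```sh
# Hypothetical stand-in for the wget | awk pipeline: yields an empty string
# (connection failure), then 503, then 200 on successive calls.
attempt=0
probe() {
    attempt=$((attempt + 1))
    case "$attempt" in
        1) status_code="" ;;
        2) status_code="503" ;;
        *) status_code="200" ;;
    esac
}

wait_for_api() {
    while :; do
        probe
        status_code="${status_code:-000}"   # connection failure becomes "000"
        if [ "$status_code" -eq 200 ]; then
            echo "attempt $attempt: API ready"
            return 0
        fi
        echo "attempt $attempt: got $status_code, retrying"
        sleep "${RETRY_DELAY:-5}"           # 5s matches the diagram; overridable here
    done
}

RETRY_DELAY=0   # skip the wait for this demonstration
wait_for_api
# prints:
#   attempt 1: got 000, retrying
#   attempt 2: got 503, retrying
#   attempt 3: API ready
```

In the real script, nginx would presumably be started once the loop returns.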

Important Files Changed

| Filename | Overview |
|---|---|
| deployment/data/nginx/run-nginx.sh | Replaces curl with wget for the API health check to fix Alpine DNS resolution issues; adds exit in awk to correctly take the first HTTP/ match and a 000 fallback for unreachable hosts. |

Sequence Diagram

sequenceDiagram
    participant S as run-nginx.sh
    participant W as wget (BusyBox)
    participant A as API Server :8080/health
    participant N as nginx

    loop Until status 200
        S->>W: wget -S -q -O /dev/null http://api:8080/health 2>&1
        W->>A: GET /health
        alt API ready (200)
            A-->>W: HTTP/1.1 200 OK
            W-->>S: stderr: "  HTTP/1.1 200 OK"
            S->>S: awk extracts "200", exits on first match
            S->>N: Start nginx (daemon off)
        else API not ready (non-2xx)
            A-->>W: HTTP/1.1 503 ...
            W-->>S: stderr: "  HTTP/1.1 503 ..." (first line, exit)
            S->>S: status_code = "503", sleep 5s
        else Connection failure
            W-->>S: (no HTTP/ line in stderr)
            S->>S: status_code = "" → "000", sleep 5s
        end
    end

Reviews (4): Last reviewed commit: "Merge branch 'main' into fix/nginx-healt..."


# Use wget to send a request and capture the HTTP status code
# (curl has DNS resolution issues with Docker's internal resolver on Alpine)
status_code=$(wget -S -q -O /dev/null "http://${ONYX_BACKEND_API_HOST}:8080/health" 2>&1 | awk '/HTTP\//{print $2}' | tail -1)
Contributor

P2 awk pattern can extract wrong field on non-2xx responses

When BusyBox wget receives a non-2xx HTTP response (e.g., 503 while the API server is still booting), it prints two lines to stderr that both match /HTTP\//:

  1. The response header: HTTP/1.1 503 Service Unavailable → $2 = "503"
  2. The error message: wget: server returned error: HTTP/1.1 503 Service Unavailable → $2 = "server"

Because tail -1 picks the last match, status_code ends up as "server". The subsequent [ "server" -eq 200 ] comparison produces a shell error ([: server: integer expression expected) and clutters the logs. The retry still occurs (since the failed test exits with a non-zero status), so nginx still waits correctly, but the error output is noisy.

Using exit after the first HTTP/ match avoids the error message line entirely:

Suggested change
- status_code=$(wget -S -q -O /dev/null "http://${ONYX_BACKEND_API_HOST}:8080/health" 2>&1 | awk '/HTTP\//{print $2}' | tail -1)
+ status_code=$(wget -S -q -O /dev/null "http://${ONYX_BACKEND_API_HOST}:8080/health" 2>&1 | awk '/HTTP\//{print $2; exit}')
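
The difference between the two pipelines can be reproduced without a network by feeding awk the two stderr lines quoted in this comment. This is a small sketch: the variable names `sample`, `with_tail`, and `with_exit` are made up for the demonstration.

```sh
# The two stderr lines described above for a 503 response.
sample='  HTTP/1.1 503 Service Unavailable
wget: server returned error: HTTP/1.1 503 Service Unavailable'

# Original pipeline: every matching line prints $2, and tail keeps the last one.
with_tail=$(printf '%s\n' "$sample" | awk '/HTTP\//{print $2}' | tail -1)

# Suggested pipeline: awk stops after the first matching line.
with_exit=$(printf '%s\n' "$sample" | awk '/HTTP\//{print $2; exit}')

echo "tail -1: $with_tail"   # prints: tail -1: server
echo "exit:    $with_exit"   # prints: exit:    503
```

In the error-message line the first whitespace-separated field is `wget:`, so `$2` is `server`, which is why only the `exit` variant yields a value that `[ ... -eq 200 ]` can compare numerically.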
Path: deployment/data/nginx/run-nginx.sh
Line: 44



Development

Successfully merging this pull request may close these issues.

nginx startup stuck in loop: curl returns 000 resolving 'api_server' on Alpine (wget works)
