enhance: Use milvusStorageV2 to support various cloud storage #67

Open
tinswzy wants to merge 11 commits into master from support_milvus_storage

Conversation

@tinswzy
Collaborator

tinswzy commented Dec 23, 2025

issue: #64

!!!!!! DO NOT MERGE: Wait until storage v2 has been restructured and improved, in particular until its FFI interface has been modified and stabilized. Then update the dependency calls and remove the unused C++ code.

tinswzy force-pushed the support_milvus_storage branch 3 times, most recently from 0bd89fc to 718577e, on December 24, 2025 11:30
@czs007
Collaborator

czs007 commented Jan 18, 2026

/gemini review

tinswzy added 9 commits March 23, 2026 17:27
fix: improve membership discovery reliability and lifecycle error handling

  - Prevent stale leave events from removing rejoined nodes in membership discovery
  - Propagate MarkDecommissioned error and retry in decommission monitor
  - Relax fallback mechanism performance threshold to reduce CI flakiness
  - Add comprehensive tests for membership, lifecycle, and client components
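
One way to read the stale-leave bullet above: a leave event emitted before a node rejoined must not remove the rejoined node. The following is a minimal Go sketch of that idea, assuming each join/leave event carries a per-node incarnation number that grows on every rejoin. The `Registry` type, its methods, and the incarnation field are hypothetical names for illustration and are not taken from the Woodpecker code.

```go
package main

import "sync"

// Registry tracks live members keyed by node ID.
// Hypothetical type for illustration; not the Woodpecker implementation.
type Registry struct {
	mu      sync.Mutex
	members map[string]uint64 // node ID -> incarnation of the latest join
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{members: make(map[string]uint64)}
}

// Join records a node, keeping the highest incarnation seen so far.
func (r *Registry) Join(id string, incarnation uint64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if cur, ok := r.members[id]; !ok || incarnation > cur {
		r.members[id] = incarnation
	}
}

// Leave removes a node only if the event is at least as new as the join
// currently on record. It reports whether the node was removed.
func (r *Registry) Leave(id string, incarnation uint64) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	cur, ok := r.members[id]
	if !ok || incarnation < cur {
		return false // unknown node, or a stale leave from before the rejoin
	}
	delete(r.members, id)
	return true
}

// Has reports whether the node is currently a member.
func (r *Registry) Has(id string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	_, ok := r.members[id]
	return ok
}
```

With this shape, a node that joined at incarnation 1, rejoined at incarnation 2, and then receives a delayed leave for incarnation 1 stays in the member set.
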
Add a Kubernetes operator for deploying and managing Woodpecker
  service-mode clusters with full lifecycle management, plus parity
  improvements to the docker-compose deployment.

  Operator (deployments/operator/)
  - CRD (WoodpeckerCluster) for declarative cluster deployment
  - Controllers for StatefulSet (with gossip seed auto-computation and
    liveness/readiness/startup probes), headless/client/metrics Services,
    ConfigMap, ServiceAccount, and PodDisruptionBudget
  - Rolling upgrade on image change, rolling restart on config change
  - Graceful scale-down driven by the server /admin/node/decommission API
    with progress polled via /admin/node/decommission/progress
  - Finalizer for graceful cleanup on CR deletion
  - Defaulting and validation webhooks; node topology awareness (AZ/Region)

  Gossip / pod identity
  - All pods share the same seed list (server-0,1,2); gossip ignores
    self-seeds, so every node can bootstrap independently. Previously
    server-0 had no seeds and could not join the cluster.
  - NODE_NAME uses pod name (metadata.name) instead of K8s node name,
    so single-node clusters like minikube work correctly.
  - Add ADVERTISE_GOSSIP_ADDR/ADVERTISE_SERVICE_ADDR with pod FQDN via
    headless service DNS; fix env var ordering for kubelet expansion.
  - Main container sources /etc/woodpecker/topology.env written by the
    init container so SEEDS is picked up before exec.
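
The shared seed list and self-seed filtering described above can be sketched in Go as follows. The pod FQDN pattern follows the standard Kubernetes DNS form for StatefulSet pods behind a headless service, with the default `cluster.local` cluster domain assumed. The function names, the resource names, and the port used in the example are placeholders, not values from the operator.

```go
package main

import (
	"fmt"
	"strings"
)

// seedList builds the same seed list for every pod: the first n ordinals of
// the StatefulSet, addressed through the headless service.
// Assumes the default cluster domain "cluster.local".
func seedList(statefulSet, headlessSvc, namespace string, n, port int) []string {
	seeds := make([]string, 0, n)
	for i := 0; i < n; i++ {
		seeds = append(seeds, fmt.Sprintf("%s-%d.%s.%s.svc.cluster.local:%d",
			statefulSet, i, headlessSvc, namespace, port))
	}
	return seeds
}

// withoutSelf drops the seed that points at the local pod, which is the
// effect of a gossip layer that ignores self-seeds.
func withoutSelf(seeds []string, podName string) []string {
	out := make([]string, 0, len(seeds))
	for _, s := range seeds {
		if strings.HasPrefix(s, podName+".") {
			continue
		}
		out = append(out, s)
	}
	return out
}
```

Because every pod receives the identical list and filters itself out locally, the first pod still has two peers to contact, which matches the fix described for server-0.
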

  Tests
  - New tests/e2e_operator/ client test suite running inside a K8s pod,
    including TestOperatorE2E_ScaleDownWithDecommission which drives the
    full decommission lifecycle: write + truncate on every existing log,
    wait for safe_to_terminate, verify read-back.
  - New deployments/operator/test/smoke-test.sh supporting step-by-step
    execution (step1-9, clean, all, --no-cleanup), with gossip health
    verification in step6 and decommission e2e in step9.
  - Rename test/e2e/ to test/operator-integration/ to distinguish kind-based
    integration tests from the minikube smoke test.
  - Move retentionPolicy.ttl under woodpecker.logstore in client test
    configs so the auditor uses the 10s TTL (was silently defaulting to
    72h because of wrong YAML nesting).
  - Fix decommission test flake: write one extra message past the
    truncation point so the auditor cleans the last segment on the
    decommissioned node — cleanup logic skips segId >= truncatedSegmentId,
    so truncating at the last written entry leaves its segment forever.
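
The cleanup rule quoted in the last bullet (skip any segment with segId >= truncatedSegmentId) can be illustrated with a small Go sketch. Only the predicate comes from the commit message. The function name and the segment IDs are hypothetical, and the sketch makes no claim about how the extra write advances the truncated segment ID in Woodpecker.

```go
package main

// cleanable returns the segment IDs a cleanup pass would delete, given the
// rule that segments with ID >= truncatedSegmentID are skipped.
// Hypothetical helper for illustration only.
func cleanable(segIDs []int64, truncatedSegmentID int64) []int64 {
	var out []int64
	for _, id := range segIDs {
		if id >= truncatedSegmentID {
			continue // still at or after the truncation point: keep
		}
		out = append(out, id)
	}
	return out
}
```

With segments 0, 1, 2 and a truncation point inside segment 2, only 0 and 1 are eligible, so segment 2 is never cleaned. Once the truncated segment ID moves past 2, all three become eligible.
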

  CI
  - New .github/workflows/operator-test.yaml; add "Operator Test" to
    REQUIRED_CHECKS in all CI workflows so ci-passed requires it.
  - Add /rerun-operator slash command and an on-failure PR comment with
    rerun instructions.
  - Exclude all tests/ from the unit-test workflow (e2e_operator needs
    a K8s cluster and config file).

  docker-compose parity
  - Add HTTP /healthz healthchecks to all four woodpecker-nodeX services
    in deployments/docker-compose.yaml (interval 10s, timeout 5s,
    retries 3, start_period 30s), mirroring the operator's liveness probe
    so docker-compose deployments get the same hung-process detection.
    Verified end-to-end: all four nodes reach docker health=healthy
    within ~50s; /healthz returns 200 on 9091-9094.

  Docs
  - docs/woodpecker_operator.md — operator design
  - docs/admin-guides/operator-guide-getting-start.md — pre-built image guide
  - docs/admin-guides/operator-e2e-build-from-source.md — build-from-source E2E guide

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
…the storageV2 repo

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
…that depend on the latest v2

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
…esource

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
tinswzy force-pushed the support_milvus_storage branch from fdfb227 to 46e9044 on April 8, 2026 07:08
@github-actions

github-actions bot commented Apr 8, 2026

Go Mod Tidy Check failed. Comment /rerun-mod-tidy to rerun, or /rerun to rerun all failed checks.

@github-actions

github-actions bot commented Apr 8, 2026

Chaos Test failed. Comment /rerun-chaos-test to rerun, or /rerun to rerun all failed checks.

@github-actions

github-actions bot commented Apr 8, 2026

Lint failed. Comment /rerun-lint to rerun, or /rerun to rerun all failed checks.

@github-actions

github-actions bot commented Apr 8, 2026

E2E Object Storage failed. Comment /rerun-e2e-objectstorage to rerun, or /rerun to rerun all failed checks.

@github-actions

github-actions bot commented Apr 8, 2026

E2E Local failed. Comment /rerun-e2e-local to rerun, or /rerun to rerun all failed checks.

@github-actions

github-actions bot commented Apr 8, 2026

Component Integration Test failed. Comment /rerun-component-test to rerun, or /rerun to rerun all failed checks.

@github-actions

github-actions bot commented Apr 8, 2026

E2E Service failed. Comment /rerun-e2e-service to rerun, or /rerun to rerun all failed checks.

@github-actions

github-actions bot commented Apr 8, 2026

Unit Test failed. Comment /rerun-unit-test to rerun, or /rerun to rerun all failed checks.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
@github-actions

Chaos Test failed. Comment /rerun-chaos-test to rerun, or /rerun to rerun all failed checks.

@github-actions

Unit Test failed. Comment /rerun-unit-test to rerun, or /rerun to rerun all failed checks.

@github-actions

Lint failed. Comment /rerun-lint to rerun, or /rerun to rerun all failed checks.

@github-actions

E2E Local failed. Comment /rerun-e2e-local to rerun, or /rerun to rerun all failed checks.

@github-actions

Component Integration Test failed. Comment /rerun-component-test to rerun, or /rerun to rerun all failed checks.

@github-actions

E2E Service failed. Comment /rerun-e2e-service to rerun, or /rerun to rerun all failed checks.

@github-actions

E2E Object Storage failed. Comment /rerun-e2e-objectstorage to rerun, or /rerun to rerun all failed checks.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
@github-actions

Chaos Test failed. Comment /rerun-chaos-test to rerun, or /rerun to rerun all failed checks.

@github-actions

Lint failed. Comment /rerun-lint to rerun, or /rerun to rerun all failed checks.

@github-actions

Unit Test failed. Comment /rerun-unit-test to rerun, or /rerun to rerun all failed checks.

@github-actions

E2E Object Storage failed. Comment /rerun-e2e-objectstorage to rerun, or /rerun to rerun all failed checks.

@github-actions

E2E Service failed. Comment /rerun-e2e-service to rerun, or /rerun to rerun all failed checks.

@github-actions

Component Integration Test failed. Comment /rerun-component-test to rerun, or /rerun to rerun all failed checks.

@github-actions

E2E Local failed. Comment /rerun-e2e-local to rerun, or /rerun to rerun all failed checks.

2 participants