enhance: Use milvusStorageV2 to support various cloudstorage#67
Force-pushed from 0bd89fc to 718577e
/gemini review
fix: improve membership discovery reliability and lifecycle error handling
- Prevent stale leave events from removing rejoined nodes in membership discovery
- Propagate MarkDecommissioned error and retry in decommission monitor
- Relax fallback mechanism performance threshold to reduce CI flakiness
- Add comprehensive tests for membership, lifecycle, and client components
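One common way to keep a delayed leave event from evicting a node that has already rejoined is to tag every membership event with a per-node incarnation number and drop events older than the latest one seen. The sketch below illustrates that idea only; the `Membership` type, its methods, and the incarnation field are hypothetical and are not taken from the Woodpecker code.

```go
package main

import "fmt"

// member holds the newest incarnation seen for a node and its liveness.
// Hypothetical type, for illustration only.
type member struct {
	incarnation uint64
	alive       bool
}

// Membership is a hypothetical in-memory cluster view.
type Membership struct {
	nodes map[string]member
}

func NewMembership() *Membership {
	return &Membership{nodes: make(map[string]member)}
}

// Join marks the node alive unless the event is stale. A join that ties
// with an already-applied leave of the same incarnation is also ignored.
func (m *Membership) Join(id string, incarnation uint64) {
	cur, ok := m.nodes[id]
	if ok && (incarnation < cur.incarnation || (incarnation == cur.incarnation && !cur.alive)) {
		return
	}
	m.nodes[id] = member{incarnation: incarnation, alive: true}
}

// Leave removes the node only if the event is not older than the newest
// incarnation seen. It reports whether the event was applied.
func (m *Membership) Leave(id string, incarnation uint64) bool {
	cur, ok := m.nodes[id]
	if !ok || incarnation < cur.incarnation {
		return false // unknown node or stale leave: ignore
	}
	m.nodes[id] = member{incarnation: incarnation, alive: false}
	return true
}

func (m *Membership) Alive(id string) bool {
	return m.nodes[id].alive
}

func main() {
	m := NewMembership()
	m.Join("node-1", 1)
	m.Join("node-1", 2)             // node restarted and rejoined
	applied := m.Leave("node-1", 1) // delayed leave from the first incarnation
	fmt.Println(applied, m.Alive("node-1")) // false true
}
```

The comparison on the incarnation is what makes event ordering irrelevant: whichever order the join and the old leave arrive in, the rejoined node stays in the view.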
Add a Kubernetes operator for deploying and managing Woodpecker
service-mode clusters with full lifecycle management, plus parity
improvements to the docker-compose deployment.
Operator (deployments/operator/)
- CRD (WoodpeckerCluster) for declarative cluster deployment
- Controllers for StatefulSet (with gossip seed auto-computation and
liveness/readiness/startup probes), headless/client/metrics Services,
ConfigMap, ServiceAccount, and PodDisruptionBudget
- Rolling upgrade on image change, rolling restart on config change
- Graceful scale-down driven by the server /admin/node/decommission API
with progress polled via /admin/node/decommission/progress
- Finalizer for graceful cleanup on CR deletion
- Defaulting and validation webhooks; node topology awareness (AZ/Region)
Gossip / pod identity
- All pods share the same seed list (server-0,1,2); gossip ignores
self-seeds, so every node can bootstrap independently. Previously
server-0 had no seeds and could not join the cluster.
- NODE_NAME uses pod name (metadata.name) instead of K8s node name,
so single-node clusters like minikube work correctly.
- Add ADVERTISE_GOSSIP_ADDR/ADVERTISE_SERVICE_ADDR with pod FQDN via
headless service DNS; fix env var ordering for kubelet expansion.
- Main container sources /etc/woodpecker/topology.env written by the
init container so SEEDS is picked up before exec.
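The shared seed list and the self-seed filter can be sketched as below. This is an illustration under stated assumptions, not the operator's code: the function names, the gossip port `7946`, the resource names in `main`, and the default `cluster.local` cluster domain are all hypothetical or assumed.

```go
package main

import (
	"fmt"
	"strings"
)

// podFQDN builds the DNS name a StatefulSet pod gets through a headless
// Service, assuming the default "cluster.local" cluster domain.
func podFQDN(statefulSet string, ordinal int, headlessSvc, namespace string) string {
	return fmt.Sprintf("%s-%d.%s.%s.svc.cluster.local", statefulSet, ordinal, headlessSvc, namespace)
}

// seedList returns the same seed addresses for every pod: the first
// seedCount ordinals of the StatefulSet.
func seedList(statefulSet, headlessSvc, namespace string, seedCount, gossipPort int) []string {
	seeds := make([]string, 0, seedCount)
	for i := 0; i < seedCount; i++ {
		seeds = append(seeds, fmt.Sprintf("%s:%d", podFQDN(statefulSet, i, headlessSvc, namespace), gossipPort))
	}
	return seeds
}

// withoutSelf drops the node's own advertised address, so a seed pod can
// still bootstrap from the remaining seeds.
func withoutSelf(seeds []string, self string) []string {
	out := make([]string, 0, len(seeds))
	for _, s := range seeds {
		if s != self {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	// Hypothetical names and port, for illustration only.
	seeds := seedList("server", "server-headless", "default", 3, 7946)
	self := seeds[0] // what server-0 would advertise
	fmt.Println(strings.Join(withoutSelf(seeds, self), ","))
}
```

Because every pod receives the identical list, no pod needs special-case configuration; server-0 simply ends up with two usable seeds instead of none.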
Tests
- New tests/e2e_operator/ client test suite running inside a K8s pod,
including TestOperatorE2E_ScaleDownWithDecommission which drives the
full decommission lifecycle: write + truncate on every existing log,
wait for safe_to_terminate, verify read-back.
- New deployments/operator/test/smoke-test.sh supporting step-by-step
execution (step1-9, clean, all, --no-cleanup), with gossip health
verification in step6 and decommission e2e in step9.
- Rename test/e2e/ to test/operator-integration/ to distinguish kind-based
integration tests from the minikube smoke test.
- Move retentionPolicy.ttl under woodpecker.logstore in client test
configs so the auditor uses the 10s TTL (was silently defaulting to
72h because of wrong YAML nesting).
- Fix decommission test flake: write one extra message past the
truncation point so the auditor cleans the last segment on the
decommissioned node — cleanup logic skips segId >= truncatedSegmentId,
so truncating at the last written entry leaves its segment forever.
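The cleanup rule behind the decommission flake (skip every segment with `segId >= truncatedSegmentId`) can be illustrated with a small predicate. The function name and the concrete segment IDs are hypothetical, and the second call assumes that the extra write advances the truncated segment ID past the segment held by the decommissioned node.

```go
package main

import "fmt"

// cleanableSegments returns the segment IDs that cleanup may delete.
// Segments with id >= truncatedSegmentID are skipped, which includes the
// segment that contains the truncation point itself.
func cleanableSegments(segIDs []int64, truncatedSegmentID int64) []int64 {
	var out []int64
	for _, id := range segIDs {
		if id >= truncatedSegmentID {
			continue
		}
		out = append(out, id)
	}
	return out
}

func main() {
	held := []int64{0, 1, 2} // segments on the node being decommissioned

	// Truncating at the last written entry (in segment 2): segment 2 is
	// never cleaned, so the node never becomes empty.
	fmt.Println(cleanableSegments(held, 2)) // [0 1]

	// ASSUMPTION: with one extra message the truncated segment ID is 3,
	// so everything the node holds becomes cleanable.
	fmt.Println(cleanableSegments(held, 3)) // [0 1 2]
}
```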
CI
- New .github/workflows/operator-test.yaml; add "Operator Test" to
REQUIRED_CHECKS in all CI workflows so ci-passed requires it.
- Add /rerun-operator slash command and an on-failure PR comment with
rerun instructions.
- Exclude all tests/ from the unit-test workflow (e2e_operator needs
a K8s cluster and config file).
docker-compose parity
- Add HTTP /healthz healthchecks to all four woodpecker-nodeX services
in deployments/docker-compose.yaml (interval 10s, timeout 5s,
retries 3, start_period 30s), mirroring the operator's liveness probe
so docker-compose deployments get the same hung-process detection.
Verified end-to-end: all four nodes reach docker health=healthy
within ~50s; /healthz returns 200 on 9091-9094.
Docs
- docs/woodpecker_operator.md — operator design
- docs/admin-guides/operator-guide-getting-start.md — pre-built image guide
- docs/admin-guides/operator-e2e-build-from-source.md — build-from-source E2E guide
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
…the storageV2 repo Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
…that depend on the latest v2 Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
…esource Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
Force-pushed from fdfb227 to 46e9044
❌ Go Mod Tidy Check failed.
❌ Chaos Test failed.
❌ Lint failed.
❌ E2E Object Storage failed.
❌ E2E Local failed.
❌ Component Integration Test failed.
❌ E2E Service failed.
❌ Unit Test failed.
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
❌ Chaos Test failed.
❌ Unit Test failed.
❌ Lint failed.
❌ E2E Local failed.
❌ Component Integration Test failed.
❌ E2E Service failed.
❌ E2E Object Storage failed.
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
❌ Chaos Test failed.
❌ Lint failed.
❌ Unit Test failed.
❌ E2E Object Storage failed.
❌ E2E Service failed.
❌ Component Integration Test failed.
❌ E2E Local failed.
issue: #64
!!!!!! DO NOT MERGE:
Wait until storage v2 has been restructured and improved, in particular until the FFI interface changes have landed and stabilized. Then update the dependency calls and remove the unused C++ code.