Native support for HorizontalPodAutoscaler#469
Native support for HorizontalPodAutoscaler#469nustiueudinastea wants to merge 13 commits intozilliztech:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: nustiueudinastea The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
983c971 to
3630077
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #469 +/- ##
==========================================
+ Coverage 76.70% 77.20% +0.50%
==========================================
Files 66 67 +1
Lines 6176 6347 +171
==========================================
+ Hits 4737 4900 +163
- Misses 1176 1179 +3
- Partials 263 268 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Adds native HorizontalPodAutoscaler (HPA) support to the Milvus operator so it can manage HPAs directly (create/update/delete) and adjust rollout scaling behavior to avoid upgrades getting stuck when replicas are managed by an HPA.
Changes:
- Introduces
spec.components.<component>.hpaAPI (CRD + Go types + deepcopy) and a newReconcileHPAsreconciler. - Updates deployment scaling logic to better handle HPA-enabled components during two-deployment rollouts.
- Adds unit tests and a sample manifest demonstrating native and legacy HPA approaches.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/controllers/milvus.go | Adds ReconcileHPAs to the Milvus reconciliation pipeline. |
| pkg/controllers/hpa.go | Implements HPA create/update/delete reconciliation and spec conversion helpers. |
| pkg/controllers/deploy_ctrl_util.go | Adjusts HPA rollout scaling plan to preserve capacity during rollouts. |
| pkg/controllers/deployment_updater.go | Treats HPA spec as enabling HPA and bootstraps replicas from minReplicas when needed. |
| pkg/controllers/components.go | Adds component helpers for reading HPA spec + generating HPA names. |
| apis/milvus.io/v1beta1/components_types.go | Adds HPASpec and hpa field to components API. |
| apis/milvus.io/v1beta1/zz_generated.deepcopy.go | Adds deepcopy support for the new HPA fields/types. |
| config/crd/bases/milvus.io_milvuses.yaml | Extends Milvus CRD schema with component hpa fields. |
| config/crd/bases/milvus.io_milvusclusters.yaml | Extends MilvusCluster CRD schema with component hpa fields. |
| config/samples/hpa.yaml | Documents native HPA config and a legacy external-HPA example. |
| pkg/controllers/hpa_test.go | Adds unit tests for HPA reconciliation + rollout scaling planning. |
| pkg/controllers/deploy_ctrl_util_test.go | Updates rollout scaling expectations for HPA behavior. |
| pkg/controllers/milvus_test.go | Updates reconcile group size expectation due to new reconciler. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@AlintaLu @LoveEachDay @haorenfsa hey folks, I implemented Copilots feedback, and extended the integration tests to cover the HPA situation. The PR is ready for review. Thanks! |
|
I noticed PR #458 which tackles the same problem but differently. I think the addition of HPA support is more robust because there will not be any conflict between the operator and the external HPA, although both solutions can live along each other. |
Signed-off-by: Alex Giurgiu <agiurgiu@slb.com>
Signed-off-by: Alex Giurgiu <agiurgiu@slb.com>
Signed-off-by: Alex Giurgiu <agiurgiu@slb.com>
Signed-off-by: Alex Giurgiu <agiurgiu@slb.com>
Signed-off-by: Alex Giurgiu <agiurgiu@slb.com>
Signed-off-by: Alex Giurgiu <agiurgiu@slb.com>
Signed-off-by: Alex Giurgiu <agiurgiu@slb.com>
959a7e9 to
99b4e5c
Compare
|
@nustiueudinastea Yes, I think so, too. Thank you for providing this patch. |
Signed-off-by: Alex Giurgiu <agiurgiu@slb.com>
|
Tests are now failing but it seems to be a problem related to kind. |
|
@haorenfsa any chance you can take another look at this? |
|
@nustiueudinastea yes. I'm on vacation recently. sry for the delay. |
|
@AlintaLu please help review |
|
Great work @nustiueudinastea, thanks for providing native HPA support! Could you please rebase on main to incorporate the latest changes and re-run CI? The latest GitHub Actions runner image appears to have resolved the Kind compatibility issue. |
Signed-off-by: Alex Giurgiu <agiurgiu@slb.com>
Signed-off-by: Alex Giurgiu <agiurgiu@slb.com>
|
@AlintaLu @haorenfsa I think it's ready. Take a look, and thanks for your time! |
|
+1 from me — I also need this feature/update. It would be great if this could be merged soon. |
|
@AlintaLu @haorenfsa any chance we can move forward with this or you don't plan to merge it anymore? |
This PR adds native support for HPA resources, managed directly by the operator. I have implemented this feature because when settings the replicas to -1, in combination with a user managed HPA, the operator fails to complete upgrades. This happens because of the dual deployments used for querynodes, where the operator is waiting for one of the deployments/replicasets to be scaled down, which never happens when being managed by the HPA.
So this PR introduces ability for the operator to create and manage HPA's by itself, and correctly manage the upgrade process.
The upgrade process is the following: