I'm facing the exact same issue in a similar scenario. I think it's reasonable to expect the preview service to point back to the stable version when an AnalysisRun fails, especially since the preview ReplicaSet usually scales down unless explicitly configured not to. Looking at the commit history and the docs, though, the current behavior seems to be intentional (for example, the BlueGreen docs say "The Rollout always makes sure that the preview service is sending traffic to the newest ReplicaSet"). Changing this behavior now would probably be a breaking change, so I'm wondering whether it makes sense to add an optional Rollout field that modifies it. If a maintainer is happy with this, I'm happy to make a PR.
Could you explain a bit about the use case here and the business need behind it? Currently it works like this (or at least this is what I understand):
End of story. Are you saying that after this, you still want to run integration tests against 1.5? Is this your scenario?
In my case, I use an Istio VirtualService to route traffic with a certain header to preview and everything else to stable. This lets us make preview accessible to specific users (in my case, users from the QA team) before fully promoting. In your example, when the analysis fails at step 4, stable does point back to 1.4, but preview doesn't, since its pod hash is not updated. This makes the service inaccessible to those specific users. I'm not sure of the reasoning behind this implementation, but I find it somewhat unintuitive and would like to know if you know why. Either way, an optional field that makes preview point back to the old version doesn't sound like a bad idea, and I'd love to hear your perspective (or other approaches to achieve what I'm trying to do, if you know any).
Apologies for the late reply. I don't really have a GitHub repo with an example, but let me step through my understanding of the Argo Rollouts code and explain what happens in my case.

In my project we have multiple apps running (e.g. frontend, backend) with an integration test that requires every app's preview service to be running at the same time. Suppose I attempt to deploy backend v2, the latest deployment has a bug, and the Pre Promotion Analysis/AnalysisRun/integration test fails.

During this attempted deployment, the newest ReplicaSet hash is updated to point to the failed deployment. Once the rollout has been aborted due to the failing AnalysisRun, this new ReplicaSet has its desired replicas set to 0 (since it is a failed deployment). The active service then sees that the backend rollout status is aborted, so it falls back to the last known stable ReplicaSet hash (v1) instead of the latest (v2, failing, 0 replicas). The preview service differs: it always uses the latest (v2, failing, 0 replicas).

The result is that frontend's preview is running, but backend's preview service points to a failed ReplicaSet with 0 replicas. All subsequent frontend deployments then fail, because the backend preview service that the Pre Promotion Analysis/AnalysisRun/integration test needs is down.

The relevant code for this behavior:

Preview service: argo-rollouts/rollout/service.go, lines 77 to 92 in 7518bde.

Active service: argo-rollouts/rollout/service.go, lines 94 to 121 in 7518bde.

Hopefully those snippets are enough to illustrate the scenario without an example repo.
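To make the difference concrete, here is a minimal Go sketch of the two selector decisions as described in this comment. The types (`ReplicaSet`, `Rollout`) and both functions are hypothetical simplifications, not the real argo-rollouts code in rollout/service.go. Only the abort branch is modelled; the logic that switches the active service on promotion is deliberately left out.

```go
package main

import "fmt"

// ReplicaSet is a hypothetical, simplified stand-in: only the
// pod-template hash and the replica count matter for this sketch.
type ReplicaSet struct {
	PodTemplateHash string
	Replicas        int32
}

// Rollout is a hypothetical view of the state the reconcilers look at.
// It is NOT the real argo-rollouts Rollout type.
type Rollout struct {
	Aborted bool
	Newest  *ReplicaSet // latest ReplicaSet (backend v2 above)
	Stable  *ReplicaSet // last known stable ReplicaSet (backend v1)
}

// previewSelectorHash models the described preview behaviour:
// always select the newest ReplicaSet, even after an abort.
func previewSelectorHash(ro Rollout) string {
	return ro.Newest.PodTemplateHash
}

// activeSelectorHash models only the abort branch described above:
// on abort, fall back to the stable ReplicaSet's hash.
// (Promotion handling in the real reconciler is not modelled.)
func activeSelectorHash(ro Rollout) string {
	if ro.Aborted && ro.Stable != nil {
		return ro.Stable.PodTemplateHash
	}
	return ro.Newest.PodTemplateHash
}

func main() {
	v1 := &ReplicaSet{PodTemplateHash: "v1-hash", Replicas: 3}
	// v2 has been scaled to 0 after the failed AnalysisRun.
	v2 := &ReplicaSet{PodTemplateHash: "v2-hash", Replicas: 0}
	ro := Rollout{Aborted: true, Newest: v2, Stable: v1}

	fmt.Println("active  ->", activeSelectorHash(ro))  // v1-hash
	fmt.Println("preview ->", previewSelectorHash(ro)) // v2-hash, backed by 0 pods
}
```

The asymmetry is the whole issue: after the abort, the active selector lands on a ReplicaSet with pods while the preview selector lands on one with none.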
We have a similar use case that would greatly benefit from this feature. We maintain an internal E2E testing application that must always test against the preview version of the service before promoting to production. Due to external constraints, we cannot modify the E2E application code to explicitly select preview vs. stable versions or to set new headers. Instead, we use Istio VirtualService header matching (

However, when a rollout is aborted (e.g. a manual abort), the previewService selector keeps pointing to the aborted ReplicaSet with 0 replicas, causing all E2E requests to fail (503 Service Unavailable).

Proposed solution: an optional field like
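A minimal Go sketch of what such an opt-in could look like, assuming a hypothetical boolean field. The name `PreviewFallbackOnAbort` is made up purely for illustration and does not exist in argo-rollouts; `RolloutState` is likewise a simplified, hypothetical type. With the flag unset, the preview service keeps selecting the newest ReplicaSet, so existing Rollouts would behave as they do today.

```go
package main

import "fmt"

// BlueGreenStrategy carries a HYPOTHETICAL opt-in flag; this field is
// not part of the argo-rollouts API.
type BlueGreenStrategy struct {
	PreviewFallbackOnAbort bool
}

// RolloutState is a hypothetical, simplified view of a rollout.
type RolloutState struct {
	Aborted    bool
	NewestHash string
	StableHash string
}

// previewSelectorHash returns the pod-template hash the preview service
// would select. Flag unset: current behaviour (always newest).
// Flag set: an aborted rollout falls back to the stable hash, like the
// active service, as long as a stable hash exists.
func previewSelectorHash(s BlueGreenStrategy, ro RolloutState) string {
	if s.PreviewFallbackOnAbort && ro.Aborted && ro.StableHash != "" {
		return ro.StableHash
	}
	return ro.NewestHash
}

func main() {
	ro := RolloutState{Aborted: true, NewestHash: "v2-hash", StableHash: "v1-hash"}
	fmt.Println(previewSelectorHash(BlueGreenStrategy{}, ro))                             // v2-hash
	fmt.Println(previewSelectorHash(BlueGreenStrategy{PreviewFallbackOnAbort: true}, ro)) // v1-hash
}
```

Defaulting the flag to false is what would keep the change non-breaking, and falling back only when a stable hash exists avoids emptying the selector when there is nothing to fall back to.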
Not sure if this is best suited under Discussions or Issues, but I would like to better understand the current behavior and intended usage of the preview service, in particular for BlueGreen rollout aborts.
I have observed that when a new deployment fails the AnalysisRun for Pre Promotion Analysis, Argo Rollouts correctly does not promote it, but the preview service's rollouts-pod-template-hash selector remains pointed at the now-failed hash instead of the last stable ReplicaSet's hash (which is how the active service behaves). This appears to be due to a difference in implementation: reconcilePreviewService does not contain any status check for an aborted rollout, unlike reconcileActiveService.

Is remaining at the failed hash intended behavior, and if so, what is the reason for it? My project runs pre-promotion integration tests for a few different apps against their preview services, and a preview service selector that matches no pods results in test failures.