
Operator enhancement to poll for updates to operands based on floating tag#174

Open
phantomjinx wants to merge 9 commits into hawtio:main from phantomjinx:image-updater

Conversation

@phantomjinx
Member

Description

The operator installs operands using floating tags, so when operand images receive new releases on a specific floating tag, the operator pulls the latest image on that tag and installs the new incremental version rather than remaining tied to the original product release.

However, this does not allow existing hawtio-online installs to receive the updated operands. Only if the operator removes the existing installs and re-deploys are the new images likely to be pulled and used. To solve this, the following is implemented:

  1. The operator launches a background go worker which polls the image registry for the given hawtio-online and hawtio-online-gateway images. It fetches the remote digest of each image tag.

  2. The digests are compared with the existing digests; if they differ, the updater records the new digests and signals a Go channel.

  3. The operator watches the updater's channel and, on receiving a notification, immediately begins an iteration of its reconciliation loop.

  4. Whilst reconciling the deployment, the operator checks the digests received from the updater, records them in an annotation on the deployment resource spec and updates the image URLs.

  5. The modification to the deployment is enough for Kubernetes to trigger a rollout of the hawtio-online deployment, fetching the new images and spinning them up.

  6. The reconciliation completes by populating the status of each Hawtio CR with the new image URLs.

  7. Should the updater fail to access the registry, e.g. in an offline install, it backs off gracefully and leaves the original working image reference intact in the deployment resource.

  8. If users wish to completely disable the updater, e.g. for an offline install or air-gapped network, the environment variable UPDATE_POLLING_INTERVAL can be added to the operator deployment with a value of "0", disabling the updater entirely.

  9. The updater has a default schedule of every 12 hours ("12h"). This can be modified using UPDATE_POLLING_INTERVAL, e.g. "6h", "180h", etc.


@squakez squakez left a comment


I had a quick look and, overall, technically speaking I don't see any problem. However, I think that in the long run this may introduce maintainability problems, in the sense that it significantly expands the scope and the surface area for bugs and thread-consistency problems.

I think that the "restart" of an application due to a floating tag that may be regenerated belongs more to the cluster itself than to an operator. Moreover, considering that the default polling is 12 hours, at this stage it would be much easier to just restart the application with pullPolicy=Always every 12 hours and let the cluster pick up any new image (if one exists).

Comment thread cmd/manager/main.go Outdated
Comment thread pkg/controller/internal/hawtiotest/test_functions.go
@phantomjinx
Member Author

I had a quick look and, overall, technically speaking I don't see any problem. However, I think that in the long run this may introduce maintainability problems, in the sense that it significantly expands the scope and the surface area for bugs and thread-consistency problems.

I think that the "restart" of an application due to a floating tag that may be regenerated belongs more to the cluster itself than to an operator. Moreover, considering that the default polling is 12 hours, at this stage it would be much easier to just restart the application with pullPolicy=Always every 12 hours and let the cluster pick up any new image (if one exists).

Thanks for the review and for looking into the architecture. Totally understand the concern about keeping the Operator's scope tight.

I did consider the pullPolicy: Always & scheduled restart approach, but I decided against it primarily due to Pod churn and customer experience. Blindly restarting the application every 12 hours - even when no new image exists - forces unnecessary downtime, breaks active connections, and creates noise in customer alerting systems.

By polling the registry digest in the background, we only trigger a rolling deployment when a new image is actually detected (which could be months away). Therefore, the updater is quiet for the vast majority of the time, and when an upgrade does occur it is seamless.

Regarding thread safety, the updater is strictly isolated. An early design had the updater modifying the deployment itself, but that was quickly revised. It makes no modification to cluster state directly; it simply drops a GenericEvent into the standard controller-runtime channel, letting the native Reconciler queue handle the concurrency safely. The E2E suite was written specifically to cover network failures, partial updates, and race conditions, guaranteeing the updater fails gracefully and allows the reconciler to continue.

The level 2 (seamless upgrades) requirement of the Operator framework is to encapsulate domain-specific lifecycle management so the user doesn't have to. This operator is providing a premium, zero-config experience out of the box.
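The "updater only notifies, never mutates" split described above can be illustrated with a dependency-free sketch. The event type and updater function here are stand-ins for the real controller-runtime GenericEvent and source.Channel wiring, not code from this PR:

```go
package main

import (
	"fmt"
)

// event mimics controller-runtime's GenericEvent: a nudge telling the
// reconciler "something changed", carrying no cluster mutation itself.
type event struct{ digest string }

// updater sends on ch only when a fetched digest differs from the last
// one seen; it never touches cluster state directly. fetch stands in
// for a registry digest lookup.
func updater(fetch func() string, last string, ch chan<- event, rounds int) {
	for i := 0; i < rounds; i++ {
		if d := fetch(); d != last {
			last = d
			ch <- event{digest: d}
		}
	}
	close(ch)
}

func main() {
	// Two polls see the same digest, the third sees a new release.
	digests := []string{"sha256:aaa", "sha256:aaa", "sha256:bbb"}
	i := 0
	fetch := func() string { d := digests[i]; i++; return d }

	ch := make(chan event, 3)
	updater(fetch, "sha256:aaa", ch, len(digests))

	var reconciled []string
	for ev := range ch {
		// One reconcile iteration per notification.
		reconciled = append(reconciled, ev.digest)
	}
	fmt.Println(reconciled)
}
```

Because unchanged digests produce no event, the reconciler runs only when there is real work, which is the Pod-churn argument made above.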

* This variable name collides with common package names and tends to
  force go-imports to import a log package rather than detecting the
  instance variable.

* Renaming the variable to an alternative name not associated with common
  package names stops this happening.
* breaks out all the reconcile functions into separate files to aid reading
  and maintainability.
* Queries an image URL for its latest digest

* Adds dependencies to vendoring
* Provides the operator with the UPDATE_POLLING_INTERVAL env var

* If added to the deployment with a value of 0 then the update poller
  will be completely disabled.
…o controller

* hawtio_controller.go
 * Adds a watch on the update channel. If the channel signals there is
   new data then the reconcile loop will be executed.
 * Adds both the channel and the poller to the ReconcileHawtio object to
   make them available to the deployment reconciler.

* lifecycle.go
 * Improves handleResultAndError to allow for a requeue of the reconciler
   if the reconcile functions require it - requires a RequeueError

* reconcile_deployment.go
 * If the updatePoller has been initialized then fetch the digests
 * Should the digests not be returned yet, requeue and await the response
 * Should the poller have errored then ignore and continue with the
   original image urls

* manager.go
 * Creates the update poller and channel for the background thread

* poller.go
 * The poller that runs in the background thread and checks the image
   digests at the interval specified
* Tests the updater integrated with the hawtio controller and its
  effect on reconciling the deployment.
* Prevents the test API being public in the project, as there is no need to expose it

* test_functions.go
 * Adds a FindProjectRoot function that walks up the directory structure
   to locate go.mod, avoiding the need for copious .. paths when
   locating the CRD files in the tests.
* Makes the code obvious as to what the default polling interval of the
  updater is set to.
