[feature:ui] Show upgrade progress for single upgrade operations in real time#320
Conversation
|
Can you please rebase your branch to master? |
|
Can you please clean up the history of your branch? Please make sure:
Imagine that someone else wants to follow along with your work and changes. |
| "data": { | ||
| "type": "batch_status", | ||
| "status": "success", | ||
| "completed": 2, |
There was a problem hiding this comment.
Is it possible to test partial failure? e.g., test what happens if the WebSocket disconnects mid-progress.
| self.assertEqual(response["total"], 2) | ||
|
|
||
| # Close the connection | ||
| await communicator.disconnect() |
There was a problem hiding this comment.
nit for test completeness: Can add an integration test that triggers log_line() in a live upgrade operation and asserts the frontend receives it.
Aryamanz29
left a comment
There was a problem hiding this comment.
Good progress so far - since this is still a WIP, I’ll do a full review once it’s in a ready-for-review state and includes some UI that integrates the WebSocket endpoints.
@youhaveme9 - as we’ve discussed before, once the PR is ready for review, please be sure to include clear steps to test your work, and attach screenshots or a working demo. This really helps make the review and testing process quicker and smoother.
At the moment, I had to make several changes locally just to get things working for testing
| if save: | ||
| self.save() | ||
|
|
||
| publisher = UpgradeProgressPublisher(self.pk) |
There was a problem hiding this comment.
self.pk can be None if the model instance hasn’t been saved yet (i.e., it’s a new object). Also, I believe UpgradeProgressPublisher() expects a non-null operation_id. 🤔
| publisher = UpgradeProgressPublisher(self.pk) | |
| if self.pk: | |
| publisher = UpgradeProgressPublisher(self.pk) | |
| publisher.publish_progress( | |
| {"type": "log", "content": line, "status": self.status} | |
| ) |
There was a problem hiding this comment.
Right!
Will add this check
Thanks
| from .websockets import BatchUpgradeProgressConsumer, UpgradeProgressConsumer | ||
|
|
||
| websocket_urlpatterns = [ | ||
| re_path( | ||
| r"ws/upgrade/(?P<operation_id>[^/]+)/$", | ||
| UpgradeProgressConsumer.as_asgi(), | ||
| ), | ||
| re_path( | ||
| r"ws/batch-upgrade/(?P<batch_id>[^/]+)/$", | ||
| BatchUpgradeProgressConsumer.as_asgi(), | ||
| ), | ||
| ] |
There was a problem hiding this comment.
I would suggest constructing these URL patterns similar to how we do in other modules - for example:
https://github.com/openwisp/openwisp-controller/blob/85eee35ea52c64018c64baa9d6e61507ffdd4831/openwisp_controller/connection/channels/routing.py#L1-L12
Also, let's follow a consistent directory structure:
channels/routing.py, channels/consumer.py, and channels/producer.py, just like we do elsewhere. 🙏
There was a problem hiding this comment.
Also, don’t forget to add these new routes to the main ASGI application: https://github.com/openwisp/openwisp-firmware-upgrader/blob/master/tests/openwisp2/routing.py
2558294 to
bc83046
Compare
Aryamanz29
left a comment
There was a problem hiding this comment.
Good progress @youhaveme9!
I did basic testing of the current implementation and below are my findings: (I will do thorough testing on a VM device in the coming days, which will help me test the core implementation in real-world scenarios)
| if (isUpgradeOperationsAbsent()) { | ||
| location.reload(); | ||
| } |
There was a problem hiding this comment.
Please check the logic of this section, as it's causing the device view page to reload every few seconds, making it inaccessible: http://127.0.0.1:8000/admin/config/device/9dd21417-a805-4b0d-8cc7-0f11ed1b7c90/change/
video-2025-06-29_23.44.29.mp4
There was a problem hiding this comment.
Hey @Aryamanz29 I'm not able to replicate this at all
Can you share your steps to replicate this?
There was a problem hiding this comment.
Not on the latest work - might be fixed on the last changes. For now we're good
There was a problem hiding this comment.
Still seeing this error - the page keeps reloading when there are no 'Recent firmware upgrades' tab present. It stops once firmware upgrades appear. Please investigate!
| </div> | ||
|
|
||
|
|
||
| {% endblock %} No newline at end of file |
| "reconnect_delay": 150, | ||
| "reconnect_retry_delay": 30, | ||
| "reconnect_max_retries": 10, | ||
| "upgrade_timeout": 80, | ||
| "reconnect_delay": 20, | ||
| "reconnect_retry_delay": 5, | ||
| "reconnect_max_retries": 5, | ||
| "upgrade_timeout": 25, |
There was a problem hiding this comment.
Curious to know why this change was made (seems like a one-off for local testing? If yes, can we revert this?)
There was a problem hiding this comment.
I did it for testing
Reverted it back
| except Exception as e: | ||
| logger.error(f"Failed to publish WebSocket messages: {e}") |
There was a problem hiding this comment.
What about more specific error logging to help debug issues?
logger.error(f"Failed to publish WebSocket message for upgrade operation {self.pk}: {e}", exc_info=True)| change_form_template = ( | ||
| "admin/firmware_upgrader/batch_upgrade_operation_change_form.html" | ||
| ) | ||
| device_upgrades_per_page = 20 |
There was a problem hiding this comment.
Let's use similar styling for this new upgrade operations list section (I see every text is grey in color), not similar to the device list view page.
http://127.0.0.1:8000/admin/firmware_upgrader/batchupgradeoperation/afc88d11-5322-433f-b168-ef432b6acf82/change/

http://127.0.0.1:8000/admin/config/device/

1. Autocomplete filters:
Also check the organization filters in mass upgrade operations (their autocomplete search filter). Since we select filters here, can we use those autocomplete search filters as well?
2. Image filter:
Have we also discussed and decided to add a By image filter to BatchUpgradeOperationAdmin as well, along with org and status filters?
3. Total operations text:
I also checked that the batch operations list page has text showing total operations like "15 Mass upgrade operations," which is missing from this new Upgrade Operations list page. Can you please check why this is missing? Can we make it consistent with other list pages?
4. Spinner/retry:
I'm also seeing that when there's a recoverable failure and the backend auto-retries upgrade operation tasks, it adds another retry "spinner" (see below) which has both grey and blue colors. This isn't consistent - as far as I know, in device commands we use grey only. I also feel it's not good UX to show another spinner for backend-driven retries. Let's discuss this in the next meeting. (I prefer only one dynamic element for that section, and since we're adding a progress bar, I would use that only to show backend retries with a number.)
There was a problem hiding this comment.
Hey @Aryamanz29
- Image filter:
Have we also discussed and decided to add a By image filter to BatchUpgradeOperationAdmin as well, along with > org and status filters?
I'm not sure if we finalized this
As far as I remember and from meeting notes it was discussed only to keep 2 filters, by status and by organization for now to avoid overcomplicating.
But we can add it if required
nemesifier
left a comment
There was a problem hiding this comment.
Good progress @youhaveme9!
I have some comments below.
Once this PR is ready, let's not forget to update any screenshots in the docs/ folder to reflect changes done here.
| device_publisher = DeviceUpgradeProgressPublisher(self.device.pk, self.pk) | ||
| device_publisher.publish_log(line_str, status_str) | ||
|
|
||
| except Exception as e: |
There was a problem hiding this comment.
expecting Exception is not a best practice. What could go wrong here? We should narrow down this to specific exception classes.
| # trigger an update on the batch operation | ||
| is_new = self.pk is None | ||
| super().save(*args, **kwargs) | ||
| if not is_new: |
There was a problem hiding this comment.
This doesn't look right here at all. The model shouldn't be concerned with publishing messages to websockets.
We should use signals for this. Check if the built-in signals can help, if not we can define a new one.
| BatchUpgradeProgressConsumer.as_asgi(), | ||
| ), | ||
| re_path( | ||
| r"ws/firmware-upgrader/device/(?P<pk>[^/]+)/$", |
There was a problem hiding this comment.
why does this URL start with firmware-upgrader/ while the other two don't?
I would prefix all of them.
| UpgradeProgressConsumer.as_asgi(), | ||
| ), | ||
| re_path( | ||
| r"ws/batch-upgrade/(?P<batch_id>[^/]+)/$", |
There was a problem hiding this comment.
Make sure the naming used in these URLs is consistent with the naming used in the REST API.
| @@ -0,0 +1,121 @@ | |||
| /* Batch Upgrade Operation CSS */ | |||
There was a problem hiding this comment.
Please remove this comment because the name of the file already says this, redundant comments are not useufl.
| @@ -0,0 +1,94 @@ | |||
| /* Firmware Upgrade Progress Styles */ | |||
| <th>{% trans "DEVICE" %}</th> | ||
| <th>{% trans "STATUS" %}</th> | ||
| <th>{% trans "IMAGE" %}</th> | ||
| <th>{% trans "LAST UPDATED" %}</th> |
| "reconnect_delay": 150, | ||
| "reconnect_retry_delay": 30, | ||
| "reconnect_max_retries": 10, | ||
| "upgrade_timeout": 80, | ||
| "reconnect_delay": 20, | ||
| "reconnect_retry_delay": 5, | ||
| "reconnect_max_retries": 5, | ||
| "upgrade_timeout": 25, |
39c2117 to
d81be9e
Compare
| # Configure separate test database for parallel testing | ||
| if TESTING and "--exclude-tag=no_parallel" not in sys.argv: | ||
| DATABASES["default"]["TEST"] = { | ||
| "NAME": os.path.join(BASE_DIR, "openwisp_firmware_upgrader_tests.db"), | ||
| } | ||
|
|
There was a problem hiding this comment.
@youhaveme9 in our last call we found that having this setting here causes issues in running selenium tests. Do you remember that the server process and django test class had different databases? Why is this code still here?
| browser = "firefox" | ||
| maxDiff = None | ||
| retry_max = 6 | ||
| application = import_string(getattr(settings, "ASGI_APPLICATION")) |
There was a problem hiding this comment.
I don' t think you need to import the application here. And, I also believe this is the reason for the Apps not loaded yet error that you've been encountering.
I made the following changes locally and the selenium tests runs every time
diff --git a/openwisp_firmware_upgrader/tests/test_websocket_selenium.py b/openwisp_firmware_upgrader/tests/test_websocket_selenium.py
index c6410cf..d85c8d6 100644
--- a/openwisp_firmware_upgrader/tests/test_websocket_selenium.py
+++ b/openwisp_firmware_upgrader/tests/test_websocket_selenium.py
@@ -17,7 +17,6 @@ from openwisp_firmware_upgrader.hardware import REVERSE_FIRMWARE_IMAGE_MAP
from openwisp_firmware_upgrader.tests.base import TestUpgraderMixin
from openwisp_firmware_upgrader.websockets import DeviceUpgradeProgressPublisher
from openwisp_utils.tests import SeleniumTestMixin
-
from ..swapper import load_model
Device = swapper.load_model("config", "Device")
@@ -44,8 +43,9 @@ class TestRealTimeWebsockets(
image_type = REVERSE_FIRMWARE_IMAGE_MAP["YunCore XD3200"]
browser = "firefox"
maxDiff = None
- retry_max = 6
- application = import_string(getattr(settings, "ASGI_APPLICATION"))
+ retry_max = 1
+ # application = import_string(getattr(settings, "ASGI_APPLICATION"))
+
def setUp(self):
org = self._get_org()
@@ -112,11 +112,10 @@ class TestRealTimeWebsockets(
return communicator
async def _prepare(self):
- communicator = await self._get_communicator(self.admin_client, self.device.pk)
+ # communicator = await self._get_communicator(self.admin_client, self.device.pk)
path = reverse(
f"admin:{self.config_app_label}_device_change", args=[self.device.pk]
)
-
self.login(username=self.admin.username, password=self.admin_password)
self.open(f"{path}#upgradeoperation_set-group")
@@ -131,7 +130,7 @@ class TestRealTimeWebsockets(
)
)
- return communicator
+ # return communicator
async def test_real_time_progress_updates(self):
"""Test real-time progress updates via websocket"""
@@ -219,10 +218,10 @@ class TestRealTimeWebsockets(
self.assertIn("Upload progress: 75%", log_html)
self._assert_no_js_errors()
- try:
- await communicator.disconnect()
- except (Exception, asyncio.CancelledError):
- pass
+ # try:
+ # await communicator.disconnect()
+ # except (Exception, asyncio.CancelledError):
+ # pass
async def test_real_time_status_change_to_success(self):
"""Test real-time status change from in-progress to success"""
diff --git a/tests/openwisp2/postgresql_settings.py b/tests/openwisp2/postgresql_settings.py
index 04b1de7..fd935cf 100644
--- a/tests/openwisp2/postgresql_settings.py
+++ b/tests/openwisp2/postgresql_settings.py
@@ -8,5 +8,8 @@ DATABASES = {
"PASSWORD": "openwisp2",
"HOST": "127.0.0.1",
"PORT": "5432",
+ "TEST": {
+ "NAME": "test_openwisp2",
+ },
},
}
diff --git a/tests/openwisp2/routing.py b/tests/openwisp2/routing.py
index 0c439e2..9a22998 100644
--- a/tests/openwisp2/routing.py
+++ b/tests/openwisp2/routing.py
@@ -1,28 +1,16 @@
-import os
-
-import django
-from channels.routing import ProtocolTypeRouter, URLRouter
-from django.core.asgi import get_asgi_application
-
-os.environ.setdefault("DJANGO_SETTINGS_MODULE", "tests.openwisp2.settings")
-
-django.setup()
-
from channels.auth import AuthMiddlewareStack
+from channels.routing import ProtocolTypeRouter, URLRouter
from channels.security.websocket import AllowedHostsOriginValidator
+from django.core.asgi import get_asgi_application
from openwisp_controller.routing import get_routes
-from openwisp_firmware_upgrader.routing import (
- get_routes as get_firmware_upgrader_routes,
-)
+from openwisp_firmware_upgrader.routing import get_routes as get_upgrader_routes
application = ProtocolTypeRouter(
{
- "http": get_asgi_application(),
"websocket": AllowedHostsOriginValidator(
- AuthMiddlewareStack(
- URLRouter(get_routes() + get_firmware_upgrader_routes())
- )
+ AuthMiddlewareStack(URLRouter(get_routes() + get_upgrader_routes()))
),
+ "http": get_asgi_application(),
}
)
diff --git a/tests/openwisp2/settings.py b/tests/openwisp2/settings.py
index 9928fee..a726dfc 100644
--- a/tests/openwisp2/settings.py
+++ b/tests/openwisp2/settings.py
@@ -15,11 +15,6 @@ DATABASES = {
}
}
-# Configure separate test database for parallel testing
-if TESTING and "--exclude-tag=no_parallel" not in sys.argv:
- DATABASES["default"]["TEST"] = {
- "NAME": os.path.join(BASE_DIR, "openwisp_firmware_upgrader_tests.db"),
- }
SPATIALITE_LIBRARY_PATH = "mod_spatialite.so"| communicator = WebsocketCommunicator( | ||
| self.application, | ||
| path=f"ws/firmware-upgrader/device/{device_id}/", | ||
| headers=[ | ||
| ( | ||
| b"cookie", | ||
| f"sessionid={session_id}".encode("ascii"), | ||
| ) | ||
| ], | ||
| ) | ||
| return communicator |
There was a problem hiding this comment.
I don't understand the use of communicator here. We are using the browser JS to verify the working of websockets.
|
|
||
| def setUp(self): | ||
| org = self._get_org() | ||
| import uuid |
| @@ -0,0 +1,25 @@ | |||
| # Generated by Django 5.2.4 on 2025-07-24 19:36 | |||
There was a problem hiding this comment.
can you squash this migration with the previous one please? See https://docs.djangoproject.com/en/5.2/topics/migrations/#migration-squashing
There was a problem hiding this comment.
This request hasn't been addressed yet.
| @@ -0,0 +1,3 @@ | |||
| from django.dispatch import Signal | |||
|
|
|||
| firmware_upgrader_log_updated = Signal() | |||
There was a problem hiding this comment.
Please add a docs entry for this.
| setTimeout(function () { | ||
| initializeExistingUpgradeOperations($, false); | ||
| }, 100); | ||
| }, 50); |
There was a problem hiding this comment.
This question remained unanswered, referring to #320 (comment).
nemesifier
left a comment
There was a problem hiding this comment.
It looks a lot better now, thank you @youhaveme9!
Please merge with the target branch (which I updated with the latest master) and resolve conflicts.
Now that the functionality looks good, I have been looking closer at the code and found some points that need to be improved before they can be merged into OpenWISP, please read my comments below.
| @@ -0,0 +1,25 @@ | |||
| # Generated by Django 5.2.4 on 2025-07-24 19:36 | |||
There was a problem hiding this comment.
This request hasn't been addressed yet.
| logger.error( | ||
| f"Runtime error in WebSocket publishing for upgrade operation {instance.pk}: {e}", | ||
| exc_info=True, | ||
| ) |
There was a problem hiding this comment.
Both methods here are related to websockets stuff, I think we should move these two there if possible, the reason is that in receivers.py we sometimes put base logic which is related to models, which over time we've been moving directly to models to make the code more explicit about its relation to core business logic.
This logic here is not core logic, but somewhat UI/API logic, so moving it to models is not suited, but neither is using a generic file like receivers.py which is historically used for model related logic.
Therefore we can move these two methods to websockets.py and converting them as class method on the relevant publisher class they interact with, in the same fashion as we have receivers defined as class methods in the DeviceFirmware model, see:
openwisp-firmware-upgrader/openwisp_firmware_upgrader/base/models.py
Lines 451 to 468 in 05a6a49
I already touched on this topic when reviewing #322 and with my changes in 88b76e2.
Please let's be consistent. We have many modules and a lot of code in OpenWISP, we must strive to follow similar patterns, if we don't do this maintainability will suffer to the point that can bring the project to total collapse, unless you're one of those who believes we're not going to be coding anymore in 5 years becasue AI will do it for us, I am not so optimistic about that 😉.
| sender=FirmwareImage, | ||
| dispatch_uid="firmware_image.auto_add_device_firmwares", | ||
| ) | ||
| post_save.connect( |
There was a problem hiding this comment.
This method is called connect_device_signals and connects a few receivers which are defined as methods of the DeviceFirmware model to key signals.
There's a few inconsistencies:
handle_upgrade_operation_savedis not directly related toDeviceFirmware, therefore it hardly belongs to this method namedconnect_device_signals. Let's create a new method calledconnect_upgrade_signalsand move this logic there.
|
|
||
| try: | ||
| # Publish status update to operation-specific channel | ||
| publisher = UpgradeProgressPublisher(instance.pk) |
There was a problem hiding this comment.
This reciever function can be a class method of UpgradeProgressPublisher.
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| @receiver(firmware_upgrader_log_updated) |
There was a problem hiding this comment.
The usage of this decorator pattern is inconsistent with the way we do things, please use the connect_upgrade_signals method I suggested above to connect the receiver to the firmware_upgrader_log_updated signal.
Please move this receiver function as a class method of DeviceUpgradeProgressPublisher, which is consistent with having all signal receiver functions as methods of classes that are directly related to the operations being performed.
| // Initialize websocket connection | ||
| initUpgradeProgressWebSockets($, upgradeProgressWebSocket); | ||
| } | ||
| }, 100); |
There was a problem hiding this comment.
I see two contradictions:
- you have code which checks for the presence of
owControllerApiHostand have defined a fallback in case it's not defined yet - the var
owControllerApiHostis defined in thechange_form.html: https://github.com/openwisp/openwisp-controller/blob/3a45fd9ecc9bc3eaf7e5c8be4f82f573829cdf9d/openwisp_controller/config/templates/admin/config/change_form.html#L19, so it will always execute before this script, see the generated HTML:
If we add to this the fact that using setTimeout for critical code like can potentially hide other issues that can blow up in production, the result is not very appealing and I am not very comfortable in deploying this code to production.
If you remove this, what breaks and why?
| self.publish_progress({"type": "error", "message": error_message}) | ||
|
|
||
| @classmethod | ||
| def handle_upgrade_operation_saved(cls, sender, instance, created, **kwargs): |
There was a problem hiding this comment.
Let's change this to handle_upgrade_operation_post_save.
| self.publish_progress({"type": "error", "message": error_message}) | ||
|
|
||
| @classmethod | ||
| def handle_firmware_upgrader_log_updated(cls, sender, instance, line, **kwargs): |
There was a problem hiding this comment.
Let's change this handle_upgrade_operation_log_updated.
| ("firmware_upgrader", "0010_alter_firmwareimage_file"), | ||
| ("firmware_upgrader", "0011_alter_category_organization"), | ||
| ("firmware_upgrader", "0012_upgradeoperation_progress"), | ||
| ("firmware_upgrader", "0013_alter_upgradeoperation_progress"), |
There was a problem hiding this comment.
No I didn't ask you to squash all the migrations, let's revert this please and do something just make sure that instead of having two migration files we have one.
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| **Path**: | ||
| ``openwisp_firmware_upgrader.signals.firmware_upgrader_log_updated`` |
There was a problem hiding this comment.
| ``openwisp_firmware_upgrader.signals.firmware_upgrader_log_updated`` | |
| ``openwisp_firmware_upgrader.signals.log_updated`` |
|
|
||
| .. include:: /partials/signals-note.rst | ||
|
|
||
| ``firmware_upgrader_log_updated`` |
There was a problem hiding this comment.
| ``firmware_upgrader_log_updated`` | |
| ``log_updated`` |
Aryamanz29
left a comment
There was a problem hiding this comment.
LGTM @youhaveme9 - let's address the remaining review comments and fix the failing CI build, then we're ready to merge! 🚀
| # even if the reconnection failed, | ||
| # the firmware image has been flashed | ||
| installed = True | ||
| # if no exception has been raised, the upgrade was successful |
There was a problem hiding this comment.
this comment looks relevant to me - why did we remove that?
| return obj | ||
|
|
||
|
|
||
| class UpgradeProgressConsumer(AsyncJsonWebsocketConsumer): |
| await self.send_json(event["data"]) | ||
|
|
||
|
|
||
| class BatchUpgradeProgressConsumer(AsyncJsonWebsocketConsumer): |
| await self.send_json(data) | ||
|
|
||
|
|
||
| class UpgradeProgressPublisher: |
| ) | ||
|
|
||
|
|
||
| class BatchUpgradeProgressPublisher: |
| coverage run runtests.py --tag=no_parallel | ||
| coverage run runtests.py --parallel --exclude-tag=no_parallel |
There was a problem hiding this comment.
could you please add a comment explaining why we need to run the test suite with the no_parallel test tag here? (I know it's for real-time WebSocket tests, but a comment would help remind us that this flag is required) 🙏



Checklist
Description
Introduced real-time display of upgrade progress for individual upgrade operations in the UI. Users will now be able to see live feedback and status updates as firmware upgrades are performed, improving transparency and user experience during the upgrade process.
Also changed the Mass Upgrade page view from a long list of upgrades to a table including filters based on Status of the upgrade and Organization
Screenshots
Real-time Progress bar
Mass Upgrade device table and filters
Reference to Existing Issue
Closes #224.