This repository was archived by the owner on Feb 1, 2021. It is now read-only.
In case of node failure, containers should be rescheduled automatically by swarm.
Rebalancing means re-creating the container on another healthy node that fits the original constraints, affinities and requirements of that container.
Only select containers should be rescheduled: strictly stateless containers, and only with the go-ahead of the operator.
Containers with volumes or any other explicit state should not be rescheduled.
Users should be able to tell swarm, when creating a container, whether it can be rescheduled or not (-e reschedule:[always,never]?)
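The eligibility rules above could be sketched roughly as follows. This is only an illustration in Python, not swarm code: the `Container` type and `can_reschedule` function are hypothetical, the `reschedule` environment key mirrors the tentative `-e reschedule:[always,never]` idea, and treating a missing value as `never` is an assumption based on the "go-ahead of the operator" requirement.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical, simplified view of a container as the scheduler sees it."""
    name: str
    env: dict = field(default_factory=dict)      # e.g. {"reschedule": "always"}
    volumes: list = field(default_factory=list)  # any volume counts as explicit state


def can_reschedule(container: Container) -> bool:
    """Return True only if the operator opted in AND the container is stateless."""
    # Assumption: rescheduling is opt-in, so a missing policy means "never".
    policy = container.env.get("reschedule", "never")
    if policy != "always":
        return False
    # Containers with volumes carry explicit state and must stay put.
    return not container.volumes
```

With this sketch, a container created with `reschedule` set to `always` and no volumes is eligible, while one with a volume is refused even if the operator opted in.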
Handle node resurrection
What happens when a failed node comes back to life? If the node has containers that have been rescheduled, we will end up with duplicates. Think of a stateless service that sends daily reports by e-mail: those messages will end up being sent twice. Ideas:
Swarm should kill duplicates (keep the last scheduled version)
Swarm should rewrite the restart policy of the containers to never and start them manually (based on the original restart policy). This way, duplicates would never be started automatically before being destroyed.
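The first idea (kill duplicates, keep the last scheduled version) might look something like the sketch below. The `Instance` type, its `scheduled_at` field and the `duplicates_to_kill` function are hypothetical names used for illustration; it assumes swarm can group the instances of one logical container and knows when each was scheduled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical record of one running copy of a logical container."""
    container_id: str
    node: str
    scheduled_at: float  # assumption: timestamp of when swarm scheduled this copy


def duplicates_to_kill(instances: list) -> list:
    """Given all copies of one logical container, return every copy except
    the most recently scheduled one."""
    if not instances:
        return []
    newest = max(instances, key=lambda inst: inst.scheduled_at)
    return [inst for inst in instances if inst is not newest]
```

When a failed node comes back with an old copy of a container that was already rescheduled elsewhere, the old copy has the smaller `scheduled_at` and is the one returned for removal.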
Container IDs
Rescheduled containers will get a new ID. In order to expose a consistent ID to the user, we need Virtual IDs. See Virtual Container IDs #600
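As a rough illustration of the virtual ID idea, the sketch below keeps a stable user-facing ID and rebinds it to a new engine-assigned ID after rescheduling. The `VirtualIDRegistry` class and its methods are hypothetical and are not taken from #600; the actual design may differ.

```python
import uuid


class VirtualIDRegistry:
    """Hypothetical mapping from a stable virtual ID to the current real ID."""

    def __init__(self):
        self._real_by_virtual = {}

    def register(self, real_id: str) -> str:
        """Create a new virtual ID for a freshly created container."""
        virtual_id = uuid.uuid4().hex
        self._real_by_virtual[virtual_id] = real_id
        return virtual_id

    def rebind(self, virtual_id: str, new_real_id: str) -> None:
        """Point an existing virtual ID at the re-created container."""
        if virtual_id not in self._real_by_virtual:
            raise KeyError(virtual_id)
        self._real_by_virtual[virtual_id] = new_real_id

    def resolve(self, virtual_id: str) -> str:
        """Return the real ID currently behind a virtual ID."""
        return self._real_by_virtual[virtual_id]
```

The user only ever sees the value returned by `register`; after a reschedule, swarm would call `rebind` so that the same virtual ID resolves to the new container.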