This repository was archived by the owner on Feb 1, 2021. It is now read-only.

Container Rebalancing #599

@aluzzardi


In case of node failure, containers should be rescheduled automatically by swarm.

Rebalancing means re-creating the container on another healthy node that satisfies the container's original constraints, affinities and requirements.

  • Only selected containers should be rescheduled: strictly stateless containers, and only with the operator's go-ahead
    • Containers with volumes or any other explicit state should not be rescheduled
    • Users should be able to instruct swarm when creating a container whether it can be rescheduled or not (-e reschedule:[always,never]?)
  • Handle node resurrection
    • What happens when a failed node comes back to life? If the node has containers that have been rescheduled, we will end up with duplicates. Think of a stateless service that sends daily reports by e-mail: those messages will end up being sent twice. Ideas:
      • Swarm should kill duplicates (keep the last scheduled version)
      • Swarm should rewrite the restart policy of rescheduled containers to never and start them itself (based on the original restart policy). This way, duplicates would never be started automatically before being destroyed.
  • Container IDs
    • Re-scheduling containers means they will change ID. In order to have a consistent ID exposed to the user, we need Virtual IDs. See Virtual Container IDs #600
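
The eligibility and duplicate-handling rules above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Swarm's implementation: the `Container` type, the `scheduled_at` counter, matching duplicates by container name, and defaulting to `never` when no policy is given are all hypothetical, and the `reschedule:<policy>` syntax is the tentative one proposed in this issue.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    # Hypothetical model of a container as seen by the scheduler.
    name: str
    node: str
    env: List[str] = field(default_factory=list)
    volumes: List[str] = field(default_factory=list)
    scheduled_at: int = 0  # assumed logical timestamp of the placement


def reschedule_policy(container: Container) -> str:
    """Read a 'reschedule:<policy>' entry; default to 'never' (an assumption)."""
    for entry in container.env:
        key, sep, value = entry.partition(":")
        if sep and key == "reschedule" and value in ("always", "never"):
            return value
    return "never"


def can_reschedule(container: Container) -> bool:
    """Only opted-in containers without volumes are eligible."""
    return reschedule_policy(container) == "always" and not container.volumes


def find_duplicates(containers: List[Container]) -> List[Container]:
    """Return the copies to kill, keeping the last scheduled copy per name."""
    newest: Dict[str, Container] = {}
    for c in containers:
        current = newest.get(c.name)
        if current is None or c.scheduled_at > current.scheduled_at:
            newest[c.name] = c
    return [c for c in containers if newest[c.name] is not c]
```

When a failed node comes back, the scheduler would pass the combined container list to `find_duplicates` and destroy whatever it returns. Matching by name is a stand-in here; the Virtual IDs proposed in #600 would be the more natural key.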
