Evaluate Guardian from a Design for Failure Lens #6008

@guardian-automation

Description

When a system service or dependency (e.g. mainnet) is unavailable or disrupted, I want to understand why it is not available and when it is expected to return to normal, and to monitor its status as it changes, so that I can plan accordingly.

Acceptance Criteria

  • Develop one failure scenario and implement it end-to-end using DFF principles
  • Identify and list failure scenarios involving external services and dependencies that would cause disruptions (e.g. testnet/mainnet down, IPFS unavailable, network outage)
  • Display a global notification to the user that is as specific as practical (e.g. "Some external services are offline; check and monitor their status here")
  • Test and demonstrate how it works for the major services
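
To make the global notification "as specific as practical", each external dependency could carry the three things the user story asks for: why it is down, when it is expected back, and where to keep monitoring it. The TypeScript sketch below shows one possible shape for that data and how a single banner message might be built from it. `DependencyState`, `DependencyStatus`, `buildGlobalNotice` and the `status.example.com` URL are hypothetical names chosen for illustration; they are not taken from Guardian's codebase.

```typescript
// Hypothetical shape of one external dependency's health as the UI sees it.
// These names are illustrative only; they are not Guardian APIs.
type DependencyState = "operational" | "degraded" | "offline";

interface DependencyStatus {
  name: string;          // e.g. "mainnet", "IPFS"
  state: DependencyState;
  reason?: string;       // why it is unavailable, if known
  expectedResume?: Date; // when it is expected to recover, if announced
  statusUrl?: string;    // where the user can keep monitoring it
}

// Builds the text of a single global banner, or returns null when nothing is affected.
function buildGlobalNotice(statuses: DependencyStatus[]): string | null {
  const affected = statuses.filter((s) => s.state !== "operational");
  if (affected.length === 0) return null;

  const details = affected.map((s) => {
    let text = `${s.name} is ${s.state}`;
    if (s.reason) text += ` (${s.reason})`;
    if (s.expectedResume) text += `; expected to resume by ${s.expectedResume.toISOString()}`;
    if (s.statusUrl) text += `; monitor status at ${s.statusUrl}`;
    return text;
  });
  return `Some external services are unavailable: ${details.join(". ")}.`;
}

// Example: one dependency down, one healthy (placeholder status URL).
console.log(
  buildGlobalNotice([
    { name: "IPFS", state: "operational" },
    { name: "mainnet", state: "offline", reason: "node outage", statusUrl: "https://status.example.com" },
  ]),
);
```

Optional fields let the banner degrade to a generic message when the reason or recovery time is not known, instead of showing empty placeholders.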

Considerations and Room for Improvement

Consider the following for further improvement:

Design for Failure (DFF) Principles

  • redundancy & failover to eliminate single points of failure
  • graceful degradation & fallbacks
  • circuit breakers / timeouts to prevent cascading failures
  • safe defaults
  • fast failure
  • observability / monitoring
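
Several of these principles (timeouts, circuit breakers, fast failure, graceful degradation and safe defaults) can be combined in one wrapper around calls to an external dependency such as an IPFS gateway or a network node. The TypeScript sketch below is a generic circuit breaker, not an existing Guardian component; `CircuitBreaker`, `BreakerOptions`, `withTimeout` and the default thresholds (3 consecutive failures, 30 s cool-down, 5 s timeout) are assumptions chosen for illustration.

```typescript
// Generic circuit breaker with a per-call timeout and a fallback value.
// Hypothetical sketch: names and defaults are illustrative, not Guardian APIs.
type BreakerState = "closed" | "open" | "half-open";

interface BreakerOptions {
  failureThreshold?: number; // consecutive failures before the breaker opens
  resetAfterMs?: number;     // cool-down before trial calls are allowed again
  timeoutMs?: number;        // per-call timeout; slow calls count as failures
  now?: () => number;        // injectable clock, useful in tests
}

// Rejects if `p` has not settled within `ms`. It stops waiting but does not
// cancel the underlying work (that would need e.g. an AbortController).
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly timeoutMs: number;
  private readonly now: () => number;

  constructor(opts: BreakerOptions = {}) {
    this.failureThreshold = opts.failureThreshold ?? 3;
    this.resetAfterMs = opts.resetAfterMs ?? 30_000;
    this.timeoutMs = opts.timeoutMs ?? 5_000;
    this.now = opts.now ?? Date.now;
  }

  // An open breaker moves to half-open once the cool-down has elapsed.
  getState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetAfterMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  async call<T>(action: () => Promise<T>, fallback: () => T): Promise<T> {
    // Fail fast: while open, skip the unhealthy dependency entirely.
    if (this.getState() === "open") return fallback();
    try {
      const result = await withTimeout(action(), this.timeoutMs);
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch {
      this.recordFailure();
      // Graceful degradation: return a safe default instead of propagating.
      return fallback();
    }
  }

  private recordFailure(): void {
    this.failures += 1;
    // A failed trial call re-opens at once; otherwise open at the threshold.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }
}
```

While the breaker is open, callers get the fallback immediately and the struggling dependency is left alone, which limits cascading failures; the breaker's state is also a natural signal to expose for monitoring and the global notification. This sketch lets every concurrent call through while half-open; a stricter variant would admit only a single trial call.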

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests