When a system service or dependency (e.g. mainnet) is unavailable or disrupted, I want to understand why it is not available, when it is expected to return to normal, and to monitor changes so that I can plan accordingly.
Acceptance Criteria
- Develop one failure scenario and implement it end-to-end following Design for Failure (DFF) principles
- Identify and list failure scenarios related to external services and dependencies that would cause disruptions (e.g. testnet/mainnet down, IPFS unavailable, network outage)
- Display a global notification to the user that is as specific as practical (e.g. "Some external services are offline, check and monitor status here")
- Test and demonstrate how it works for major services
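As a rough illustration of the "as specific as practical" criterion, the sketch below builds the global notification text from a list of service statuses. It names the service, reason, and expected recovery when exactly one service is down, and falls back to the generic wording otherwise. The `ServiceStatus` fields, the `build_global_notification` function, and the status URL are hypothetical names chosen for this example, not part of any existing codebase.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceStatus:
    """Hypothetical status record for one external dependency."""
    name: str                                # e.g. "mainnet", "IPFS"
    available: bool
    reason: Optional[str] = None             # why it is down, if known
    expected_recovery: Optional[str] = None  # when it should resume, if known


def build_global_notification(statuses: List[ServiceStatus],
                              status_url: str) -> Optional[str]:
    """Return the banner text, or None when every service is available."""
    down = [s for s in statuses if not s.available]
    if not down:
        return None
    if len(down) == 1:
        # One outage: be as specific as the available data allows.
        s = down[0]
        message = f"{s.name} is currently unavailable"
        if s.reason:
            message += f" ({s.reason})"
        message += "."
        if s.expected_recovery:
            message += f" Expected to resume: {s.expected_recovery}."
    else:
        # Several outages: fall back to the generic wording, listing names.
        names = ", ".join(s.name for s in down)
        message = f"Some external services are offline: {names}."
    return f"{message} Check and monitor status here: {status_url}"
```

The reason and expected recovery are optional because an upstream provider may not publish them; the message degrades to less specific wording rather than guessing.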
Considerations and Room for Improvement
Consider the following for further improvement:
Design for Failure (DFF) Principles
- redundancy & failover to eliminate single points of failure
- graceful degradation & fallbacks
- circuit breakers / timeouts to prevent cascading failures
- safe defaults
- fast failure
- observability / monitoring
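Several of the principles above (circuit breakers, fast failure, graceful degradation with fallbacks) can be combined in one small component. The following is a minimal sketch, not a production implementation: the `CircuitBreaker` class, its thresholds, and the injected `clock` are all assumptions made for illustration, and the wrapped operation is assumed to enforce its own request timeout.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical breaker: opens after repeated failures, retries after a cool-down."""

    def __init__(self, failure_threshold: int = 3,
                 reset_after_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # After the cool-down, let one trial call through (half-open state).
        return self._clock() - self._opened_at < self.reset_after_seconds

    def call(self, operation: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open:
            # Fast failure: skip the unhealthy dependency entirely.
            return fallback()
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            # Graceful degradation: serve the fallback instead of raising.
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller might wrap a mainnet RPC request as `breaker.call(fetch_from_mainnet, read_cached_value)`, where both function names are placeholders. The breaker's `is_open` state is also a natural signal to feed into the global notification and into monitoring.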