The circuit breaker automatically disables a failing condition once it reaches a failure threshold, preventing cascading failures and giving the failing service time to recover.
The circuit breaker pattern protects your system by:
- Monitoring failures: Tracks failure rate over a sliding time window
- Opening the circuit: Stops calling failing condition when threshold exceeded
- Half-open state: Periodically tests if service recovered
- Closing the circuit: Resumes normal operation when service healthy
ExperimentFramework integrates with Polly for circuit breaker implementation.
dotnet add package ExperimentFramework.Resilience
This package includes Polly and circuit breaker integration.
using ExperimentFramework.Resilience;
var experiments = ExperimentFrameworkBuilder.Create()
.Trial<IPaymentGateway>(t => t
.UsingFeatureFlag("UseNewPaymentGateway")
.AddControl<StableGateway>("false")
.AddCondition<NewGateway>("true")
.OnErrorRedirectAndReplayControl())
.WithCircuitBreaker(options =>
{
options.FailureRatioThreshold = 0.5; // Open after 50% failure rate
options.MinimumThroughput = 10; // Need 10 calls to assess
options.SamplingDuration = TimeSpan.FromSeconds(30);
options.BreakDuration = TimeSpan.FromSeconds(60);
options.OnCircuitOpen = CircuitBreakerAction.FallbackToDefault;
})
.UseDispatchProxy();
builder.Services.AddExperimentFramework(experiments);
Fraction of calls that fail (0.0-1.0) at which the circuit opens.
options.FailureRatioThreshold = 0.3; // Open at 30% failure rate
Guidelines:
- 0.1-0.2: Conservative, production-critical services
- 0.3-0.5: Balanced, most scenarios
- 0.6-0.8: Permissive, development/testing
Minimum number of requests before evaluating failure ratio.
options.MinimumThroughput = 20; // Need 20 requests before assessment
Prevents premature circuit opening from a few failures during low traffic.
Guidelines:
- 5-10: Low traffic services
- 10-20: Medium traffic
- 50+: High traffic services
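The interaction between FailureRatioThreshold and MinimumThroughput can be sketched as a single predicate. This is a minimal illustration, not the library's code: `BreakerMath` and `ShouldOpen` are hypothetical names, and the use of `>=` for both comparisons is an assumption about how the threshold is evaluated.

```csharp
using System;

public static class BreakerMath
{
    // Returns true when enough calls were sampled AND the failure ratio
    // has reached the threshold. Both >= comparisons are assumptions.
    public static bool ShouldOpen(
        int failures, int total, double failureRatioThreshold, int minimumThroughput)
    {
        if (total <= 0 || total < minimumThroughput)
        {
            return false; // too few calls in the window to judge
        }

        double ratio = (double)failures / total;
        return ratio >= failureRatioThreshold;
    }
}
```

With a threshold of 0.5 and a minimum throughput of 10, `ShouldOpen(3, 4, 0.5, 10)` is false even though 75% of calls failed, because only 4 calls were sampled. `ShouldOpen(6, 10, 0.5, 10)` is true, and `ShouldOpen(4, 10, 0.5, 10)` is false.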
Time window for tracking failures.
options.SamplingDuration = TimeSpan.FromMinutes(1);
Guidelines:
- 10-30 seconds: Fast-changing conditions
- 1-2 minutes: Most scenarios
- 5+ minutes: Slowly varying loads
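To make the sampling window concrete, here is a rough sketch of a sliding failure window that forgets outcomes older than the sampling duration. `SlidingFailureWindow` is a hypothetical type written for this illustration. It is not thread-safe, it assumes timestamps are passed in non-decreasing order, and it is not how Polly or ExperimentFramework tracks health internally.

```csharp
using System;
using System.Collections.Generic;

public sealed class SlidingFailureWindow
{
    private readonly TimeSpan _samplingDuration;
    private readonly Queue<(DateTimeOffset At, bool Failed)> _samples = new();

    public SlidingFailureWindow(TimeSpan samplingDuration)
    {
        _samplingDuration = samplingDuration;
    }

    // Record the outcome of one call observed at time `now`.
    public void Record(DateTimeOffset now, bool failed)
    {
        _samples.Enqueue((now, failed));
        Prune(now);
    }

    // Count failures and total calls that are still inside the window.
    public (int Failures, int Total) Snapshot(DateTimeOffset now)
    {
        Prune(now);
        int failures = 0;
        foreach (var sample in _samples)
        {
            if (sample.Failed) failures++;
        }
        return (failures, _samples.Count);
    }

    // Drop samples whose age has reached the sampling duration.
    private void Prune(DateTimeOffset now)
    {
        while (_samples.Count > 0 && now - _samples.Peek().At >= _samplingDuration)
        {
            _samples.Dequeue();
        }
    }
}
```

A short window reacts quickly but may hold too few samples to reach MinimumThroughput, while a long window keeps old failures counting against a service that has already recovered.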
How long the circuit stays open before transitioning to half-open.
options.BreakDuration = TimeSpan.FromMinutes(2);
During this time, all requests fail immediately (or fall back to the default).
Guidelines:
- 30-60 seconds: Fast recovery expected
- 2-5 minutes: Standard services
- 10+ minutes: Services with slow startup
What happens when circuit opens:
// Throw exception
options.OnCircuitOpen = CircuitBreakerAction.ThrowException;
// Fallback to control condition
options.OnCircuitOpen = CircuitBreakerAction.FallbackToDefault;
// Fallback to specific condition
options.OnCircuitOpen = CircuitBreakerAction.FallbackToSpecificTrial;
options.FallbackTrialKey = "noop";
All requests pass through normally. Failures are tracked.
Request → Trial → Response
All requests fail immediately or fall back. The service is not called.
Request → ❌ Circuit Open → Fallback
After BreakDuration, allows test requests to check if service recovered.
Test Request → Trial → Success? → Close : Open
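The three states and their transitions can be sketched as a small state machine. This is an illustrative model under simplifying assumptions, not the library's implementation: `SimpleCircuit` and `CircuitState` are hypothetical names, the decision to trip is made by a failure tracker that is not shown here, and this sketch lets every request through while half-open, whereas real implementations typically limit the number of test requests.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

public sealed class SimpleCircuit
{
    private readonly TimeSpan _breakDuration;
    private DateTimeOffset _openedAt;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public SimpleCircuit(TimeSpan breakDuration)
    {
        _breakDuration = breakDuration;
    }

    // Decide whether a call may reach the trial at time `now`.
    public bool AllowRequest(DateTimeOffset now)
    {
        if (State == CircuitState.Open && now - _openedAt >= _breakDuration)
        {
            State = CircuitState.HalfOpen; // break elapsed: allow a test request
        }
        return State != CircuitState.Open;
    }

    // Called when the failure threshold is exceeded, or when a test request fails.
    public void Trip(DateTimeOffset now)
    {
        State = CircuitState.Open;
        _openedAt = now;
    }

    // Report the outcome of a call that was allowed through.
    public void OnResult(DateTimeOffset now, bool succeeded)
    {
        if (State != CircuitState.HalfOpen)
        {
            return; // closed-state outcomes are handled by the failure tracker
        }

        if (succeeded) State = CircuitState.Closed;
        else Trip(now);
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic, which makes the half-open transition easy to exercise in a unit test without waiting for BreakDuration to elapse.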
public interface IPaymentGateway
{
Task<PaymentResult> ProcessPaymentAsync(Payment payment);
}
public class StablePaymentGateway : IPaymentGateway
{
private readonly ILogger<StablePaymentGateway> _logger;
public StablePaymentGateway(ILogger<StablePaymentGateway> logger)
{
_logger = logger;
}
public async Task<PaymentResult> ProcessPaymentAsync(Payment payment)
{
_logger.LogInformation("Processing with stable gateway");
// Proven, reliable implementation
return await ProcessLegacyPayment(payment);
}
}
public class NewPaymentGateway : IPaymentGateway
{
private readonly HttpClient _http;
private readonly ILogger<NewPaymentGateway> _logger;
public NewPaymentGateway(HttpClient http, ILogger<NewPaymentGateway> logger)
{
_http = http;
_logger = logger;
}
public async Task<PaymentResult> ProcessPaymentAsync(Payment payment)
{
_logger.LogInformation("Processing with new gateway");
// New implementation - may have reliability issues
var response = await _http.PostAsJsonAsync("/process", payment);
response.EnsureSuccessStatusCode();
return await response.Content.ReadFromJsonAsync<PaymentResult>();
}
}
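// Configuration
// AddHttpClient<T> already registers NewPaymentGateway as a typed client;
// a separate AddScoped<NewPaymentGateway>() would bypass the typed HttpClient.
// Configure a BaseAddress on the client so the relative "/process" URI resolves.
builder.Services.AddHttpClient<NewPaymentGateway>();
builder.Services.AddScoped<StablePaymentGateway>();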
builder.Services.AddScoped<IPaymentGateway, StablePaymentGateway>();
var experiments = ExperimentFrameworkBuilder.Create()
.Trial<IPaymentGateway>(t => t
.UsingFeatureFlag("UseNewPaymentGateway")
.AddControl<StablePaymentGateway>("false")
.AddCondition<NewPaymentGateway>("true")
.OnErrorRedirectAndReplayControl())
.WithCircuitBreaker(options =>
{
// Conservative settings for payment processing
options.FailureRatioThreshold = 0.2; // Open at 20% failure
options.MinimumThroughput = 5; // Need only 5 calls
options.SamplingDuration = TimeSpan.FromMinutes(1);
options.BreakDuration = TimeSpan.FromMinutes(5);
options.OnCircuitOpen = CircuitBreakerAction.FallbackToDefault;
})
.UseDispatchProxy();
builder.Services.AddExperimentFramework(experiments);
Scenario:
- New gateway starts failing (network issue, API down, etc.)
- Once at least 5 requests in the 1-minute sampling window reach a 20% failure rate, the circuit opens
- All payment requests route to stable gateway for 5 minutes
- After 5 minutes, test request checks if new gateway recovered
- If recovered, circuit closes and normal operation resumes
The circuit breaker works well alongside timeout enforcement:
var experiments = ExperimentFrameworkBuilder.Create()
.Trial<IExternalService>(t => t
.UsingFeatureFlag("UseExternalApi")
.AddControl<CachedService>("false")
.AddCondition<ExternalApiService>("true")
.OnErrorRedirectAndReplayControl())
.WithTimeout(TimeSpan.FromSeconds(3), TimeoutAction.FallbackToDefault)
.WithCircuitBreaker(options =>
{
options.FailureRatioThreshold = 0.5;
options.MinimumThroughput = 10;
options.SamplingDuration = TimeSpan.FromSeconds(30);
options.BreakDuration = TimeSpan.FromMinutes(1);
options.OnCircuitOpen = CircuitBreakerAction.FallbackToDefault;
})
.UseDispatchProxy();
Effect:
- Individual slow requests timeout after 3 seconds
- If 50% of requests fail/timeout within 30 seconds, circuit opens
- All requests route to cached service for 1 minute
- Prevents hammering slow/failing external API
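For comparison, the same timeout-plus-breaker behavior can be sketched directly against Polly v8's resilience pipeline API. How ExperimentFramework wires Polly internally is not shown in this document, so the mapping from FailureRatioThreshold to Polly's `FailureRatio` is an assumption. `RawPollyPipeline`, `callExternalApi`, and `fallback` are hypothetical names for this illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;
using Polly.Timeout;

public static class RawPollyPipeline
{
    public static ResiliencePipeline Build() =>
        new ResiliencePipelineBuilder()
            // Added first, so it is the outer strategy and counts timeouts as failures.
            .AddCircuitBreaker(new CircuitBreakerStrategyOptions
            {
                FailureRatio = 0.5,
                MinimumThroughput = 10,
                SamplingDuration = TimeSpan.FromSeconds(30),
                BreakDuration = TimeSpan.FromMinutes(1)
            })
            // Inner strategy: throws TimeoutRejectedException after 3 seconds.
            .AddTimeout(TimeSpan.FromSeconds(3))
            .Build();

    // Runs the call through the pipeline and falls back when the call
    // times out or the circuit is open.
    public static async Task<string> CallAsync(
        ResiliencePipeline pipeline,
        Func<CancellationToken, ValueTask<string>> callExternalApi,
        Func<string> fallback,
        CancellationToken cancellationToken = default)
    {
        try
        {
            return await pipeline.ExecuteAsync(callExternalApi, cancellationToken);
        }
        catch (Exception ex) when (ex is BrokenCircuitException or TimeoutRejectedException)
        {
            return fallback();
        }
    }
}
```

The ordering matters: with the breaker outside the timeout, each timed-out call surfaces as an exception the breaker can count. The callback should honor the cancellation token it receives so the timeout can actually stop the work.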
Different circuit breaker settings per service:
// Critical payment service - conservative settings
var paymentExperiments = ExperimentFrameworkBuilder.Create()
.Trial<IPaymentGateway>(t => t
.UsingFeatureFlag("UseNewPayment")
.AddControl<StablePayment>("false")
.AddCondition<NewPayment>("true")
.OnErrorRedirectAndReplayControl())
.WithCircuitBreaker(options =>
{
options.FailureRatioThreshold = 0.1; // Very conservative
options.MinimumThroughput = 5;
options.OnCircuitOpen = CircuitBreakerAction.FallbackToDefault;
})
.UseDispatchProxy();
// Non-critical recommendations - permissive settings
var recommendationExperiments = ExperimentFrameworkBuilder.Create()
.Trial<IRecommendationEngine>(t => t
.UsingFeatureFlag("UseMachineLearning")
.AddControl<RuleBased>("false")
.AddCondition<MachineLearning>("true")
.OnErrorRedirectAndReplayControl())
.WithCircuitBreaker(options =>
{
options.FailureRatioThreshold = 0.6; // More permissive
options.MinimumThroughput = 20;
options.OnCircuitOpen = CircuitBreakerAction.FallbackToDefault;
})
.UseDispatchProxy();
builder.Services.AddExperimentFramework(paymentExperiments);
builder.Services.AddExperimentFramework(recommendationExperiments);
Begin with strict thresholds and relax based on observed behavior:
options.FailureRatioThreshold = 0.2; // Start at 20%, increase if too sensitive
options.MinimumThroughput = 10; // Higher threshold for production
Consider how long the service needs to recover:
// Database failover: 30-60 seconds
options.BreakDuration = TimeSpan.FromSeconds(60);
// External API with rate limiting: 5-10 minutes
options.BreakDuration = TimeSpan.FromMinutes(10);
// Microservice restart: 2-3 minutes
options.BreakDuration = TimeSpan.FromMinutes(3);
Log circuit state changes:
.WithCircuitBreaker(options =>
{
options.FailureRatioThreshold = 0.5;
options.MinimumThroughput = 10;
options.SamplingDuration = TimeSpan.FromSeconds(30);
options.BreakDuration = TimeSpan.FromMinutes(1);
options.OnCircuitOpen = CircuitBreakerAction.FallbackToDefault;
})
.AddLogger(l => l.AddErrorLogging()) // Logs circuit breaker events
Write integration tests:
[Fact]
public async Task CircuitBreaker_OpensAfterFailures()
{
// Arrange
var experiments = ExperimentFrameworkBuilder.Create()
.Trial<IService>(t => t
.UsingFeatureFlag("UseNewService")
.AddControl<StableService>("false")
.AddCondition<FailingService>("true")
.OnErrorRedirectAndReplayControl())
.WithCircuitBreaker(options =>
{
options.FailureRatioThreshold = 0.5;
options.MinimumThroughput = 5;
options.OnCircuitOpen = CircuitBreakerAction.FallbackToDefault;
})
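.UseDispatchProxy();
// Resolve the proxied service from a test container (Microsoft.Extensions.DependencyInjection).
// Assumes the "UseNewService" feature flag is enabled in the test configuration.
var services = new ServiceCollection();
services.AddScoped<StableService>();
services.AddScoped<FailingService>();
services.AddScoped<IService, StableService>();
services.AddExperimentFramework(experiments);
using var provider = services.BuildServiceProvider();
var service = provider.GetRequiredService<IService>();
// Act - cause failures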
for (int i = 0; i < 10; i++)
{
try
{
await service.ExecuteAsync();
}
catch { }
await Task.Delay(50);
}
// Assert - circuit opened, using stable service
var result = await service.ExecuteAsync();
Assert.Equal("Stable", result.Source);
}
Combine with a kill switch for manual control:
var killSwitch = new InMemoryKillSwitchProvider();
var experiments = ExperimentFrameworkBuilder.Create()
.Trial<IService>(t => t
.UsingFeatureFlag("UseNewService")
.AddControl<StableService>("false")
.AddCondition<NewService>("true")
.OnErrorRedirectAndReplayControl())
.WithCircuitBreaker(options =>
{
options.FailureRatioThreshold = 0.5;
options.MinimumThroughput = 10;
options.OnCircuitOpen = CircuitBreakerAction.FallbackToDefault;
})
.WithKillSwitch(killSwitch)
.UseDispatchProxy();
// Emergency disable if circuit breaker isn't fast enough
killSwitch.DisableTrial(typeof(IService), "true");
Track circuit breaker state with metrics:
dotnet add package ExperimentFramework.Metrics.Exporters
var metrics = new PrometheusExperimentMetrics();
var experiments = ExperimentFrameworkBuilder.Create()
.Trial<IService>(t => t
.UsingFeatureFlag("UseNewService")
.AddControl<StableService>("false")
.AddCondition<NewService>("true")
.OnErrorRedirectAndReplayControl())
.WithCircuitBreaker(options =>
{
options.FailureRatioThreshold = 0.5;
options.MinimumThroughput = 10;
options.OnCircuitOpen = CircuitBreakerAction.FallbackToDefault;
})
.WithMetrics(metrics)
.UseDispatchProxy();
app.MapGet("/metrics", () => metrics.GeneratePrometheusOutput());
Grafana queries:
# Circuit breaker open rate
sum(rate(experiment_errors_total{error="CircuitBreakerOpenException"}[5m])) by (service)
# Failure rate by condition
sum(rate(experiment_errors_total[5m])) by (service, trial_key)
/
sum(rate(experiment_invocations_total[5m])) by (service, trial_key)
Symptom: Circuit never opens despite failures.
Solutions:
- Check MinimumThroughput: there may not be enough requests
- Verify FailureRatioThreshold isn't too high
- Ensure exceptions are being thrown (not swallowed)
- Check SamplingDuration: the window may be too short
Symptom: Circuit opens and closes repeatedly (flapping).
Solutions:
- Increase FailureRatioThreshold (too strict)
- Increase MinimumThroughput (opening prematurely from a small sample)
- Increase BreakDuration (not enough time to recover)
- Check whether the condition has intermittent issues that need fixing
Symptom: Gets CircuitBreakerOpenException despite fallback configuration.
Solutions:
- Verify OnCircuitOpen = CircuitBreakerAction.FallbackToDefault
- Ensure OnErrorRedirectAndReplayControl() is configured
- Check that the control condition is registered
- Timeout Enforcement - Prevent slow conditions
- Error Handling - Fallback strategies
- Kill Switch - Manual emergency shutdown
- Metrics - Monitor circuit breaker state