I have noticed in our infra that the statusProber package is somewhat inconsistent.
func (p *GRPCProbe) Ready() {
p.h.SetServingStatus("", grpc_health.HealthCheckResponse_SERVING)
}
// NotReady sets components status to not ready with given error as a cause.
func (p *GRPCProbe) NotReady(err error) {
p.h.SetServingStatus("", grpc_health.HealthCheckResponse_NOT_SERVING)
}
// Healthy sets components status to healthy.
func (p *GRPCProbe) Healthy() {
p.h.Resume()
} which is basically
func (s *Server) Resume() {
s.mu.Lock()
defer s.mu.Unlock()
s.shutdown = false
for service := range s.statusMap {
s.setServingStatusLocked(service, healthpb.HealthCheckResponse_SERVING)
}
}
// NotHealthy sets components status to not healthy with given error as a cause.
func (p *GRPCProbe) NotHealthy(err error) {
p.h.Shutdown()
}
So when statusProber.Healthy() is called, we actually call an equivalent of grpcProbe.Ready() even though the gRPC server in many cases is not even created, i.e in cmd/thanos/store.go statusProber.Healthy() is called at line 310 (link)
while `statusProber.Ready() only at line 565 (link)
This was originally a PR (#8692), but I realized that this isn't that big of a bug, rather an inconsistency that we should try to fix.
I have noticed in our infra that the statusProber package is somewhat inconsistent.
So when statusProber.Healthy() is called, we actually call an equivalent of grpcProbe.Ready() even though the gRPC server in many cases is not even created, i.e in cmd/thanos/store.go statusProber.Healthy() is called at line 310 (link)
while `statusProber.Ready() only at line 565 (link)
This was originally a PR (#8692), but I realized that this isn't that big of a bug, rather an inconsistency that we should try to fix.