ENH #1881: Create the dataset read from the file, regardless of what …#1948
doutriaux1 merged 4 commits into master
|
@aashish24 @doutriaux1 Please review. Note the 190 modified baselines :-) |
|
The new baselines look beautiful. Could you also post an image from uvcdat 1.5.1 or earlier for comparison? (Just pick one for validation.) |
|
@danlipsa that actually worries me a bit: why aren't the new ones noisy? As I mentioned in my comments on the baselines PR, I'm not sure if it's good or bad; I just want to understand why it is so drastically different. They are prettier, but I'm more concerned about them being right 😉 |
|
@doutriaux1 A way to look at this is: before this change = nearest neighbor interpolation - simpler, more noisy; after this change = linear interpolation - looks at all neighbors and averages their values based on the distance to the point, smoother. Given that we reconstruct a continuous signal out of points, neither is right :-) but I think the latter has more of the properties we want, and I would argue this is what people expect. Besides, it nicely simplifies the code as well. |
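To make the difference concrete, here is a minimal 1D sketch of the two reconstructions. It is not the vcs or VTK code path; the function names and sample values are made up for illustration only.

```python
def nearest(xs, ys, x):
    """Return the value of the sample whose coordinate is closest to x."""
    i = min(range(len(xs)), key=lambda k: abs(xs[k] - x))
    return ys[i]


def linear(xs, ys, x):
    """Linearly interpolate between the two samples that bracket x.

    Assumes xs is sorted in increasing order and xs[0] <= x <= xs[-1].
    """
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return (1 - t) * ys[i] + t * ys[i + 1]
    raise ValueError("x is outside the sampled range")


# Hypothetical samples: three points with a sharp change between them.
xs = [0.0, 1.0, 2.0]
ys = [10.0, 20.0, 0.0]
```

Nearest neighbor jumps from one sample value to the next halfway between points, which is where the blocky, noisy look comes from, while linear interpolation blends the two bracketing samples.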
|
I can provide some insight from the modeling perspective. Unfortunately, the correct way to do interpolation is highly dependent on the source data and how it is represented in the model. The atmospheric models I have worked on in the past represent data for most variables at the cell center (cell values in VTK parlance). The exception to this is vector components such as wind velocity and fluxes. These numbers represent components normal to the cell's face, positioned at the center. It is possible that cdms does some regridding to normalize the values that vcs receives, so it would be a good idea to look at what is done there. In terms of interpolation methods, the most correct method depends on the data type and ideally on how it is interpolated within the model. Categorical fields should not be interpolated bilinearly, for instance. When regridding, categorical fields are usually "accumulated" to the dominant value in the cell, but in this case nearest neighbor would probably be sufficient. Flux fields should be interpolated using a "mass preserving" interpolation method (the integrated flux should remain the same). Other fields often need to be interpolated within the model in a way that is globally differentiable. This is all to say that the modelers will have concerns beyond just what is the most visually pleasing. In my experience, if you are going to treat all of the data the same, it is best just to use nearest neighbor interpolation to stay true to the actual data in the file. |
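As an illustration of what "mass preserving" means here, the following is a small 1D sketch of overlap-weighted (conservative) remapping of cell averages. It is an assumption-laden toy, not what cdms or any regridding library actually does; `conservative_remap` and `integral` are hypothetical helper names.

```python
def conservative_remap(src_edges, src_values, dst_edges):
    """Remap per-cell averages from source cells to destination cells.

    Each destination value is the overlap-weighted average of the source
    cells it covers, so the integral over the domain is unchanged.
    Assumes both edge lists are increasing and span the same interval.
    """
    dst_values = []
    for j in range(len(dst_edges) - 1):
        lo, hi = dst_edges[j], dst_edges[j + 1]
        total = 0.0
        for i, value in enumerate(src_values):
            overlap = min(hi, src_edges[i + 1]) - max(lo, src_edges[i])
            if overlap > 0:
                total += value * overlap
        dst_values.append(total / (hi - lo))
    return dst_values


def integral(edges, values):
    """Integrate a piecewise-constant field given cell edges and averages."""
    return sum(v * (edges[i + 1] - edges[i]) for i, v in enumerate(values))


# Hypothetical flux field on four unit cells, coarsened to two cells.
src_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
src_values = [1.0, 3.0, 5.0, 7.0]
dst_edges = [0.0, 2.0, 4.0]
dst_values = conservative_remap(src_edges, src_values, dst_edges)
```

The integrated flux is the same before and after the remap, which is the property a point-wise linear or nearest neighbor interpolation does not guarantee.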
|
@jbeezley thanks for the input; that is consistent with the general feedback I'm getting so far. |
|
@jbeezley @doutriaux1 Some clarifications:
Good questions:
|
@danlipsa can you elaborate what you meant by "same grid"? Thanks, |
|
@aashish24 You can replace 'same grid' with 'same VTK dataset' to be more precise. After we generate the data, we can apply the appropriate conversion for the plot we want. The current implementation directly generates the converted data, but it uses a simplified conversion that I think generates the artifacts we see. It also has the disadvantage of using two different dataset generation routines, which leads to code duplication and opportunities for bugs. |
|
@doutriaux1 So, with @aashish24 we looked at the ncdump output for clt.nc. It does not have bounds information, so you would think this would produce a dataset with information stored on points. Looking at the code, cdms2 generates bounds if they are not stored in the file - see axis.py::_autobounds. We also generate bounds in vcs2vtk.py::getBoundsList. So, it seems that we assume most datasets have information stored at the cell center, even if that is not specified in the file. Do you know why that is? |
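For readers unfamiliar with autogenerated bounds, here is a rough sketch of the general idea: place cell edges halfway between neighboring axis values. This is only an assumption about the typical approach, not the actual logic of axis.py::_autobounds or vcs2vtk.py::getBoundsList; `generic_bounds` is a hypothetical name.

```python
def generic_bounds(centers):
    """Build [lower, upper] bounds around each axis value.

    Interior edges are midpoints between neighbors; the two outer edges
    are extrapolated by half of the adjacent spacing.
    """
    n = len(centers)
    if n < 2:
        raise ValueError("need at least two axis values to infer spacing")
    edges = [centers[0] - (centers[1] - centers[0]) / 2]
    for i in range(n - 1):
        edges.append((centers[i] + centers[i + 1]) / 2)
    edges.append(centers[-1] + (centers[-1] - centers[-2]) / 2)
    return [[edges[i], edges[i + 1]] for i in range(n)]


# Hypothetical axis values read from a file that stores no bounds.
bounds = generic_bounds([0.0, 2.0, 4.0])
```

Once such bounds exist, every value looks like a cell-centered value, which is why it is hard to tell afterwards whether the file really described cells or points.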
Traditionally, we created a point or cell dataset based on the plot requested. For isofill, isoline and vector we created point datasets, for boxfill and meshfill we created cell datasets. We keep this behavior for backward compatibility but we add a parameter plot_based_dual_grid to plot(). If this parameter is missing or it is True, we have the traditional behavior. If this parameter is False, we create the dataset that is specified in the file, regardless of the plot requested.
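The decision described above can be sketched as a small function. This is an illustrative model of the rule only, not the vcs implementation; `dataset_kind`, `kind_in_file`, and the two sets are hypothetical names, while the plot types and the plot_based_dual_grid parameter come from the description.

```python
POINT_PLOTS = {"isofill", "isoline", "vector"}
CELL_PLOTS = {"boxfill", "meshfill"}


def dataset_kind(plot_type, kind_in_file, plot_based_dual_grid=True):
    """Return "point" or "cell" for the dataset that would be created.

    kind_in_file is what the file specifies ("point" or "cell").
    """
    if not plot_based_dual_grid:
        # New behavior: use what the file specifies, whatever the plot is.
        return kind_in_file
    # Traditional behavior: the requested plot decides.
    if plot_type in POINT_PLOTS:
        return "point"
    if plot_type in CELL_PLOTS:
        return "cell"
    raise ValueError("unknown plot type: %s" % plot_type)
```

With the default, an isofill of cell data still gets a point dataset; passing plot_based_dual_grid=False keeps the cell dataset from the file.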
With the new flag TransientAxis._genericBounds_, TransientAxis.getExplicitBounds returns None if the bounds were not read from a NetCDF file but were autogenerated. This creates a problem in cdutil.averager, as the axis bounds were artificially extended to -90, 90. So, for latbnds=[[90, 88], [88, 84], ..., [4, 0]], we extended it to [[90, 88], [88, 84], ..., [4, -90]]. We remove the code that did this. This fix also improves the baseline for testEsmfRegridRegion.
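The effect of the removed lines can be reproduced with plain lists. The sketch below mirrors the two max/min assignments from the removed code on a shortened, made-up latitude axis that stops at the equator; `extend_to_poles` is a hypothetical wrapper name.

```python
def extend_to_poles(latbnds):
    """Apply the removed clamping to a copy of the latitude bounds."""
    out = [list(b) for b in latbnds]
    out[0][0] = max(out[0][0], 90.0)
    out[-1][1] = min(out[-1][1], -90.0)
    return out


# Shortened illustration of a regional axis covering 90N down to 0.
latbnds = [[90.0, 88.0], [88.0, 84.0], [4.0, 0.0]]
extended = extend_to_poles(latbnds)
```

Because min(0, -90) is -90, the last cell is stretched from the equator to the south pole, which changes its weight in any area-weighted average.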
|
@doutriaux1 @aashish24 I added a plot_based_dual_grid=True option, that gives us the current behavior. Note also the baselines: |
|
@danlipsa great. The variable name "plot_based_dual_grid" is a bit confusing. What does dual grid mean here? |
|
@aashish24 I am open to suggestions for a name for that variable :-). For the dual grid definition, see the attached image. So, plot_based_dual_grid is meant to suggest that we create either the grid from the nc file or the dual grid, based on the plot we want to do. This is the current vcs behavior. |
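In the cell-to-point direction, the dual grid in this sense takes the cell centers as its points. A minimal 1D sketch, assuming uniform handling of every cell and using the hypothetical helper name `cell_centers`:

```python
def cell_centers(edges):
    """Return the midpoint of each cell defined by consecutive edges."""
    return [(edges[i] + edges[i + 1]) / 2 for i in range(len(edges) - 1)]


# Hypothetical cell grid with three cells spanning 0..30.
edges = [0.0, 10.0, 20.0, 30.0]
centers = cell_centers(edges)
```

Note that the resulting point grid spans only 5..25, so it covers a smaller extent than the original cells.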
|
Thanks for the snapshot @danlipsa. Why is it called dual, though? Dual grid means something else to me. |
|
@aashish24 Not sure it is a common usage. I remember it being used in this context when I did the cam5 catalyst adapter. There is a dual graph of a triangulation which is very similar to this - at least for the cell to point direction. What does it mean to you? Also, feel free to suggest a different word. |
|
@doutriaux1 @aashish24 Ready to merge? |
|
@danlipsa sorry didn't realize you were done with this one. Reviewing today. |
|
@doutriaux1 @aashish24 ping? |
|
Oh yes, I got distracted by @aashish24's PR, but since it's not passing, let me do this one now. |
@doutriaux1 which PR are you referring to? |
|
@danlipsa some failures, re-running to post to the dashboard: |
|
@doutriaux1 uvcdat-testdata needed to be rebased as well. I pushed that. These are my failures, none seems related: |
|
@doutriaux1 Did you see my latest message about merging uvcdat-testdata? Can you rerun your tests with the new uvcdat-testdata? |
|
Yep, re-running the tests now |
|
@danlipsa https://open.cdash.org/viewTest.php?onlyfailed&buildid=4369973 Do you want me to add the missing baseline? |
latbnds[0,0] = max(latbnds[0,0],+90.0)
latbnds[-1,1] = min(latbnds[-1,1],-90.0)

# Get longitude bounds
With the new _genericBounds_ flag, I got a different average for a few tests.
The reason was that the bounds were generated in grid.genBounds, and then modified in the offending code which I removed, instead of in axis.genGenericBounds. Also see the following changed baseline (first image).
https://github.com/UV-CDAT/uvcdat-testdata/pull/126/files
Image 4 does not have the purple lines, which I think should not be there - all those images are different kinds of interpolation.
See also the comment from the commit.
|
|
@danlipsa I pushed the new file. Do you mind running the one test on your Linux box to see if it passes there as well, or if we need a _1 file? |
|
@doutriaux1 Great. Thanks! I'll run the tests and report the results. |
|
@doutriaux1 test_vcs_no_continents passes on my machine. |


…the plot needs.
Previously, we created a point dataset if the plot needed one. The point dataset created
was incorrect, as it could use the cell centered values as point values. The grid was also
smaller - it used cell centers as points.