ENH #1881: Create the dataset read from the file, regardless of what … by danlipsa · Pull Request #1948 · CDAT/cdat

danlipsa · 2016-04-25T15:53:03Z

…the plot needs.

Previously, we created a point dataset if the plot needed one. The point dataset created
was incorrect, as it could use the cell centered values as point values. The grid was also
smaller - it used cell centers as points.

danlipsa · 2016-04-25T15:54:43Z

CDAT/uvcdat-testdata#126

danlipsa · 2016-04-26T19:22:39Z

@aashish24 @doutriaux1 Please review. Note the 190 modified baselines :-)

aashish24 · 2016-04-27T02:28:07Z

the new baselines look beautiful. Could you also post a image from uvcdat 1.5.1 or earlier for comparison? (just pick one for validation)

danlipsa · 2016-04-27T15:27:49Z

Seems like old images are also noisy. This is for uvcdat 1.5.1

doutriaux1 · 2016-04-27T15:58:31Z

@danlipsa that actually worries me a bit, why aren't the new ones noisy? As I mentioned on my comments on the baselines PR I'm not sure if it's good or bad I just want to understand why it is so drastically different. The are prettier but I'm more concerned about them being right 😉

danlipsa · 2016-04-27T16:13:07Z

@doutriaux1 A way to look at this is: before this change = nearest neighbor interpolation - simpler, more noisy. after this change = linear interpolation - looks at all neighbors and averages their value based on the distance to the point, smoother. Given that we reconstruct a continuous signal out of points, neither is right :-) but I think the later has more properties we want and I would argue this is what people expect. Besides, it nicely simplifies the code as well.

jbeezley · 2016-04-27T17:26:28Z

I can provide some insight from the modeling perspective. Unfortunately, the correct way to do interpolation will be highly dependent on the source data and how it is represented in the model.

The atmospheric models I have worked on in the past represent data for most variables at the cell center (cell values in VTK parlance). The exception to this is for vector components such as wind velocity and fluxes. The numbers these represent are normal components to the cell's face positioned at the center. It is possible that cdms does some regridding to normalize the values that vcs receives, so it would be a good idea to look at what is done there.

In terms of interpolation methods, the most correct method depends on the data type and ideally on how it is interpolated within the model. Categorical fields should not be interpolated bilinearly for instance. When regridding categorical fields are usually "accumulated" to the dominant value in the cell, but in this case nearest neighbor would probably be sufficient. Flux fields should be interpolated using a "mass preserving" interpolation method (the integrated flux should remain the same). Other fields often need to be interpolated within the model in a way that is globally differentiable.

This is all to say that the modelers will have concerns beyond just what is the most visually pleasing. In my experience, if you are going to treat all of the data the same, it is best just to use nearest neighbor interpolation to stay true to the actual data in the file.

doutriaux1 · 2016-04-27T17:38:39Z

@jbeezley thanks for the input that is coherent with the general feedback I'm getting so far.

danlipsa · 2016-04-27T18:24:01Z

@jbeezley @doutriaux1 Some clarifications:

There is no actual nearest neighbor interpolation in the code. This was my way of simplifying things. Cell centers are used as points when we generate the grid for plots that need point data. This way we get a slightly smaller grid than what is in the nc file. We do generate the right grid when we do plots that need cell data. I think we should generate the same grid regardless of what plot we need and this is what this change does.
We use linear interpolation either way to generate the isofill/isoline.

Good questions:

What kind of interpolation should we use? Nearest neighbor produces bad artifacts for many vis algorithms and it is not differentiable as you have discontinuities when you cross neighbor boundaries.
Is the high frequency signal we see in the images before the change features of the data or artefacts we generate because we use a different grid than what is specified in the data.

aashish24 · 2016-04-27T22:42:36Z

There is no actual nearest neighbor interpolation in the code. This was my way of simplifying things. Cell centers are used as points when we generate the grid for plots that need point data. This way we get a slightly smaller grid than what is in the nc file. We do generate the right grid when we do plots that need cell data. I think we should generate the same grid regardless of what plot we need and this is what this change does.

@danlipsa can you elaborate what you meant by "same grid"?

Thanks,

danlipsa · 2016-04-28T01:26:21Z

@aashish24 You can replace 'same grid' with 'same VTK dataset' to be more precise. After we generate the data we can apply the appropriate conversion for the plot we want. The current implementation, generates directly the converted data - but it uses a simplified conversion, that I think generates the artifacts we see. That also has the disadvantage that it uses two different dataset generation routines which leads to code duplication and opportunities for bugs.

aashish24 · 2016-04-29T14:51:10Z

So @danlipsa and I talked and @danlipsa is going to look into the code some more before we have finally say on what grid type we need to generate which is logically correct.

danlipsa · 2016-04-29T14:59:27Z

@doutriaux1 So, with @aashish24 we looked at ncdump output for clt.nc. That does not have bounds information, so you would think this would produce a dataset with information stored on points. Looking at the code, cdms2 generates bounds if they are not in stored in the file - see axis.py::_autobounds. We also generate bounds in vcs2vtk.py::getBoundsList. So, it seems that we think most datasets would have information stored at cell center, even if that is not specified in the file. Do you know why is that?

Traditionally, we created a point or cell dataset based on the plot requested. For isofill, isoline and vector we created point datasets, for boxfill and meshfill we created cell datasets. We keep this behavior for backward compatibility but we add a parameter plot_based_dual_grid to plot(). If this parameter is missing or it is True, we have the traditional behavior. If this parameter is False, we create the dataset that is specified in the file, regardless of the plot requested.

With the new flag TransientAxis._genericBounds_ TransientAxis.getExplicitBounds returns None if the bounds were not read from a NetCDF file but were autogenerated. This creates a problem in cdutil.averager as bound axis are artificially extended to -90, 90 So, for a latbnds=[[90, 88], [88, 84], ..., [4, 0]] we extended it to [[90, 88], [88,84], ..., [4, -90]]. We remove the code that did this. This fix also improves the baseline for testEsmfRegridRegion.

danlipsa · 2016-05-10T15:03:25Z

@doutriaux1 @aashish24 I added a plot_based_dual_grid=True option, that gives us the current behavior.
We can also turn that option off and plot the grid that is given by the netcdf file. Here are the tests results.
Please review.

https://open.cdash.org/index.php?compare1=63&filtercount=2&field1=buildname%2Fstring&project=UV-CDAT&field2=buildstarttime%2Fdate&showfilters=0&limit=100&compare2=83&value1=0d6bfec7&showfeed=0&value2=20160510T101543

Note also the baselines:
CDAT/uvcdat-testdata#126

aashish24 · 2016-05-11T12:45:10Z

@danlipsa great. the variable name "plot_based_dual_grid" is bit confusing. What is dual grid means here?

danlipsa · 2016-05-11T13:12:47Z

@aashish24 I am open to suggestions for a name for that variable. :-). For the dual grid definition see the attached image:

So, plot_based_dual_grid is meant to suggests that we create the grid from the nc file or the dual grid based on the plot we want to do. This is the current vcs behavior.

aashish24 · 2016-05-11T13:48:06Z

thanks for the snapshot @danlipsa why its called dual though? Dual grid means something else to me.

danlipsa · 2016-05-11T14:06:45Z

@aashish24 Not sure it is a common usage. I remember being using in this context when I did the cam5 catalyst adapter. There is a dual graph of a triangulation which is very similar with this - at least for cell to point direction.
https://graphics.stanford.edu/courses/cs268-09-winter/notes/handout5.pdf

What does it mean to you? Also feel free to suggest a different word.

danlipsa · 2016-05-13T14:53:08Z

@doutriaux1 @aashish24 Ready to merge?

doutriaux1 · 2016-05-13T15:03:39Z

@danlipsa sorry didn't realize you were done with this one. Reviewing today.

danlipsa · 2016-05-16T18:57:52Z

@doutriaux1 @aashish24 ping?

doutriaux1 · 2016-05-16T21:04:42Z

oh yes, got distracted by @aashish24 PR but since it's not passing, let me do this one now.

aashish24 · 2016-05-16T22:40:13Z

oh yes, got distracted by @aashish24 PR but since it's not passing, let me do this one now.

@doutriaux1 which PR you are referring to?

doutriaux1 · 2016-05-16T22:53:25Z

#1968

doutriaux1 · 2016-05-16T23:57:48Z

@danlipsa some failures, re-running to post to dash board:

    187 - test_vcs_basic_boxfill_zero (Failed)
    248 - test_vcs_basic_boxfill_aeqd_proj_SH (Failed)
    358 - test_vcs_boxfill_lev1_lev2 (Failed)
    401 - test_vcs_issue_960_labels (Failed)
    419 - vcs_test_fewer_colors_than_levels (Failed)
    430 - test_vcs_no_continents (Failed)

danlipsa · 2016-05-17T13:48:43Z

@doutriaux1 uvcdat-testdata needed to be rebased as well. I pushed that. These are my failures, none seems related:
54 - test_vcs_read_old_scr_2 (Failed)
65 - test_vcs_verify_boxfill_basics (Failed)
66 - test_vcs_verify_proj_basics (Failed)
67 - test_vcs_verify_fillarea_basics (Failed)
430 - test_vcs_no_continents (Failed)
562 - diags_test_02 (Failed)
563 - diags_test_03 (Failed)
564 - diags_test_04 (Failed)
565 - diags_test_41 (Failed)
566 - diags_test_05 (Failed)
567 - diags_test_06 (Failed)
568 - diags_test_07 (Failed)
569 - diags_test_08 (Failed)
570 - diags_test_09 (Failed)
571 - diags_test_10 (Failed)
572 - diags_test_11 (Failed)
573 - diags_test_12 (Failed)
574 - diags_test_13 (Failed)
575 - diags_test_15 (Failed)

doutriaux1 · 2016-05-17T14:03:14Z

https://open.cdash.org/viewTest.php?onlyfailed&buildid=4368763

danlipsa · 2016-05-17T14:08:22Z

@doutriaux1 Did you see my latest message about merging uvcdat-testdata? Can you rerun your tests with the new uvcdat-testdata?

doutriaux1 · 2016-05-17T14:22:09Z

yep re-runnig the tests now

doutriaux1 · 2016-05-17T14:29:57Z

@danlipsa https://open.cdash.org/viewTest.php?onlyfailed&buildid=4369973 Do you want me to add the missing baseline?

doutriaux1 · 2016-05-17T14:35:56Z

-            latbnds[0,0] = max(latbnds[0,0],+90.0)
-            latbnds[-1,1] = min(latbnds[-1,1],-90.0)
-
        # Get longitude bounds


@danlipsa why did you remove this?

With the new _genericBounds_flag, I got a different average for a few tests.
The reason was that bounds were generated in grid.genBounds, and modified in the offending code which I removed, instead of axis.genGenericBounds. Also see the following changed baseline (first image).
https://github.com/UV-CDAT/uvcdat-testdata/pull/126/files
Image 4 does not have the purple lines which I think should not be there - all those images are different kind of interpolations.

Conversation

danlipsa commented Apr 25, 2016

Uh oh!

danlipsa commented Apr 25, 2016

Uh oh!

danlipsa commented Apr 26, 2016

Uh oh!

aashish24 commented Apr 27, 2016

Uh oh!

danlipsa commented Apr 27, 2016

Uh oh!

doutriaux1 commented Apr 27, 2016

Uh oh!

danlipsa commented Apr 27, 2016

Uh oh!

jbeezley commented Apr 27, 2016

Uh oh!

doutriaux1 commented Apr 27, 2016

Uh oh!

danlipsa commented Apr 27, 2016

Uh oh!

aashish24 commented Apr 27, 2016

Uh oh!

danlipsa commented Apr 28, 2016

Uh oh!

aashish24 commented Apr 29, 2016

Uh oh!

danlipsa commented Apr 29, 2016

Uh oh!

danlipsa commented May 10, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aashish24 commented May 11, 2016

Uh oh!

danlipsa commented May 11, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aashish24 commented May 11, 2016

Uh oh!

danlipsa commented May 11, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

danlipsa commented May 13, 2016

Uh oh!

doutriaux1 commented May 13, 2016

Uh oh!

danlipsa commented May 16, 2016

Uh oh!

doutriaux1 commented May 16, 2016

Uh oh!

aashish24 commented May 16, 2016

Uh oh!

doutriaux1 commented May 16, 2016

Uh oh!

doutriaux1 commented May 16, 2016

Uh oh!

danlipsa commented May 17, 2016

Uh oh!

doutriaux1 commented May 17, 2016

Uh oh!

danlipsa commented May 17, 2016

Uh oh!

doutriaux1 commented May 17, 2016

Uh oh!

doutriaux1 commented May 17, 2016

Uh oh!

doutriaux1 May 17, 2016

Choose a reason for hiding this comment

Uh oh!

danlipsa May 17, 2016

Choose a reason for hiding this comment

Uh oh!

danlipsa May 17, 2016

Choose a reason for hiding this comment

Uh oh!

danlipsa commented May 17, 2016

Uh oh!

doutriaux1 commented May 17, 2016

Uh oh!

danlipsa commented May 17, 2016

danlipsa commented May 10, 2016 •

edited

Loading

danlipsa commented May 11, 2016 •

edited

Loading

danlipsa commented May 11, 2016 •

edited

Loading