Skip to content

ENH #1881: Create the dataset read from the file, regardless of what …#1948

Merged
doutriaux1 merged 4 commits intomasterfrom
remove-gen-grid-point
May 17, 2016
Merged

ENH #1881: Create the dataset read from the file, regardless of what …#1948
doutriaux1 merged 4 commits intomasterfrom
remove-gen-grid-point

Conversation

@danlipsa
Copy link
Copy Markdown
Contributor

…the plot needs.

Previously, we created a point dataset if the plot needed one. The point dataset created
was incorrect, as it could use the cell centered values as point values. The grid was also
smaller - it used cell centers as points.

@danlipsa
Copy link
Copy Markdown
Contributor Author

CDAT/uvcdat-testdata#126

@danlipsa danlipsa force-pushed the remove-gen-grid-point branch from 7e11476 to 604464b Compare April 26, 2016 19:21
@danlipsa
Copy link
Copy Markdown
Contributor Author

@aashish24 @doutriaux1 Please review. Note the 190 modified baselines :-)

@aashish24
Copy link
Copy Markdown
Contributor

the new baselines look beautiful. Could you also post a image from uvcdat 1.5.1 or earlier for comparison? (just pick one for validation)

@danlipsa
Copy link
Copy Markdown
Contributor Author

Seems like old images are also noisy. This is for uvcdat 1.5.1
uvcdat-1 5 1

@doutriaux1
Copy link
Copy Markdown
Contributor

@danlipsa that actually worries me a bit, why aren't the new ones noisy? As I mentioned on my comments on the baselines PR I'm not sure if it's good or bad I just want to understand why it is so drastically different. The are prettier but I'm more concerned about them being right 😉

@danlipsa
Copy link
Copy Markdown
Contributor Author

@doutriaux1 A way to look at this is: before this change = nearest neighbor interpolation - simpler, more noisy. after this change = linear interpolation - looks at all neighbors and averages their value based on the distance to the point, smoother. Given that we reconstruct a continuous signal out of points, neither is right :-) but I think the later has more properties we want and I would argue this is what people expect. Besides, it nicely simplifies the code as well.

@jbeezley
Copy link
Copy Markdown
Contributor

I can provide some insight from the modeling perspective. Unfortunately, the correct way to do interpolation will be highly dependent on the source data and how it is represented in the model.

The atmospheric models I have worked on in the past represent data for most variables at the cell center (cell values in VTK parlance). The exception to this is for vector components such as wind velocity and fluxes. The numbers these represent are normal components to the cell's face positioned at the center. It is possible that cdms does some regridding to normalize the values that vcs receives, so it would be a good idea to look at what is done there.

In terms of interpolation methods, the most correct method depends on the data type and ideally on how it is interpolated within the model. Categorical fields should not be interpolated bilinearly for instance. When regridding categorical fields are usually "accumulated" to the dominant value in the cell, but in this case nearest neighbor would probably be sufficient. Flux fields should be interpolated using a "mass preserving" interpolation method (the integrated flux should remain the same). Other fields often need to be interpolated within the model in a way that is globally differentiable.

This is all to say that the modelers will have concerns beyond just what is the most visually pleasing. In my experience, if you are going to treat all of the data the same, it is best just to use nearest neighbor interpolation to stay true to the actual data in the file.

@doutriaux1
Copy link
Copy Markdown
Contributor

@jbeezley thanks for the input that is coherent with the general feedback I'm getting so far.

@danlipsa
Copy link
Copy Markdown
Contributor Author

@jbeezley @doutriaux1 Some clarifications:

  1. There is no actual nearest neighbor interpolation in the code. This was my way of simplifying things. Cell centers are used as points when we generate the grid for plots that need point data. This way we get a slightly smaller grid than what is in the nc file. We do generate the right grid when we do plots that need cell data. I think we should generate the same grid regardless of what plot we need and this is what this change does.
  2. We use linear interpolation either way to generate the isofill/isoline.

Good questions:

  1. What kind of interpolation should we use? Nearest neighbor produces bad artifacts for many vis algorithms and it is not differentiable as you have discontinuities when you cross neighbor boundaries.
  2. Is the high frequency signal we see in the images before the change features of the data or artefacts we generate because we use a different grid than what is specified in the data.

@aashish24
Copy link
Copy Markdown
Contributor

  1. There is no actual nearest neighbor interpolation in the code. This was my way of simplifying things. Cell centers are used as points when we generate the grid for plots that need point data. This way we get a slightly smaller grid than what is in the nc file. We do generate the right grid when we do plots that need cell data. I think we should generate the same grid regardless of what plot we need and this is what this change does.

@danlipsa can you elaborate what you meant by "same grid"?

Thanks,

@danlipsa
Copy link
Copy Markdown
Contributor Author

@aashish24 You can replace 'same grid' with 'same VTK dataset' to be more precise. After we generate the data we can apply the appropriate conversion for the plot we want. The current implementation, generates directly the converted data - but it uses a simplified conversion, that I think generates the artifacts we see. That also has the disadvantage that it uses two different dataset generation routines which leads to code duplication and opportunities for bugs.

@aashish24
Copy link
Copy Markdown
Contributor

So @danlipsa and I talked and @danlipsa is going to look into the code some more before we have finally say on what grid type we need to generate which is logically correct.

@danlipsa
Copy link
Copy Markdown
Contributor Author

@doutriaux1 So, with @aashish24 we looked at ncdump output for clt.nc. That does not have bounds information, so you would think this would produce a dataset with information stored on points. Looking at the code, cdms2 generates bounds if they are not in stored in the file - see axis.py::_autobounds. We also generate bounds in vcs2vtk.py::getBoundsList. So, it seems that we think most datasets would have information stored at cell center, even if that is not specified in the file. Do you know why is that?

@danlipsa danlipsa force-pushed the remove-gen-grid-point branch from 604464b to 905502c Compare April 29, 2016 15:38
@danlipsa danlipsa force-pushed the remove-gen-grid-point branch 2 times, most recently from baddd0d to 92a5a12 Compare May 9, 2016 20:58
danlipsa added 2 commits May 10, 2016 10:08
Traditionally, we created a point or cell dataset based on the plot requested.
For isofill, isoline and vector we created point datasets, for boxfill and meshfill we
created cell datasets. We keep this behavior for backward compatibility but we
add a parameter plot_based_dual_grid to plot(). If this parameter is missing or it is
True, we have the traditional behavior. If this parameter is False, we create the
dataset that is specified in the file, regardless of the plot requested.
With the new flag TransientAxis._genericBounds_ TransientAxis.getExplicitBounds
returns None if the bounds were not read from a NetCDF file but were autogenerated.
This creates a problem in cdutil.averager as bound axis are artificially extended to -90, 90
So, for a latbnds=[[90, 88], [88, 84], ..., [4, 0]] we extended it to
[[90, 88], [88,84], ..., [4, -90]]. We remove the code that did this.

This fix also improves the baseline for testEsmfRegridRegion.
@danlipsa danlipsa force-pushed the remove-gen-grid-point branch from 92a5a12 to 0d6bfec Compare May 10, 2016 14:08
@danlipsa
Copy link
Copy Markdown
Contributor Author

danlipsa commented May 10, 2016

@doutriaux1 @aashish24 I added a plot_based_dual_grid=True option, that gives us the current behavior.
We can also turn that option off and plot the grid that is given by the netcdf file. Here are the tests results.
Please review.

https://open.cdash.org/index.php?compare1=63&filtercount=2&field1=buildname%2Fstring&project=UV-CDAT&field2=buildstarttime%2Fdate&showfilters=0&limit=100&compare2=83&value1=0d6bfec7&showfeed=0&value2=20160510T101543

Note also the baselines:
CDAT/uvcdat-testdata#126

@aashish24
Copy link
Copy Markdown
Contributor

@danlipsa great. the variable name "plot_based_dual_grid" is bit confusing. What is dual grid means here?

@danlipsa
Copy link
Copy Markdown
Contributor Author

danlipsa commented May 11, 2016

@aashish24 I am open to suggestions for a name for that variable. :-). For the dual grid definition see the attached image:
20160511_090826

So, plot_based_dual_grid is meant to suggests that we create the grid from the nc file or the dual grid based on the plot we want to do. This is the current vcs behavior.

@aashish24
Copy link
Copy Markdown
Contributor

thanks for the snapshot @danlipsa why its called dual though? Dual grid means something else to me.

@danlipsa
Copy link
Copy Markdown
Contributor Author

danlipsa commented May 11, 2016

@aashish24 Not sure it is a common usage. I remember being using in this context when I did the cam5 catalyst adapter. There is a dual graph of a triangulation which is very similar with this - at least for cell to point direction.
https://graphics.stanford.edu/courses/cs268-09-winter/notes/handout5.pdf

What does it mean to you? Also feel free to suggest a different word.

@danlipsa
Copy link
Copy Markdown
Contributor Author

@doutriaux1 @aashish24 Ready to merge?

@doutriaux1
Copy link
Copy Markdown
Contributor

@danlipsa sorry didn't realize you were done with this one. Reviewing today.

@danlipsa
Copy link
Copy Markdown
Contributor Author

@doutriaux1 @aashish24 ping?

@doutriaux1
Copy link
Copy Markdown
Contributor

oh yes, got distracted by @aashish24 PR but since it's not passing, let me do this one now.

@aashish24
Copy link
Copy Markdown
Contributor

oh yes, got distracted by @aashish24 PR but since it's not passing, let me do this one now.

@doutriaux1 which PR you are referring to?

@doutriaux1
Copy link
Copy Markdown
Contributor

#1968

@doutriaux1
Copy link
Copy Markdown
Contributor

@danlipsa some failures, re-running to post to dash board:

    187 - test_vcs_basic_boxfill_zero (Failed)
    248 - test_vcs_basic_boxfill_aeqd_proj_SH (Failed)
    358 - test_vcs_boxfill_lev1_lev2 (Failed)
    401 - test_vcs_issue_960_labels (Failed)
    419 - vcs_test_fewer_colors_than_levels (Failed)
    430 - test_vcs_no_continents (Failed)

@danlipsa
Copy link
Copy Markdown
Contributor Author

@doutriaux1 uvcdat-testdata needed to be rebased as well. I pushed that. These are my failures, none seems related:
54 - test_vcs_read_old_scr_2 (Failed)
65 - test_vcs_verify_boxfill_basics (Failed)
66 - test_vcs_verify_proj_basics (Failed)
67 - test_vcs_verify_fillarea_basics (Failed)
430 - test_vcs_no_continents (Failed)
562 - diags_test_02 (Failed)
563 - diags_test_03 (Failed)
564 - diags_test_04 (Failed)
565 - diags_test_41 (Failed)
566 - diags_test_05 (Failed)
567 - diags_test_06 (Failed)
568 - diags_test_07 (Failed)
569 - diags_test_08 (Failed)
570 - diags_test_09 (Failed)
571 - diags_test_10 (Failed)
572 - diags_test_11 (Failed)
573 - diags_test_12 (Failed)
574 - diags_test_13 (Failed)
575 - diags_test_15 (Failed)

@doutriaux1
Copy link
Copy Markdown
Contributor

@danlipsa
Copy link
Copy Markdown
Contributor Author

@doutriaux1 Did you see my latest message about merging uvcdat-testdata? Can you rerun your tests with the new uvcdat-testdata?

@doutriaux1
Copy link
Copy Markdown
Contributor

yep re-runnig the tests now

@doutriaux1
Copy link
Copy Markdown
Contributor

@danlipsa https://open.cdash.org/viewTest.php?onlyfailed&buildid=4369973 Do you want me to add the missing baseline?

latbnds[0,0] = max(latbnds[0,0],+90.0)
latbnds[-1,1] = min(latbnds[-1,1],-90.0)

# Get longitude bounds
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danlipsa why did you remove this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new _genericBounds_flag, I got a different average for a few tests.
The reason was that bounds were generated in grid.genBounds, and modified in the offending code which I removed, instead of axis.genGenericBounds. Also see the following changed baseline (first image).
https://github.com/UV-CDAT/uvcdat-testdata/pull/126/files
Image 4 does not have the purple lines which I think should not be there - all those images are different kind of interpolations.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also the comment from the commit.

@danlipsa
Copy link
Copy Markdown
Contributor Author

@danlipsa https://open.cdash.org/viewTest.php?onlyfailed&buildid=4369973 Do you want me to add the missing baseline?
@doutriaux1 That would be great! Thanks!

@doutriaux1
Copy link
Copy Markdown
Contributor

@danlipsa I pushed the new file. Do you mind running the one test on your Linux box to see if it passes there as well or if we need a _1 file.

@danlipsa
Copy link
Copy Markdown
Contributor Author

@doutriaux1 Great. Thanks! I'll run the tests and report the results.

@danlipsa
Copy link
Copy Markdown
Contributor Author

@doutriaux1 test_vcs_no_continents passes on my machine.

@doutriaux1 doutriaux1 merged commit 1a83c8d into master May 17, 2016
@doutriaux1 doutriaux1 deleted the remove-gen-grid-point branch May 17, 2016 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants