Standard Name (or other mechanism) to indicate that the variable value is a URL to other/external data #467
Replies: 5 comments 1 reply
-
|
Thanks for the comment, @DocOtak. Please could you explain a bit more, because I didn't understand it entirely. Do you mean you want to store the URL of the dataset in a variable ( |
-
|
I had the same thought -- this seems like something to put in a global attribute, though we may want to standardize something about that attribute, such as a standard attribute name. Also, are there any limitations on putting an array of strings in one attribute? That would make sense if there were more than one URL. However, as I say that, if you have more than one URL, you may want to provide more information about each one, which might call for a variable. Then again, maybe we could standardize the form of each string, e.g.:
Or is the idea that you want a URI that is specific not to that dataset, but to a particular variable? In that case, I suppose the same scheme could be used, but with the attribute on a variable. |
-
|
Hi @JonathanGregory and @ChrisBarker-NOAA There are a few reasons I have a strong desire to put the URI in the data part of a char or string dtype variable.
Even more specifically, each data point in one of our variables is basically itself a whole other dataset with a unique URI. In this case it's genomic sequence data. IIRC I could link to either a specific dataset (i.e. the sequence from one niskin sample) or a collection of all the datasets this program has ever done, but not to e.g. a specific subset that matches the data from my expedition. So my idea is to generalize this approach for our Profile DSG data (we are using the incomplete multidimensional array representation):
Only in the first case would storing in any sort of attribute make sense to me. |
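A minimal sketch of the per-sample case, assuming a profile-by-level grid as in the incomplete multidimensional array representation. It uses plain Python lists rather than a netCDF library, and the function name, the variable name `sequence_uri`, the empty-string fill value, and the example.org URLs are all hypothetical, chosen only for illustration:

```python
# Assumed marker for levels that have no sample (hypothetical choice).
FILL = ""


def build_uri_variable(n_profiles, n_levels, uris_by_sample):
    """Return a profile x level grid of URI strings, padded with FILL.

    uris_by_sample maps (profile_index, level_index) -> URI string.
    """
    grid = [[FILL] * n_levels for _ in range(n_profiles)]
    for (profile, level), uri in uris_by_sample.items():
        grid[profile][level] = uri
    return grid


# Two profiles with up to three bottle closures each; not every slot is used.
sample_uris = {
    (0, 0): "https://example.org/sequences/p0-n0",
    (0, 2): "https://example.org/sequences/p0-n2",
    (1, 1): "https://example.org/sequences/p1-n1",
}

sequence_uri = build_uri_variable(2, 3, sample_uris)
```

Each element of `sequence_uri` then lines up with the data value at the same (profile, level) position, which is something a single attribute cannot express.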
-
|
I think this is a highly interesting extension of the concept of standard names to discuss. From my previous work with the different versions of the standard name table, I recall that there are (were) some standard names that did not have a canonical unit, which may be taken as an indication that a standard name can describe something other than a typical geophysical quantity. In the current version I think those either are deprecated or now have canonical unit
A key consideration is probably to find a standard name that describes the intended content reasonably well without becoming either so generic that it is used as a catch-all for any kind of external data or so restricted that it fits only your specific use case. |
-
|
@DocOtak, thanks for further explanation. I hadn't appreciated before that you meant metadata that is specific to individual data values. I agree that you need a variable to contain a field of metadata, and it could be string-valued. I think that's fine.
A variable of strings could get quite large. That's often the motivation for encoding it as integers using
How do you associate the data variable with the variable containing URLs? The latter could be an auxiliary coordinate variable with all the dimensions of the data variable, but its purpose isn't really coordinates, it seems to me. It doesn't locate the data or provide some other geoscientific quantity on which the data depends. If the intention is per-cell metadata, I think it's an ancillary variable in CF terms.
Cheers, Jonathan |
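To make the ancillary-variable association concrete, here is a rough sketch that models a dataset as a plain Python dict instead of using a netCDF library. The only CF element relied on is the `ancillary_variables` attribute, a blank-separated list of variable names. The variable names, values, URLs, and the helper `ancillary_values` are hypothetical:

```python
# Hypothetical in-memory stand-in for a file with two variables that share
# the same (profile, level) dimensions.
dataset = {
    "dna_concentration": {
        "dims": ("profile", "level"),
        # CF: blank-separated names of variables holding per-cell metadata.
        "attrs": {"ancillary_variables": "dna_sequence_uri"},
        "data": [[1.2, 0.8], [0.5, 0.9]],
    },
    "dna_sequence_uri": {
        "dims": ("profile", "level"),
        "attrs": {"long_name": "URI of the external sequence dataset"},
        "data": [
            ["https://example.org/seq/p0-n0", "https://example.org/seq/p0-n1"],
            ["https://example.org/seq/p1-n0", "https://example.org/seq/p1-n1"],
        ],
    },
}


def ancillary_values(ds, var_name, index):
    """Return {ancillary variable name: value} for one cell of var_name."""
    names = ds[var_name]["attrs"].get("ancillary_variables", "").split()
    result = {}
    for name in names:
        value = ds[name]["data"]
        for i in index:  # walk down one dimension at a time
            value = value[i]
        result[name] = value
    return result


# Look up the URL that belongs to the sample at profile 1, level 0.
links = ancillary_values(dataset, "dna_concentration", (1, 0))
```

A viewer that follows this attribute could show the link next to the selected data value without the URL variable having to act as a coordinate.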

-
Topic for discussion
In the discrete sample data (niskin bottles) that my group manages, we have the situation where some data will never be included inside the data files themselves. This is often for programmatic reasons, i.e. agreement between participants that certain data will be in various other domain specific repositories. Quite often however, this lack of inclusion is due to the data themselves not fitting into a netCDF model well. Things like images from e.g. a FlowCam. Where this gets interesting from my group's data management perspective is that we often know exactly which sample (niskin) the external data came from. So we are able to be more specific when it comes to how to find the external datasets. I did some small tests with one of our files by inserting the URL of the landing page for the data files for that specific niskin bottle closure. When visualizing this test file in ODV, it would display a clickable URL in the sidebar for the exact bottle sample I had selected. This complexity is also why simply storing the URI in a variable attribute did not work.
Seeing how successful and powerful this was, my thoughts turned to what metadata should be attached to the variable itself to indicate that it contains URIs to other, external datasets. Naturally my first idea was to ask for some standard name that indicates this, e.g. external_dataset_uri (not an actual proposal). While related, I didn't think external_variables was appropriate, since these are whole datasets I want to reference and they probably aren't even netCDF files. Though this external URI reference could maybe help improve the situation of "CF does not provide conventions for identifying the files concerned".
I'm also starting a discussion rather than a specific proposal for a standard name because this didn't feel like it fit the purpose of standard names; that is, I don't need to track a comparable quantity.