-
Notifications
You must be signed in to change notification settings - Fork 541
extract bounding box from NetCDF/HDF5 files and insert into the geospatial metadata block #9541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
de0c421
1ee2ea7
09490f0
f5be4a8
27e60ef
1fb2798
189c774
2edb351
690b8a4
719439d
ed3d4d9
85d4bf2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| An attempt will be made to extract a geospatial bounding box (west, south, east, north) from NetCDF and HDF5 files and then insert these values into the geospatial metadata block, if enabled. |
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,168 @@ | ||
| package edu.harvard.iq.dataverse.ingest.metadataextraction.impl.plugins.netcdf; | ||
|
|
||
| import edu.harvard.iq.dataverse.DatasetFieldConstant; | ||
| import edu.harvard.iq.dataverse.ingest.metadataextraction.FileMetadataExtractor; | ||
| import edu.harvard.iq.dataverse.ingest.metadataextraction.FileMetadataIngest; | ||
| import edu.harvard.iq.dataverse.ingest.metadataextraction.spi.FileMetadataExtractorSpi; | ||
| import java.io.BufferedInputStream; | ||
| import java.io.File; | ||
| import java.io.IOException; | ||
| import java.util.HashMap; | ||
| import java.util.HashSet; | ||
| import java.util.Map; | ||
| import java.util.Set; | ||
| import java.util.logging.Logger; | ||
| import ucar.ma2.DataType; | ||
| import ucar.nc2.Attribute; | ||
| import ucar.nc2.NetcdfFile; | ||
| import ucar.nc2.NetcdfFiles; | ||
|
|
||
| public class NetcdfFileMetadataExtractor extends FileMetadataExtractor { | ||
|
|
||
| private static final Logger logger = Logger.getLogger(NetcdfFileMetadataExtractor.class.getCanonicalName()); | ||
|
|
||
| public static final String WEST_LONGITUDE_KEY = "geospatial_lon_min"; | ||
| public static final String EAST_LONGITUDE_KEY = "geospatial_lon_max"; | ||
| public static final String NORTH_LATITUDE_KEY = "geospatial_lat_max"; | ||
| public static final String SOUTH_LATITUDE_KEY = "geospatial_lat_min"; | ||
|
|
||
| private static final String GEOSPATIAL_BLOCK_NAME = "geospatial"; | ||
| private static final String WEST_LONGITUDE = DatasetFieldConstant.westLongitude; | ||
| private static final String EAST_LONGITUDE = DatasetFieldConstant.eastLongitude; | ||
| private static final String NORTH_LATITUDE = DatasetFieldConstant.northLatitude; | ||
| private static final String SOUTH_LATITUDE = DatasetFieldConstant.southLatitude; | ||
|
|
||
| public NetcdfFileMetadataExtractor(FileMetadataExtractorSpi originatingProvider) { | ||
| super(originatingProvider); | ||
| } | ||
|
|
||
| public NetcdfFileMetadataExtractor() { | ||
| super(null); | ||
| } | ||
|
|
||
| @Override | ||
| public FileMetadataIngest ingest(BufferedInputStream stream) throws IOException { | ||
| throw new UnsupportedOperationException("Not supported yet."); | ||
| } | ||
|
|
||
| public FileMetadataIngest ingestFile(File file) throws IOException { | ||
| FileMetadataIngest fileMetadataIngest = new FileMetadataIngest(); | ||
| fileMetadataIngest.setMetadataBlockName(GEOSPATIAL_BLOCK_NAME); | ||
|
|
||
| Map<String, String> geoFields = parseGeospatial(getNetcdfFile(file)); | ||
| WestAndEastLongitude welong = getStandardLongitude(new WestAndEastLongitude(geoFields.get(WEST_LONGITUDE), geoFields.get(EAST_LONGITUDE))); | ||
| String westLongitudeFinal = welong != null ? welong.getWestLongitude() : null; | ||
| String eastLongitudeFinal = welong != null ? welong.getEastLongitude() : null; | ||
| String northLatitudeFinal = geoFields.get(NORTH_LATITUDE); | ||
| String southLatitudeFinal = geoFields.get(SOUTH_LATITUDE); | ||
|
|
||
| logger.info(getLineStringsUrl(westLongitudeFinal, southLatitudeFinal, eastLongitudeFinal, northLatitudeFinal)); | ||
|
|
||
| Map<String, Set<String>> metadataMap = new HashMap<>(); | ||
| metadataMap.put(WEST_LONGITUDE, new HashSet<>()); | ||
| metadataMap.get(WEST_LONGITUDE).add(westLongitudeFinal); | ||
| metadataMap.put(EAST_LONGITUDE, new HashSet<>()); | ||
| metadataMap.get(EAST_LONGITUDE).add(eastLongitudeFinal); | ||
| metadataMap.put(NORTH_LATITUDE, new HashSet<>()); | ||
| metadataMap.get(NORTH_LATITUDE).add(northLatitudeFinal); | ||
| metadataMap.put(SOUTH_LATITUDE, new HashSet<>()); | ||
| metadataMap.get(SOUTH_LATITUDE).add(southLatitudeFinal); | ||
| fileMetadataIngest.setMetadataMap(metadataMap); | ||
| return fileMetadataIngest; | ||
| } | ||
|
|
||
| public NetcdfFile getNetcdfFile(File file) throws IOException { | ||
| /** | ||
| * <attribute name="geospatial_lat_min" value="25.066666666666666" /> | ||
| * south | ||
| * <attribute name="geospatial_lat_max" value="49.40000000000000" /> | ||
| * north | ||
| * <attribute name="geospatial_lon_min" value="-124.7666666333333" /> | ||
| * west | ||
| * <attribute name="geospatial_lon_max" value="-67.058333300000015" /> | ||
| * east | ||
| * <attribute name="geospatial_lon_resolution" value="0.041666666666666" /> | ||
| * <attribute name="geospatial_lat_resolution" value="0.041666666666666" /> | ||
| * <attribute name="geospatial_lat_units" value="decimal_degrees north" /> | ||
| * <attribute name="geospatial_lon_units" value="decimal_degrees east" /> | ||
| */ | ||
| return NetcdfFiles.open(file.getAbsolutePath()); | ||
| } | ||
|
|
||
| private Map<String, String> parseGeospatial(NetcdfFile netcdfFile) { | ||
| Map<String, String> geoFields = new HashMap<>(); | ||
|
|
||
| Attribute westLongitude = netcdfFile.findGlobalAttribute(WEST_LONGITUDE_KEY); | ||
| Attribute eastLongitude = netcdfFile.findGlobalAttribute(EAST_LONGITUDE_KEY); | ||
| Attribute northLatitude = netcdfFile.findGlobalAttribute(NORTH_LATITUDE_KEY); | ||
| Attribute southLatitude = netcdfFile.findGlobalAttribute(SOUTH_LATITUDE_KEY); | ||
|
|
||
| geoFields.put(DatasetFieldConstant.westLongitude, getValue(westLongitude)); | ||
| geoFields.put(DatasetFieldConstant.eastLongitude, getValue(eastLongitude)); | ||
| geoFields.put(DatasetFieldConstant.northLatitude, getValue(northLatitude)); | ||
| geoFields.put(DatasetFieldConstant.southLatitude, getValue(southLatitude)); | ||
|
|
||
| logger.info(getLineStringsUrl( | ||
| geoFields.get(DatasetFieldConstant.westLongitude), | ||
| geoFields.get(DatasetFieldConstant.southLatitude), | ||
| geoFields.get(DatasetFieldConstant.eastLongitude), | ||
| geoFields.get(DatasetFieldConstant.northLatitude))); | ||
|
|
||
| return geoFields; | ||
| } | ||
|
|
||
| // We store strings in the database. | ||
| private String getValue(Attribute attribute) { | ||
| if (attribute == null) { | ||
| return null; | ||
| } | ||
| DataType dataType = attribute.getDataType(); | ||
| if (dataType.isString()) { | ||
| return attribute.getStringValue(); | ||
| } else if (dataType.isNumeric()) { | ||
| return attribute.getNumericValue().toString(); | ||
| } else { | ||
| return null; | ||
| } | ||
| } | ||
|
|
||
| // Convert to standard -180 to 180 range by subtracting 360 | ||
| // if either longitude is greater than 180. | ||
| // west south east north | ||
| // 343.68, 41.8, 353.78, 49.62 becomes | ||
| // -16.320007, 41.8, -6.220001, 49.62 instead | ||
| // "If one of them is > 180, the domain is 0:360. | ||
| // If one of them is <0, the domain is -180:180. | ||
| // If both are between 0 and 180, the answer is indeterminate." | ||
| // https://github.com/cf-convention/cf-conventions/issues/435#issuecomment-1505614364 | ||
| public WestAndEastLongitude getStandardLongitude(WestAndEastLongitude westAndEastLongitude) { | ||
| if (westAndEastLongitude == null) { | ||
| return null; | ||
| } | ||
| if (westAndEastLongitude.getWestLongitude() == null || westAndEastLongitude.getEastLongitude() == null) { | ||
| return null; | ||
| } | ||
| float eastAsFloat; | ||
| float westAsFloat; | ||
| try { | ||
| westAsFloat = Float.valueOf(westAndEastLongitude.getWestLongitude()); | ||
| eastAsFloat = Float.valueOf(westAndEastLongitude.getEastLongitude()); | ||
| } catch (NumberFormatException ex) { | ||
| return null; | ||
| } | ||
| if (westAsFloat > 180 || eastAsFloat > 180) { | ||
| Float westStandard = westAsFloat - 360; | ||
| Float eastStandard = eastAsFloat - 360; | ||
| WestAndEastLongitude updatedWeLong = new WestAndEastLongitude(westStandard.toString(), eastStandard.toString()); | ||
| return updatedWeLong; | ||
| } | ||
| return westAndEastLongitude; | ||
| } | ||
|
|
||
| // Generates a handy link to see what the bounding box looks like on a map | ||
| private String getLineStringsUrl(String west, String south, String east, String north) { | ||
| // BBOX (Left (LON) ,Bottom (LAT), Right (LON), Top (LAT), comma separated, with or without decimal point): | ||
| return "https://linestrings.com/bbox/#" + west + "," + south + "," + east + "," + north; | ||
| } | ||
|
|
||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,52 @@ | ||
| package edu.harvard.iq.dataverse.ingest.metadataextraction.impl.plugins.netcdf; | ||
|
|
||
| import java.util.Objects; | ||
|
|
||
| public class WestAndEastLongitude { | ||
|
|
||
| private final String westLongitude; | ||
| private final String eastLongitude; | ||
|
|
||
| public WestAndEastLongitude(String westLongitude, String eastLongitude) { | ||
| this.westLongitude = westLongitude; | ||
| this.eastLongitude = eastLongitude; | ||
| } | ||
|
|
||
| public String getWestLongitude() { | ||
| return westLongitude; | ||
| } | ||
|
|
||
| public String getEastLongitude() { | ||
| return eastLongitude; | ||
| } | ||
|
|
||
| @Override | ||
| public String toString() { | ||
| return "WestAndEastLongitude{" + "westLongitude=" + westLongitude + ", eastLongitude=" + eastLongitude + '}'; | ||
| } | ||
|
|
||
| @Override | ||
| public int hashCode() { | ||
| int hash = 3; | ||
| return hash; | ||
| } | ||
|
|
||
| @Override | ||
| public boolean equals(Object obj) { | ||
| if (this == obj) { | ||
| return true; | ||
| } | ||
| if (obj == null) { | ||
| return false; | ||
| } | ||
| if (getClass() != obj.getClass()) { | ||
| return false; | ||
| } | ||
| final WestAndEastLongitude other = (WestAndEastLongitude) obj; | ||
| if (!Objects.equals(this.westLongitude, other.westLongitude)) { | ||
| return false; | ||
| } | ||
| return Objects.equals(this.eastLongitude, other.eastLongitude); | ||
| } | ||
|
|
||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,62 @@ | ||
| package edu.harvard.iq.dataverse.ingest.metadataextraction.impl.plugins.netcdf; | ||
|
|
||
| import edu.harvard.iq.dataverse.DatasetFieldConstant; | ||
| import edu.harvard.iq.dataverse.ingest.metadataextraction.FileMetadataIngest; | ||
| import java.io.File; | ||
| import java.util.Map; | ||
| import java.util.Set; | ||
| import static org.junit.Assert.assertNull; | ||
| import static org.junit.jupiter.api.Assertions.assertEquals; | ||
| import org.junit.jupiter.api.Test; | ||
|
|
||
| public class NetcdfFileMetadataExtractorTest { | ||
|
|
||
| /** | ||
| * Expect some lat/long values (geospatial bounding box) with longtude | ||
| * values that have been transformed from a domain of 0 to 360 to a domain | ||
| * of -180 to 180. | ||
| */ | ||
| @Test | ||
| public void testExtractLatLong() throws Exception { | ||
| String pathAndFile = "src/test/resources/netcdf/ICOADS_R3.0.0_1662-10.nc"; | ||
| File file = new File(pathAndFile); | ||
| NetcdfFileMetadataExtractor instance = new NetcdfFileMetadataExtractor(); | ||
| FileMetadataIngest netcdfMetadata = instance.ingestFile(file); | ||
| Map<String, Set<String>> map = netcdfMetadata.getMetadataMap(); | ||
| assertEquals("-16.320007", map.get(DatasetFieldConstant.westLongitude).toArray()[0]); | ||
| assertEquals("-6.220001", map.get(DatasetFieldConstant.eastLongitude).toArray()[0]); | ||
| assertEquals("41.8", map.get(DatasetFieldConstant.southLatitude).toArray()[0]); | ||
| assertEquals("49.62", map.get(DatasetFieldConstant.northLatitude).toArray()[0]); | ||
| } | ||
|
|
||
| /** | ||
| * The NetCDF file under test doesn't have values for latitude/longitude | ||
| * (geospatial bounding box). | ||
| */ | ||
| @Test | ||
| public void testExtractNoLatLong() throws Exception { | ||
| String pathAndFile = "src/test/resources/netcdf/madis-raob"; | ||
| File file = new File(pathAndFile); | ||
| NetcdfFileMetadataExtractor instance = new NetcdfFileMetadataExtractor(); | ||
| FileMetadataIngest netcdfMetadata = null; | ||
| netcdfMetadata = instance.ingestFile(file); | ||
| Map<String, Set<String>> map = netcdfMetadata.getMetadataMap(); | ||
| assertNull(map.get(DatasetFieldConstant.westLongitude).toArray()[0]); | ||
| assertNull(map.get(DatasetFieldConstant.eastLongitude).toArray()[0]); | ||
| assertNull(map.get(DatasetFieldConstant.southLatitude).toArray()[0]); | ||
| assertNull(map.get(DatasetFieldConstant.northLatitude).toArray()[0]); | ||
| } | ||
|
|
||
| @Test | ||
| public void testStandardLongitude() { | ||
| NetcdfFileMetadataExtractor extractor = new NetcdfFileMetadataExtractor(); | ||
| // Both are over 180. Change. | ||
| assertEquals(new WestAndEastLongitude("-16.320007", "-6.220001"), extractor.getStandardLongitude(new WestAndEastLongitude("343.68", "353.78"))); | ||
| // One is over 180. Change | ||
| assertEquals(new WestAndEastLongitude("-260.0", "-179.0"), extractor.getStandardLongitude(new WestAndEastLongitude("100", "181"))); | ||
| // Both are under 180. No change. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @JR-1991 it seems to me that the test above should fail as the result of -260 is outside the range of -180 - 180.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I agree. I'll ping Jan for some input on this.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @pdurbin just for clarification, the right hand-side extractor transforms by subtracting 360° from both? If so, then I have to agree, this test should fail and truly evaluate to Given the proposed change in the calculation, the test should look like this: assertEquals(
new WestAndEastLongitude("100.0", "-179.0"),
extractor.getStandardLongitude(new WestAndEastLongitude("100", "181"))
);What do you think?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's late on a Friday but I powered through and pushed some fixes around longitude at 2edb351. Almost as importantly, I added docs! Rules in the User Guide about how users (and QA) should expect the code to behave. I'd love to hear what you (@JR-1991) and @atrisovic and @sekmiller (whom I've been bugging all day) think! Here they are:
I also added more tests. I'm happy to add more if you all think of more edge cases. I don't have real NetCDF files to test with that have wacky values. It might be a challenge for QA to find NetCDF files to exercise all the rules. |
||
| assertEquals(new WestAndEastLongitude("25", "35"), extractor.getStandardLongitude(new WestAndEastLongitude("25", "35"))); | ||
| assertEquals(null, extractor.getStandardLongitude(new WestAndEastLongitude("foo", "bar"))); | ||
| } | ||
|
|
||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if we're converting 0-360 to -180-180 then we need to subtract just 180 from the given value, not 360.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sekmiller I was also puzzled with this at the beginning, but subtracting 360° is the correct procedure. The math only tells half the truth, because the prime meridian (0° longitude) needs to stay at Greenwich and thus the interval's "cycle point" where both ends meet changes by 180°. Thus, this has to be attributed for, which leads to the 360°.
That's how I understood it so far, but this illustration may explain it better:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JR-1991 thanks! Even with that explanation, it's hard to keep straight in my head.
I was relying on a tool to check that maps are the same. That is these two URLs (minus 360) show the same map:
Here's a screenshot of the maps (and a bit more commentary) I originally posted at https://dataverse.zulipchat.com/#narrow/stream/376593-geospatial/topic/extract.20lat.2Flong.20.239331/near/348542658
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JR-1991 OK. Looking at the globe and the example location makes sense when the value is between 180 and 360, but what the code does here is if the value for either is over 180 it subtracts 360 from both so there is a potential for a situation where one val is > 180 and the other is less and in that case, say 170, and subtracting 360 results in -190 which is off the scale below -180.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sekmiller thanks for the review! Indeed subtracting from both would lead to a failure. I tested it visually and in the case of$\theta_1 = [280°, 170°]$ the result should translate to $\theta_2 = [-80°, 170°]$ - Hence if a degree is under 180 it should not be subtracted. I have updated the illustration:
I understand it in this way, that within this domain transformation only one half of the circle is changing. Note, in 0° - 360° domain the first half is the same as in the domain of (-180°) - 180° and thus there is a 1:1 relation if degrees are within 0-180. Yet, the other half does change to another range in a more drastic way. Tipping over 180° by a degree suddenly leads you to -179° which in the other domain corresponds to 181° - Hence the 360° provides a valid transformation for this case.
It's quite complicated though 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I should add to the figure, that$f(\theta)$ is defined for $\theta \in [0°, 360°]$ and if $\theta < 0°$ its already in the desired domain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to address all this in 690b8a4 just now and left this comment about it: #9541 (comment)
Maybe I should have resolve other conversation and gotten it down to just this one. Oh well. Please see the other comment.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read through the thread and agree with the comments! Instead of the lines 154-182, we can check each coordinate by itself and it would work well, ie: