For an ongoing project I had to calculate fractional snow cover transition dates across Alaska. These transition dates mark when snow cover drops below 5% or when continuous accumulation above 5% starts. The period between these two dates can be considered snow free, and has the potential to support vegetation growth.
Below you see a snow cover trend map of the US and Canada, with earlier snow melt marked in red (-) and later snow melt marked in blue (+). Some patterns stand out: mountain ranges in the western US have seen progressively earlier snow melt dates. Alaska doesn’t show very clear patterns, aside from earlier snow melt on the southern peninsula, while Arctic Canada shows clearly earlier snow melt throughout. Surprisingly, some later snow melt dates can be noted as well: from central Alberta, Canada, sweeping east to the Great Lakes, a trend of later snow melt dates is apparent. A word of caution is needed, as these regions might be heavily influenced by the “polar vortex” of 2015, which created a long winter season and potentially skewed the trend analysis. Furthermore, locations without data are places with a very short snow cover period, which were excluded for clarity (i.e. these are areas where snow cover has less of an influence on the growing season).
In short, as long as the analysis isn’t too complicated, GEE is a quick way to get good and reproducible results. Sadly, the system does have its limitations, making true time series analysis (with custom aggregation and curve fitting techniques) rather difficult if not impossible. I feel these kinds of analyses are clearly outside the scope of the more GIS oriented GEE setup. Still, I tip my hat to Google for providing this service; rather impressive, I must say.
The Moderate Resolution Imaging Spectroradiometer (MODIS) sensors on both the Aqua and Terra platforms provide a wealth of (environmental) data. However, manipulating the raw files can be challenging. Although Google Earth Engine provides an easier way to access these data, as most of the MODIS products are hosted there, sometimes direct manipulation is still necessary. Here I quickly outline how to extract data from raw MODIS files using R.
To extract data from a MODIS HDF file in R you need a few packages. Mainly:
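The original package list did not survive here, but the snippets below rely on the `sp`, `rgdal`, `raster` and `MODIS` packages (these are the ones whose functions are used throughout), so a minimal setup would look like:

```r
# install the packages used in the examples below
# (sp is also pulled in as a dependency of raster and rgdal)
install.packages(c("sp", "rgdal", "raster", "MODIS"))

# load the libraries
library(sp)
library(rgdal)
library(raster)
library(MODIS)
```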
After installing these packages make sure you load them to continue.
Finding your way on the globe
First you have to determine where your data is located within the tiled format in which MODIS data is distributed. All tiles are denoted with a horizontal (h) and a vertical (v) index. For the land products there are roughly 350 tiles with varying degrees of actual land coverage. All MODIS data are distributed in a custom sinusoidal projection, hence the rather odd looking shape of the map.
In order to find the right tile we load a coordinate into R. In this case it’s the location of Harvard Yard, at 42.375028 latitude and -71.116493 longitude.
```r
# the location of Harvard Yard
# (note: SpatialPoints expects x,y i.e. longitude, latitude order)
harvard_yard = cbind(-71.116493, 42.375028)

# define the projection system to use
# lat-long in this case
latlon = CRS('+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0')

# create a spatial object from the coordinate(s)
coordinates = SpatialPoints(harvard_yard, latlon)

# download the overview of all the tiles in KMZ
download.file("https://daac.ornl.gov/MODIS/modis_sin.kmz",
              destfile = "modis_sin.kmz")
tiles = readOGR("./modis_sin.kmz", "Features")

# get the horizontal and vertical tile location for the coordinate(s)
horizontal = extract(tiles, coordinates)$h
vertical = extract(tiles, coordinates)$v
```
Once you have the horizontal and vertical tiles you can download only these tiles instead of the entire archive.
As previously mentioned, the data are stored in HDF files. These files are readily readable by the GDAL library. However, given the structured nature of these files, you have to know what you are looking for. The layers in structured HDF files are called scientific data sets (SDS), and you need to specify one to read it individually.
```r
# read in all SDS strings (layer descriptors)
# using the MODIS package
sds <- getSds(filename)

# grab layer 4 of a particular HDF file
my_hdf_layer = raster(readGDAL(sds$SDS4gdal[4], as.is = TRUE))

# should the HDF file not contain any subdatasets, or only a single one,
# you can default to using raster() without any additional arguments
my_hdf_file = raster(filename)
```
Now you have successfully read in the data. However, these data are projected using the sinusoidal projection, not latitude-longitude. In order to extract data at given locations, we have to translate the coordinates of these extraction points into the MODIS sinusoidal projection.
```r
# specify the sinusoidal projection
sinus = CRS("+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs")

# transform the coordinates
coordinates_sinus = spTransform(coordinates, sinus)
```
Finally extract the data at the new sinusoidal coordinates.
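A minimal sketch of this final step, using the `my_hdf_layer` raster and the `coordinates_sinus` points created above:

```r
# extract the raster values at the reprojected point location(s)
values = extract(my_hdf_layer, coordinates_sinus)

# alternatively, average over a buffer around each point
# (in meters, as the sinusoidal projection has metric units)
values_buffer = extract(my_hdf_layer, coordinates_sinus,
                        buffer = 1000, fun = mean)
```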
For those keeping score, the project is 4 months in, has just passed the 1/3 mark of retired subjects, and is rapidly approaching the halfway point in terms of overall work done (number of classifications). Cheers to everyone who contributed!!!
Over the past few years, writing software left and right, I noticed a trend. Most of the software I write serves one purpose: making open access data accessible! This should not be!
There has been a steady push for open data, increasing the transparency and reproducibility of scientific reporting. Although many scientific data sources are indeed open, access to them is not (easy), especially for the less computer savvy.
Looking at the field of ecology, some projects do well and maintain APIs, such as Daymet and Tropicos. Tropicos, however, requires a personal key, which makes writing toolboxes cumbersome and left me with no other choice than to scrape the website. The Oak Ridge National Laboratory (ORNL) also offers MODIS land product subsets through an API, with interfaces coded by users. These data are truly open access (as in relatively easy to query) and should be considered the way forward.
This contrasts with resources such as The Plant List, which offers a wealth of botanical knowledge guarded behind a search box on a web page, to be resolved only by downloading the whole database or by using a third party website. Similarly, the National Snow and Ice Data Center’s oldest snow and ice data is stored in an incomprehensible format (the more recent data is offered in an accessible GeoTIFF format). Surprisingly, even large projects such as AmeriFlux, with a rather prominent internet presence, suffer the same fate, i.e. a wealth of data largely inaccessible for quick and easy use and review.
Pooling several data access issues in the above examples, I think I’ve illustrated that open access data does not equal easily accessible data. A good few of us write and maintain toolboxes to alleviate these problems for ourselves and the community at large. However, these efforts take up valuable time and resources and can’t be academically remunerated, as only a handful of tools would qualify as substantial enough to publish.
I therefore plead with data producers (and projects alike) to make their open data easily accessible by:
creating proper APIs to access all data or metadata (for querying)
making APIs truly open, so writing plugins is easy if you don't do it yourself
writing toolboxes that do not rely on proprietary software (e.g. Matlab)
Today I launch the first version of AmerifluxR, an R toolbox to facilitate easy exploration and downloads of Ameriflux Level 2 data through a convenient R Shiny based GUI. This toolset was motivated by my need to quickly assess what data were available (metadata) and what the inter-annual variability in ecosystem fluxes looked like (the actual data).
The package provides a mapping interface to explore the distribution of the data (metadata). Subsets can be made geographically and/or by vegetation type. Summary statistics (# sites / # site years) are provided at the top of the page. The Data Explorer tab allows for more in depth analysis of the actual data (which is downloaded and merged into one convenient file on the fly). A snapshot of the initial Map and Site Selection landing page is shown below.
In the Data Explorer tab one can plot ecosystem productivity data (GPP / NEE) for a selected site. You can select a plot displaying all data on a daily basis (consecutively) or overlaying data yearly. Note that although all sites are listed, not all of them have accessible data. The plot area will notify you of this.
The package can be conveniently installed using only 3 commands in the R terminal (the first line takes care of dependencies, the second loads devtools, which is required to install from a GitHub repository on line 3).
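For reference, the three commands would look roughly like this (the GitHub repository name is assumed for illustration; check the project page for the correct one):

```r
# line 1: install devtools and its dependencies
install.packages("devtools")

# line 2: load devtools to enable installs from GitHub
library(devtools)

# line 3: install the package from its GitHub repository
# (repository name assumed for illustration)
install_github("khufkens/amerifluxr")
```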
Launch the GUI from the command line and the above screen will pop up in your favourite browser (preferentially Chrome).
Future development will include higher level products as well as other metrics (yearly summaries, etc…). I welcome anyone to join this effort and potential scientific endeavours that spring from this. Drop me a line by email or on GitHub.