Jungle Rhythms: keeping score

For those keeping score: the project is four months in, has just passed the one-third mark of retired subjects, and is rapidly approaching the halfway point in overall work done (number of classifications). Cheers to everyone who contributed!

open access != easy access

Over the past few years of writing software left and right, I have noticed a trend: most of the software I write serves one purpose, namely making open access data accessible. This should not be necessary!

There has been a steady push for open data, increasing the transparency and reproducibility of scientific reporting. Although many scientific data sources are indeed open, accessing them is not easy, especially for the less computer-savvy.

For example, NASA provides a wealth of open data. Although there are a few tools on which one can rely, none are user-friendly. Luckily, most of those can be replaced by open source tools coded by data users (MODISTools, MODIS LSP, GDAL). The European Space Agency (ESA) fares even worse: its data access is more restrictive, and retrieving data is an equal if not bigger mess. However, ESA has recently pushed for a more user-centric experience on some of its projects.

Looking at the field of ecology, some projects do well and maintain APIs, such as Daymet and Tropicos. Tropicos, however, requires a personal key, which makes writing toolboxes cumbersome and left me no choice but to scrape the website. The Oak Ridge National Laboratory (ORNL) also offers MODIS land product subsets through an API, with interfaces coded by users. These data are truly open access (as in relatively easy to query) and should be considered the way forward.
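As a concrete example of what “relatively easy to query” looks like, the Daymet single-pixel tool can be driven by a plain URL. The sketch below builds such a request in Python; the endpoint and parameter names are assumptions based on the public Daymet REST service, so check the current documentation before relying on them.

```python
# Sketch: constructing a Daymet single-pixel API request URL.
# Endpoint and parameter names are assumptions; verify against
# the current Daymet REST documentation.
from urllib.parse import urlencode

def daymet_single_pixel_url(lat, lon, variables, start, end):
    """Build a request URL for a daily single-pixel CSV extraction."""
    base = "https://daymet.ornl.gov/single-pixel/api/data"
    query = urlencode({
        "lat": lat,
        "lon": lon,
        "vars": ",".join(variables),
        "start": start,  # ISO dates, e.g. "2010-01-01"
        "end": end,
    })
    return f"{base}?{query}"

url = daymet_single_pixel_url(42.54, -72.17, ["tmax", "tmin", "prcp"],
                              "2010-01-01", "2010-12-31")
print(url)
```

The resulting URL can then be fetched with any HTTP client (a browser, curl, or `urllib.request`), which is exactly the kind of low barrier to entry I would like to see everywhere.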

This contrasts with resources such as The Plant List, which offers a wealth of botanical knowledge guarded behind a search box on a web page, resolved only by downloading the whole database or by using a third-party website. Similarly, the National Snow and Ice Data Center’s oldest snow and ice data are stored in an incomprehensible format (the more recent data are offered in an accessible GeoTIFF format). Surprisingly, even large projects such as Ameriflux, with a rather prominent internet presence, suffer the same fate: a wealth of data largely inaccessible for quick and easy use and review.

Pooling the data access issues in the above examples, I think I’ve illustrated that open access data does not equal easily accessible data. A good few of us write and maintain toolboxes to alleviate these problems for ourselves and the community at large. However, these efforts take up valuable time and resources and can’t be academically remunerated, as only a handful of tools would qualify as substantial enough to publish.

I therefore plead with data producers (and projects alike) to make their open data easily accessible by:

  1. creating proper APIs to access all data or metadata (for querying)
  2. making APIs truly open, so others can easily write plugins if you don't do it yourself
  3. writing toolboxes that do not rely on proprietary software (e.g. Matlab)
  4. assigning a true budget to these tasks
  5. appreciating those that develop tools on the side

AmerifluxR: an R toolbox to facilitate Ameriflux Level2 data exploration

Today I launch the first version of AmerifluxR. The AmerifluxR package is an R toolbox to facilitate easy Ameriflux Level2 data exploration and downloads through a convenient R Shiny based GUI. This toolset was motivated by my need to quickly assess what data were available (metadata) and what the inter-annual variability in ecosystem fluxes looked like (true data).

The package provides a mapping interface to explore the distribution of the data (metadata). Subsets can be made geographically and/or by vegetation type. Summary statistics (number of sites / site years) are provided at the top of the page. The Data Explorer tab allows for a more in-depth analysis of the true data (which are downloaded and merged into one convenient file on the fly). A snapshot of the initial Map and Site Selection landing page is shown below.

[screenshot: Map and Site Selection interface]

In the Data Explorer tab one can plot ecosystem productivity data (GPP / NEE) for a selected site, either displaying all data on a daily basis (consecutively) or overlaying the daily data by year. Note that although all sites are listed, not all of them have accessible data; the plot area will notify you of this.

[caption id="attachment_1227" align="aligncenter" width="768"]daily_gpp GPP at Harvard Forest[/caption]

[caption id="attachment_1228" align="aligncenter" width="768"]yearly_nee overlaying daily NEE values (together with the long term mean and standard deviation; LTM and SD respectively)[/caption]

The package can be conveniently installed using only three commands in the R terminal: the first takes care of dependencies, the second loads devtools, which is required to install from a GitHub repository (third line).

install.packages(c("rvest","data.table","RCurl","DT","shiny","shinydashboard","leaflet","plotly","devtools"))
require(devtools)
install_github("khufkens/amerifluxr")

To get started, just type

ameriflux.explorer()

in the R console, and the above screen will pop up in your favourite browser (preferably Chrome).

Future development will include higher-level products as well as other metrics (yearly summaries, etc.). I welcome anyone to join this effort and the potential scientific endeavours that spring from it. Drop me a line by email or on GitHub.

map colours onto data values in GDAL

This is a quick post originating from a recent discussion. Sometimes GIS data does not come with its original colour map, only the raw numbers. These raw numbers (classes) are fine for calculations but rather limit the way you can visualize the data. Here, I’ll show how to map colours onto classes or ranges using the Geospatial Data Abstraction Library (GDAL).

All you need is a list of the classes you want to map to particular colours. The format of this colour table is rather flexible and is described in full on the GRASS r.colors page. For this particular example I used the colours of the 0.5 km MODIS-based Global Land Cover Climatology map, which translates into a table with 17 classes, valued 0 to 16 (the table is attached at the end of this post). You can download the data from the USGS website if you want to try this example (warning: large file, ~4 GB unzipped).


[map: MODIS-based Global Land Cover Climatology]

# If the colour table is saved in colours.csv, the following
# command applies it to a GeoTIFF file that lacks colour
# information.
gdaldem color-relief input.tif colours.csv output.tif

The above command maps the colours onto the classes, but the map still remains rather static. If you want to create a Google Earth compatible file (draped onto a 3D globe), you can do so by translating the file format. The resulting KML file should open in Google Earth if you have a copy installed.

gdal_translate -of KMLSUPEROVERLAY input.tif output.kml -co format=png

The full colour table for the image linked above is shown below (format: class value followed by red, green and blue values, 0-255).

0 186 254 253
1 0 100 1
2 77 165 87
3 125 204 15
4 98 232 101
5 55 198 132
6 214 119 117
7 253 237 160
8 185 231 141
9 255 224 27
10 254 192 107
11 37 138 220
12 252 251 0
13 251 3 4
14 147 144 5
15 254 220 211
16 191 191 191
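For intuition, applying such a table to discrete classes is just a per-pixel lookup. The Python sketch below illustrates the idea on a toy raster; it is illustrative only, not how GDAL implements color-relief (which can also interpolate between entries for continuous data).

```python
# Map land cover class values to RGB triplets: the discrete-class
# equivalent of what gdaldem color-relief does with the table above.

# A few entries of the colour table above: class -> (R, G, B)
colour_table = {
    0: (186, 254, 253),
    1: (0, 100, 1),
    16: (191, 191, 191),
}

def colourise(classes, table, fallback=(0, 0, 0)):
    """Replace each class value in a 2D raster with its RGB triplet."""
    return [[table.get(value, fallback) for value in row] for row in classes]

raster = [[0, 1], [16, 99]]  # a tiny 2x2 raster; 99 is an unknown class
rgb = colourise(raster, colour_table)
print(rgb)  # [[(186, 254, 253), (0, 100, 1)], [(191, 191, 191), (0, 0, 0)]]
```

Unknown classes fall back to black here; gdaldem offers its own options for handling values missing from the table, so consult its documentation for the real behaviour.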
