R arctic polar plots

For a project I needed to create an appealing plot of the Arctic showing the location of some field sites. I posted this map on Twitter earlier today. Below I outline a simple routine to recreate the plot, which you can adjust to your liking if need be.

First of all, I downloaded an appealing background from NASA's Blue Marble dataset. GeoTIFFs can be downloaded here, or by direct download following this link. Alternatively, you can use the less realistic, more schematic graphics produced by Natural Earth.

After downloading the Blue Marble GeoTIFF, trim the data to the lowest latitude you want to plot and reproject it to EPSG:3995 (arctic polar stereographic). All of this is done using GDAL. The step could also be done with rgdal, but at times this doesn't play nice, so for now I post the command line GDAL code.

UPDATE: the command line GDAL code below is no longer necessary, as the R code now uses the raster library, which handles the reprojection fine after some fiddling.

gdalwarp -te -180 55 180 90 world.topo.bathy.200407.3x5400x2700.tif tmp.tif
gdalwarp -wo SOURCE_EXTRA=200 -s_srs EPSG:4326 -t_srs EPSG:3995 -dstnodata "255 255 255" tmp.tif blue_marble.tif

The remaining R code ingests this background image and overlays a graticule and some labels. For this I heavily borrowed from the sp map gallery.

# load required libraries
library(sp)
library(maps)
library(rgeos)

# function to slice and dice a map and convert it to an sp() object
maps2sp = function(xlim, ylim, l.out = 100, clip = TRUE) {
  stopifnot(require(maps))
  m = map(xlim = xlim, ylim = ylim, plot = FALSE, fill = TRUE)
  p = rbind(cbind(xlim[1], seq(ylim[1],ylim[2],length.out = l.out)),
            cbind(seq(xlim[1],xlim[2],length.out = l.out),ylim[2]),
            cbind(xlim[2],seq(ylim[2],ylim[1],length.out = l.out)),
            cbind(seq(xlim[2],xlim[1],length.out = l.out),ylim[1]))
  LL = CRS("+init=epsg:4326")
  IDs = sapply(strsplit(m$names, ":"), function(x) x[1])
  stopifnot(require(maptools))
  m = map2SpatialPolygons(m, IDs=IDs, proj4string = LL)
  bb = SpatialPolygons(list(Polygons(list(Polygon(list(p))),"bb")), proj4string = LL)
  
  if (!clip)
    m
  else {
    stopifnot(require(rgeos))
    gIntersection(m, bb)
  }
}

# set colours for map grid
grid.col.light = rgb(0.5,0.5,0.5,0.8)
grid.col.dark = rgb(0.5,0.5,0.5)

# coordinate systems
polar = CRS("+init=epsg:3995")
longlat = CRS("+init=epsg:4326")

# download the blue marble data if it doesn't
# exist
if (!file.exists("blue_marble.tif")) {
download.file("http://neo.sci.gsfc.nasa.gov/servlet/RenderData?si=526312&cs=rgb&format=TIFF&width=5400&height=2700","blue_marble.tif")
}

# read in the raster map and
# set the extent, crop to extent and reproject to polar
r = raster::brick("blue_marble.tif")
e = raster::extent(c(-180,180,55,90))
r_crop = raster::crop(r,e)

# traps NA values and sets them to 1
r_crop[is.na(r_crop)] = 1 
r_polar = raster::projectRaster(r_crop, crs = polar, method = "bilinear")

# some values are not valid after transformation 
# (rgb range = 1 - 255) set these back to 1
# as they seem to be the black areas
r_polar[r_polar < 1 ] = 1

# define the graticule / grid lines by first specifying
# the larger bounding box in which to place them, and
# feeding this into sp's gridlines() function
# finally the grid lines are transformed to
# the EPSG 3995 projection
pts=SpatialPoints(rbind(c(-180,55),c(0,55),c(180,85),c(180,85)), CRS("+init=epsg:4326"))
gl = gridlines(pts, easts = seq(-180,180,30), norths = seq(50,85,10), ndiscr = 100)
gl.polar = spTransform(gl, polar)

# I also create a single line which I use to mark the
# edge of the image (which is rather unclean due to pixelation)
# this line sits at 55 degrees North similar to where I trimmed
# the image
pts=SpatialPoints(rbind(c(-180,55),c(0,55),c(180,80),c(180,80)), CRS("+init=epsg:4326"))
my_line = SpatialLines(list(Lines(Line(cbind(seq(-180,180,0.5),rep(55,721))), ID="outer")), CRS("+init=epsg:4326"))

# crop a map object (make the x component a bit larger so as not to exclude
# some of the eastern islands; the centroid defines the bounding box
# and would otherwise artificially cut off these islands)
m = maps2sp(c(-180,200),c(55,90),clip = TRUE)

#----- below this point is the plotting routine
# set margins to let the figure "breathe" and accommodate labels
par(mar=rep(1,4))

# plot the grid, to initiate the area
# plotRGB() overrides margin settings in default plotting mode
plot(spTransform(gl, polar), lwd=2, lty=2,col="white")

# plot the blue marble raster data
raster::plotRGB(r_polar, add = TRUE)

# plot grid lines / graticule
lines(gl.polar, lwd = 2, lty = 2, col = grid.col.light)

# plot outer margin of the greater circle
lines(spTransform(my_line, polar), lwd = 3, lty = 1, col = grid.col.dark)

# plot continent outlines, for clarity
plot(spTransform(m, polar), lwd = 1, lty = 1, col = "transparent", border=grid.col.dark, add = TRUE)

# plot longitude labels
l = labels(gl.polar, longlat, side = 1)
l$pos = NULL
text(l, cex = 1, adj = c( 0.5, 2 ),  col = "black")

# plot latitude labels
l = labels(gl.polar, longlat, side = 2)
l$srt = 0
l$pos = NULL
text(l, cex = 1, adj = c(1.2, -1), col = "white")

# After all this you can plot your own site locations etc
# but don't forget to transform the data from lat / long
# into the arctic polar stereographic projection using
# spTransform()
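
As a closing example, here is a minimal sketch of that last step, using two hypothetical site locations (the coordinates are made up for illustration) and the longlat and polar CRS objects defined earlier:

# hypothetical site locations in decimal degrees (lon / lat)
sites = data.frame(lon = c(-105.5, 25.4), lat = c(68.6, 67.8))
sites_ll = SpatialPoints(sites, proj4string = longlat)

# transform to the arctic polar stereographic projection
# and add the points to the existing plot
sites_polar = spTransform(sites_ll, polar)
points(sites_polar, pch = 19, col = "red", cex = 1.2)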

COBECORE project accepted for funding

Last September (2016) I wrote the “Congo basin eco-climatological data recovery and valorisation” (COBECORE) proposal together with several partners, building upon and inspired by the success of the Jungle Rhythms project.

Jungle Rhythms uses citizen science to transcribe old colonial records of tree phenology (seasonal changes in the state of a tree). The project nicely illustrates that historical data can still hold significant value for present-day research, and that data which seem out of reach due to the challenging nature of transcription can be tackled with the generous help of citizen scientists.

Given this notion, I expanded upon the basic idea of Jungle Rhythms in order to digitize and transcribe further historical colonial records of eco-climatological importance stored in the state archives in Brussels. Although competition was stiff in thematic axes 3 & 6 (cultural, historical and scientific heritage) of the Belgian Science Policy Office BRAIN call, with close to 90 submissions and a rather rough 16% success rate, the project was still selected!

I’m therefore happy to announce the informal start of the COBECORE project, as funded by the Belgian Science Policy Office (pending political approval of the science budget).

snotelr – an R package for easy access to SNOTEL data

I recently created the MCD10A1 product. This is a combined MODIS MOD10A1 and MYD10A1 product which alleviates some of the low bias introduced by either overpass through a maximum value approach. The approach has been used in the study by Gascoin et al. (2013), but I wanted some additional validation of the retrieved values.
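
For context, the gist of such a maximum value composite can be sketched in a few lines of R with the raster package. The two dummy layers below merely stand in for real, co-registered MOD10A1 and MYD10A1 snow cover grids; this is an illustration of the idea, not the actual processing chain.

# dummy stand-ins for co-registered MOD10A1 / MYD10A1 snow cover layers
mod = raster::raster(matrix(runif(100, 0, 100), nrow = 10))
myd = raster::raster(matrix(runif(100, 0, 100), nrow = 10))

# maximum value composite: take the per-pixel maximum of both
# overpasses, ignoring missing values in either layer
mcd = raster::overlay(mod, myd, fun = function(x, y) pmax(x, y, na.rm = TRUE))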

As such I looked at the SNOTEL network, which “… is composed of over 800 automated data collection sites located in remote, high-elevation mountain watersheds in the western U.S. They are used to monitor snowpack, precipitation, temperature, and other climatic conditions. The data collected at SNOTEL sites are transmitted to a central database, called the Water and Climate Information System, where they are used for water supply forecasting, maps, and reports.” Here, the snowpack metrics could provide the needed validation data for my MCD10A1 product.

Although the SNOTEL website offers plenty of plotting options for casual exploration and the occasional report, the interface remains rather clumsy with respect to full automation. As such, and similar to my amerifluxr package (both in spirit and execution), I created the snotelr R package. Below you find a brief description of the package and its functions.

Installation

You can quickly install the package by first installing the following dependency

install.packages("devtools")

and then downloading the package from the GitHub repository

library(devtools)
install_github("khufkens/snotelr")

Use

Most people will prefer the GUI to explore data on the fly. To invoke the GUI, use the following command:

library(snotelr)
snotel.explorer()

This will start a Shiny application with an R backend in your default browser. The first window displays all site locations and allows for subsetting of the data by state or by a bounding box. The bounding box can be selected by clicking its top-left and bottom-right corners.

The plot data tab allows for interactive viewing of the snow water equivalent (SWE) data together with a covariate (temperature, precipitation). The SWE time series also marks snow phenology statistics, mainly the day of:

  • first snow melt
  • a continuous snow free season (last snow melt)
  • first snow accumulation (first snow deposited)
  • continuous snow accumulation (permanent snow cover)
  • maximum SWE (and its amount)

For in-depth analysis, the above statistics can be retrieved using the snow.phenology() function

# with df a SNOTEL file or data frame in your R workspace
snow.phenology(df)

To access the full list of SNOTEL sites and associated metadata, use the snotel.info() function.

# returns the site info as snotel_metadata.txt in the current working directory
snotel.info(path = ".") 

# export to data frame
data = snotel.info(path = NULL)

To query data for e.g. site 924, as shown in the image above, use:

download.snotel(site = 924)
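
Putting the pieces together, a non-interactive workflow might look like the sketch below. Note the assumption that download.snotel() returns the data as a data frame; if it writes a file to disk instead, read that file into df first.

library(snotelr)

# download data for site 924 (assumed to return a data frame;
# otherwise read the downloaded file into df first)
df = download.snotel(site = 924)

# extract the snow phenology metrics from the downloaded data
snow.phenology(df)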

 

Tree mortality: common causes and preliminary statistics

In the Jungle Rhythms project, volunteers tag observations with #hashtags on the online forum. One observation in particular is not only informative for post-processing of the annotations but also has scientific value in its own right. Namely, the cause of death of an observed tree holds information on the ecology of the tree and on the human and natural stresses it experiences, which led to its demise.

Within this context I ran some quick statistics on the hashtags of the online forum of the Jungle Rhythms project.

Overall, several sources of tree death exist, as nicely summarized by @itsmestephanie, and I quote:

“Abattu as in abattoir. Felled, cut down.  / Coupé as in coupon. Cut, presumably down. / Passants - passers-by. Coupé par les passants - cut down by passers-by. / Sec as in desiccation. Dry. / Brûlé as in crème brûlée. Burnt. / Cassé: Broken. / Tombé: Fallen. / Vent as in ventilation. Wind. Tombé par le vent = Fallen (rather pushed down) by a really big vent. / Mort: Mortician, mortality, mortuary. Morbidity, moribund, morbid. Jack Mort. Lord Voldemort.”

I counted all instances of the hashtags on subjects in the forum and grouped the totals into natural and human causes. Double mentions were excluded, so as not to count a hashtag multiple times within the same forum post.
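
For the record, a minimal sketch of this counting procedure in base R, using a hypothetical comments data frame (the column names and example tags are made up for illustration):

# hypothetical forum comments: a subject id and the raw comment text
comments = data.frame(
  subject = c(1, 1, 2),
  text = c("#coupe near the path", "#coupe confirmed", "#tombe after a storm"),
  stringsAsFactors = FALSE
)

# extract the hashtags from each comment
tags = regmatches(comments$text, gregexpr("#[a-z]+", comments$text))

# keep each hashtag only once per subject (no double counting),
# then tally the occurrences per hashtag
per_subject = unique(data.frame(
  subject = rep(comments$subject, sapply(tags, length)),
  tag = unlist(tags),
  stringsAsFactors = FALSE
))
table(per_subject$tag)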

The largest class is the “coupé” class, with a total of 257 occurrences. Second on the list is the “cassé” class with 74 mentions, followed by “mort” (72) and “tombé” (46). All other classes list smaller numbers.

Summing all human-caused events results in a total of 264 deaths, while natural causes account for only approximately half that number (133). These values represent roughly 10% and 5% of the total number of observed trees, respectively. With the project currently at 90% completion and no incentive to report these events, I will still have to validate the true numbers.

However, a few conclusions can be drawn from these simple statistics. Firstly, the human influence on the experiment was significant (twice as large as the natural deaths). As far as I could tell, most trees were located along forest paths, which increased the likelihood of a tree being cut down due to easy accessibility (e.g. more elaborate descriptions such as: coupé par les passants). Within the natural causes, the classes “cassé” and “tombé” account for a large fraction of the deaths. In other words, over 50% of the deaths are related to physical instability and the fall of either the whole tree (tombé) or the bole supporting the canopy (cassé).

Treefall is an important process in forest regeneration. Statistics derived from the Jungle Rhythms project therefore not only give insight into the seasonal processes of the observed trees, but also provide mortality rates and causes.

scientific legitimacy in publishing

Science, and climate science in particular, has always been at the center of what, post US election, is being described as fake news. Fake news, or “post-truth” (more honestly, plain lies), has been shaping the discussion around climate change for years. Over the past years the scale of fake news has grown, and with it mainstream media outlets have lost authority and trust.

This flood of fake news is at its core a form of obfuscation. Obfuscation aims to hide a true message or signal by increasing the noise fed into the same channel. It clutters the news sphere with the false equivalency that all information sources, regardless of quality, merit equal weight. Tactics that have dominated science discussions fed by fake news and fought out in the public news sphere are now slowly shifting to the formal academic world of scientific publishing, as fake (science) open access journals become more common.

Over the past few years there has been a push for open access journals. Open access journals rely on academics to pay for the final publishing of the journal article, rather than asking exorbitant access fees post publication. Although promising in terms of free access to scientific work, the push for open access has led to a flourishing business of shady journals, facilitated by the publish-or-perish culture in academia. As with fake news, fake academic journals and fake science obfuscate valid research results by increasing the number of low quality research publications one has to wade through.

For example, the journal “Expert Opinion on Environmental Biology” seems like a respectable, if not high-flying, journal with an impact factor of 4.22 (above average in ecology). However, the devil is in the details, as the attached footnote reads:

*Unofficial 2015 Journal Impact Factor was established by dividing the number of articles published in 2013 and 2014 with the number of times they are cited in 2015 based on Google search and the Scholar Citation Index database. If ‘X’ is the total number of articles published in 2013 and 2014, and ‘Y’ is the number of times these articles were cited in indexed journals during 2015 than, impact factor = Y/X

Generally, journals use citation indices, or impact factors, to indicate their visibility within the academic community. Proper journals are mostly listed by the Institute for Scientific Information (currently ISI Web of Knowledge) and summarized in a yearly Science Citation Index report. Most fake journals can't establish these credentials and therefore trick scientists by publishing fake numbers. (What's more, when searching the web for ISI one easily comes across imposters as well: the service International Scientific Indexing, or ISIndexing.com, the name is well chosen, offers a service focused on “… to increase the visibility and ease of use of open access scientific and scholarly journals.”) Although such a journal might still contain valid and good research, the tactics used do not instill trust.

More alarming than the profiteering from desperate scientists who chase metrics, and the resulting obfuscation, is a recent trend of acquisitions of more respected journals by fake academic publishers. Here the tactic is to buy small legitimate journals and intersperse them with their lesser variety, borrowing trust. Not only will these mergers make it harder to distinguish good from bad journals, they will also increase the chances of low quality peer review, as solid science was never the motive of these predatory publishers. If this is a new trend, the question remains how to safeguard the scientific legitimacy of open access journals and science in general, and what format to use.

I would argue that to solve the issue of shady open access journals we need even more radical openness in science. If one is forced to publish data and code (if not links to how to obtain the data from third-party sources), it becomes easier to separate journals with quality research from those containing nothing but random noise.

The time invested in a fake research article would become significantly larger, discouraging abuse. In addition, it would force people into good data management, as ugly code and data structures reflect badly on the scientist as well. Furthermore, since all pieces of the research would be available, it would also solve issues regarding reproducibility and inter-comparison of research results. Finally, I would argue that similar practices could be used in conventional journalism: reporting all raw data used, sources (if not endangering lives) and statistics (if applicable). Transparency is the only way forward in an age of fake news and fake science; a lack of it should be regarded as suspicious.
