# Basic GIS in R

Recently I had to help out a colleague with processing some spatial data in R. The overall goal was to find landscape parameters (topography, soil characteristics, etc.) within the neighbourhood of a flux tower.

The below  code gives a short introduction on how to extract data from both raster and shapefiles (vector data), very common operations .  I will add more tips and tricks as I come across them.

# Load libraries.
# For raster data
require(raster)

# For vector data
require(sp)

# Set flux tower coordinates.
lat= 45.2041
lon=-68.7402

# Define lat / lon projection.
lat_lon <- CRS("+init=epsg:4326")

# Define UTM projection.
utm <- CRS( "+proj=utm +zone=19 +north +ellps=WGS84 +datum=WGS84 +no_defs")

# Read in the coordinates and assign them a projection, # otherwise they remain just 'numbers'
location <- SpatialPoints(cbind(lon,lat), lat_lon)

# Convert to utm using the above projection string
# using spTransform.
location_utm <- spTransform(location, utm)

# Load raster Digital Terrain Map (DTM) data.
# If the raster data has no projection,
# you will have to assign a projection.
# I assume the file has a projection.
dtm <- raster('DTM.tif')

# Relative height of the location of the tower.
dtm_data <- extract(dtm,location_utm)

# Assign projection to the soils.shp file
# if not provided.
proj4string(soils) <- utm

# Extract soil polygon attributes.
soil_attributes <- over(location_utm,soils)

Yesterday I updated my laptop from Ubuntu 12.04 LTS to the latest release 14.04 LTS (Trusty Thar). Overall the experience is very pleasant. I noticed that the interface is snappier, bootup is as fast (It’s difficult to notice any differences on a fast SSD drive). What I did not like was the removal of some functionality in the file manager, mainly splitting the file manager window into two columns. This has been a sore point for many users but there are work arounds of it really messes with your workflow. I’ll give it a try and go without for a while. Also some of my favourite themes don’t work anymore. I now use the Numix theme as a subsitute.

I also did some counting and I figured out that I’ve spend most of my research career (approaching 10 years) using Ubuntu, starting with their first release 4.10 (warty warthog). More so, aside from some graphics work for which I used a Mac I’ve been only using Linux for research and day to day needs, and never regretted making this move. With Ubuntu moving into tablet territory I foresee some exciting times in the linux (open-source) community.

# DaymetR – gridded data extension

Short recap: Daymet is a collection of algorithms and computer software designed to interpolate and extrapolate from daily meteorological observations to produce gridded estimates of daily weather parameters - as concisely described on the Daymet website.

I’m extensively using daily meteorological data to drive a grassland growth model and quick and easy access to this data is therefore key. Although I previously coded a script to deal with single pixel data extraction I now updated my DaymetR package to include gridded (tiled) data downloads through the specification of a region of interest (top left / bottom right) or single point location.

The script has the following syntax:

download.Daymet.tiles(lat1=36.0133,
lon1=-84.2625,
lat2=NA,
lon2=NA,
start_yr=1980,
end_yr=2012,
param="ALL")

If only the first set of coordinates is provided (lat1 / lon1) the tile in which these reside is downloaded. If your location / region of interest [defined by a top left (lat1 / lon1) and bottom right (lat2 / lon2) coordinate set] falls outside the scope of the DAYMET data coverage a warning is issued. All tiles covering the region of interest are downloaded for their respective parameters, e.g. minimum and maximum temperature (see bitbucket documentation for details).

This final addition is a spatial extension of the previous code, serving primarily the spatial scaling needs of my modelling exercises. All code can be found on through my software page or by following this direct link to my bitbucket page. I hope this might serve other people in the R community.

# DAYMET ancillary data conversion

I recently posted tools to extract DAYMET data for point locations. However, the product as produced by the DAYMET team is gridded data. If you want to scale models driven by DAYMET data spatially you need access to this gridded (spatial) data.

Sadly, access to the data is rather convoluted. If you want to download this gridded data, or tiles, you first have to figure out which tiles you need. This means, going to the DAYMET website and clicking several times within the tiles you want to download to get their individual numbers. After establishing all these tile numbers you can download them from the THREDDS server.

Obviously this routine is less than ideal, if for example you want to do regional to state based research and have to select a large number of tiles. I hope to automate this process. First step in this process is reprojecting the tiles grid to a latitude and longitude. Oddly enough the format of the netCDF file was not strandard and rather confusing. It took me a while to figure out how to accomplish this. Here is a bash script to do so. These instructions work on all ancillary netCDF grids as provided by DAYMET. From the generate tile file using the below code you can either determine the tile based upon any given point location or alternatively for a region of interest. The latter is my goal and in the works, as the former is covered by previous code.

#!/bin/bash

# get filename with no extension
no_extension=basename $1 | cut -d'.' -f1 # convert the netCDF file to an ascii file gdal_translate -of AAIGrid$1 original.asc

# extract the data with no header
tail -n +7 original.asc > ascii_data.asc

# paste everything together again with a correct header
echo "ncols        8011" 	>  final_ascii_data.asc
echo "nrows        8220"	>> final_ascii_data.asc
echo "xllcorner    -4659000.0" 	>> final_ascii_data.asc
echo "yllcorner    -3135000.0" 	>> final_ascii_data.asc
echo "cellsize     1000" 	>> final_ascii_data.asc
echo "NODATA_value 0"    	>> final_ascii_data.asc

# append flipped data
tac ascii_data.asc >> final_ascii_data.asc

# translate the data into Lambert Conformal Conic GTiff
gdal_translate -of GTiff -a_srs "+proj=lcc +datum=WGS84 +lat_1=25 n +lat_2=60n +lat_0=42.5n +lon_0=100w" final_ascii_data.asc tmp.tif

# convert to latitude / longitude
gdalwarp -of GTiff -overwrite -t_srs "EPSG:4326" tmp.tif tmp_lat_lon.tif

# crop to reduce file size, only cover DAYMET data areas
gdal_translate -a_nodata -9999 -projwin -131.487784581 52.5568285568 -51.8801911189 13.9151864748 tmp_lat_lon.tif \$no_extension.tif

# clean up
rm original.asc
rm ascii_data.asc
rm final_ascii_data.asc
rm tmp.tif
rm tmp_lat_lon.tif

# You are what you eat

Going through my literature I came across an interesting but slightly macabre paper with the innocent title “Seasonal water availability predicts the relative abundance of C3 and C4 grasses in Australia” (by Murphy and Bowman, 2007).  In short, the paper discusses the species distribution and optimal growing conditions of C3/C4 grasses. Differences in the photosynthetic pathway of C4 grasses allows them to function at temperatures exceeding those optimal for C3 grasses. However, not only temperature limits C3 or C4 grass growth but other factors such as the timing of precipitation influences the growth and relative abundance of either group at a given location. Disentangling this relative abundance is required to eliminate potential biases when estimating carbon uptake by the terrestrial biosphere.

The authors tackled the question of relative C3/C4 abundance and diversity across a large geographical extent, by using a common technique in ecology namely transects of quadrats scattered across the Australian continent. Sadly, this is time intensive to establish and measure these quadrats. However, a macabre twist to this story helps to scale these measurements. Dead kangaroos to be precise.

Kangaroos eat a ton of grass. These grasses, depending on their photosynthetic pathway (C3 or C4), will show differences in their 13C stable isotope composition.

Isotopes are atoms of the same element but with a different atomic mass, due to additional neutrons at it's nucleus. When these atoms do not radioactively decay we call them stable isotopes. Carbon (C) in it's most abundant form has 6 neutrons and 6 protons at it's nucleus, hence called carbon-12 (12C).  Several isotopic forms of carbon exist with carbon-13 (13C) having one and carbon-14 (14C) having two extra neutrons in their nuclei. Of these 14C is unstable, decaying into more stable components. The 13C isotope is stable and will therefore persist.

And, since you are what you eat (on a molecular level - and organismic level as well I guess) these differences will be reflected in the tissues of the kangaroo. Assuming that kangaroos don’t prefer a particular kind of grass. Both authors used collagen tissue samples of 779 road-killed individuals to calculate the relative abundance of C3/C4 grasses around the road-kill locations. Combining these measurements with climate data, it allowed them to scale the relative abundance of grasses across the whole of Australia. Hence, they potentially removed some bias in estimates of terrestrial biosphere carbon uptake by these grasslands.

Although my main interest in this paper was the relation between climate and abundance of C3/C4 grasses I think the paper teaches some valuable lesson in science outside it’s original scope, mainly “the proxy measurement “ (and a rather macabre one).  Furthermore, it can be used as a fun example to show people the unexpected part of science. I assume few Australians would have imagined that a lot of road-killed kangaroos could tell a story about the kind of grasses it once ate and where they grow!