# Parameter estimation in R: a simple stomatal conductance model example

Most modeling approaches use some sort of optimization method to estimate parameters. These parameters are the knobs one has to turn to tune a statistical or mechanistic model exactly right to make it fit the data. Below I give a quick example of parameter estimation (in R) using a stomatal conductance (gs) model.

A collaborator in the lab came to me with the question to quickly introduce him to parameter estimation in R in order to optimize a stomatal conductance model. This is the document and code I used to briefly explain the methodology.

In this example a model calculates stomatal conductance based upon environmental variables and measurements of gmax (or the maximum diffuse stomatal conductance). This model illustrates nicely how one can use R and an optimizer to estimate complex model parameters of non-linear models.

The framework I put together estimates parameters using a Generalized Simulated Annealing optimization R package. However, different optimization packages and methods exist. Some worth mentioning are the default ‘optim’ function from the ‘stats’ package and DiffeRential Evolution Adaptive Metropolis (DREAM). However, I found for smaller projects and quick model development GenSA works really well.

The crude stomatal conductance model as formulated borrows heavily from model structures as described by Jarvis (1976) and White et al. 1999, both succintly described in Damour et al. (2010). The latter paper describes a variety of stomatal conductance models, which could be used with the framework as outlined in this example.

In this model example stomatal conductance is modelled as:

gs = gmax * f(CO2) * f(VPD) * f(PAR)

Here CO2 and VPD (vapour pressure deficit) response curves are modelled using an epxonential equation, while PAR (photosynthetically active radiation) is modelled using Michaelis-Menten kinetics and gmax is measured and calculated as defined by Franks & Beerling (2009).

• f(CO2) = c1 * e -c2[CO2]

• f(VPD) = v1 * e -v2[VPD]

• f(PAR) = ( 2000 * PAR) / (p1 + PAR)

Or condensed and written into an R function this gives:

gs.model = function(par = par, data = data) {

# put variables in readable format
vpd <- data$vpd co2 <- data$co2
par_val <- data$par gmax <- data$gmax

# unfold parameters
# for clarity
c1 <- par[1]
c2 <- par[2]
v2 <- par[3]
p1 <- par[4]

# model formulation
gs <- gmax * c1 * exp(c2 * co2) * exp(v2 * vpd) * (( 2000 * par_val) / (p1 + par_val))

# return stomatal conductance
return(gs)
}


Here, the ‘data’ and ‘par’ variables are a data frame and vector including the model drivers and parameters respectively.

The optimization minimizes a cost function which in this example is defined as the root mean squared error (RMSE) between the observed and the predicted values.

# run model and compare to true values
# returns the RMSE
cost.function <- function(data, par) {
obs <- as.vector(data$COND) pred <- as.vector(gs.model(par = par, data = data)) s < (obs - pred)^2 RMSE <- sqrt(mean(s, na.rm = TRUE)) return(RMSE) }  In the final optimization will iteratively step through the parameter space, running the cost function (which in terms calls the main model), and find an optimal solution to the problem (i.e. minimize the RMSE). Often starting model parameters and limits to the parameter space are required. Defining the limits of your parameter space well can significantly reduce the time needed to converge upon a solution. The main reason is that the model structure on it’s own does not have any sense of the physical reality of the world. If measured values can never be lower than 0 it does not make sense to look for a multiplicative parameter in the negative range of the parameter space. # starting model parameters par = c(0.65, -0.002, -0.689, 0.05) # limits to the parameter space lower <- c(0,-10,-100,0) upper <- c(100,0,0,100) # optimize the model parameters optim.par = GenSA( par = par, fn = cost.function, data = data, lower = lower, upper = upper, control=list(temperature = 4000) )  As the data I used in the development of this model aren’t published yet I can’t include any graphs. However, I think that this description can give people a quick introduction in getting started with model development (in R). ## References Damour G., Simonneau T., Cochard H., Urban L. (2010) An overview of models of stomatal conductance at the leaf level. Plant Cell & Environment 2010 33, 1419-38. Jarvis P.G. (1976) The interpretation of the variations in leaf water potential and stomatal conductance found in canopies in the field. Philosophical Transactions of the Royal Society of London. Series B 273, 593–610. White D.A., Beadle C., Sands P.J., Worledge D. & Honeysett J.L.(1999) Quantifying the effect of cumulative water stress on sto- matal conductance of Eucalyptus globulus and Eucalyptus nitens: a phenomenological approach. Australian Journal of Plant Physiology 26, 17–27. Franks, P. J. & Beerling, D. J. (2009) Maximum leaf conductance driven by CO2 effects on stomatal size and density over geologic time. Proceedings of the National Academy of Sciences 106, 10343–10347. # Pi power-over-ethernet For a project of mine I needed power-over-ethernet (PoE) to be available on my Raspberry Pi. There are several reasons why one would want to use PoE. In my case I need a reliable network connection, and power couldn’t be provided consistently using solar or stand alone methods either. PoE combines both things without the necessity of having to run an extra power cable, proving a rather neat package. Sadly, the PoE options out there weren’t flexible enough to my liking. Take for example this$40 raspberry pi PoE HAT. Although it provides power to pi there is no option to hook up additional power hungry USB devices. Some finagling would do the trick. However, if I should thinker anyway I better make something that fits my needs.

So instead I made my own little (passive) PoE power solution. A first proof of concept looked like the image above. I used a passive PoE splitter to split the ethernet and the power on the left side (black dongle), to power a \$20 uBEC which converts 12-60V into 5V/3A (red shield, I discarded the housing). The 5V output of the uBEC gets connected to the power leads two micro-USB cables (black cables) where one has its TX/RX lines patched through to a  male USB type-A cable. The latter setup allows for data transfer between a high powered device (hard drive) and the Raspberry Pi. This setup bypasses the issue of the PoE hat, which only powers the Pi and consequently can only provide limited power to connected USB device.

Unhappy with this rather ugly solution I remade the same setup using screw terminals and proper type-A port on a larger prototyping shield. The whole setup remains the same but this board now neatly stacks on top of the Pi (with some additional support from standoffs and female header rows). On the left you see a version with only one powered USB port, which powers the Raspberry Pi, and an additional screw terminal output. On the right you see the exact same setup as above, where the left hand USB ports (2x) provide power only, while the right two ports consist of one patch port (connecting a power free data connection) and injecting power and data into the other.

Technically, I could connect power directly to the 5V rail on the Pi, but since I’m not sure how stable and clean the power of the uBEC is I avoid this for now. Power spikes can easily kill GPIO pins or completely fry my Pi. A future iteration might include basic fuses and overpower protection as outlined above. This would limit damage by a less than ideal or reverse input voltages.

But, for now my Raspberry Pi PoE issues are solved.

# Jungle Rhythms: time series accuracy and analysis

Today marks the 6th month since Jungle Rhythms’ launch. During this period more than 8500 citizen scientists, from both sides of the Atlantic, contributed to the project. Currently, ~60% (~182 000 classifications) of the total workload is finished and almost half of the tasks at hand are fully retired.

Next week I’ll present some of the first results at the Association of Tropical Biology and Conservation (ATBC) 2016 meeting. But, not to keep anyone waiting I’ll describe some of the retrieved time series, in addition to initial estimates of classification accuracy in this blog post. These analysis are done with a first iteration of my processing code and partial retrievals, so gaps and processing errors are present. I’ll focus on Millettia laurentii, commonly known as Wenge or faux ebony, a tropical endangered tree species.

Below you see a figures marking the four annotated life cycle phases of tropical trees as present in the Jungle Rhythms data. From top to bottom I list flowering, fruit development, fruit dissemination (fr. ground) and leaf drop or senescence. In the figures black bars mark the location of Jungle Rhythms derived estimations of life cycle events, while red bars mark the location of independent validation data. Dashed vertical grey lines mark annual boundaries.

Overall, Jungle Rhythms classification results were highly accurate with overall accuracies using Kappa values, a measure of accuracy, range from a low of 0.56 (Figure 1.) up to 0.97, where values between 0.81–1 are considered to be in almost perfect agreement. Lower values of of the Kappa index are mostly due to missing Jungle Rhythms data. Not all data has been processed yet, and these data haven’t been excluded from the validation statistics. For example in Figure 1. the year 1950 and 1947 are missing. In more complete time series, accuracy rises to Kappa values of 0.85 and 0.97, Figures 2. and 3 respectively.

On a life cycle event basis similar performance is noted. However, there might be a slight bias for instances where events span longer time periods, a known processing error. Furthermore, some uncertainty is also due to an imperfect validation dataset. For example, the lack of validation data (red marks) in 1953 for senescence in Figure 2 is an error in the validation data not in the Jungle Rhythms classification data. This further illustrates that error rates do exist in “expert” classified data.

Although no formal analysis has been executed a quick visual comparison shows recurrent leaf drop and flowering at the peak of the dry season. Although some trees show similar patterns (Figure 1 and 2), others do not (Figure 3, below). These differences in phenology across individuals shows the great plasticity of tree phenology in the tropics, and potential independence of both light or temperature cues, but more in tune with water availability (proximity to water sources).

Summarizing, classification results of the Jungle Rhythms project are highly accurate. Furthermore, it’s highly likely that with proper post processing all classification results will reach perfect agreement. More so, the retrieved data already illustrates some of the phenological patterns in Millettia laurentii (Wenge), and how they correspond across years (Figures 1 and 2) or how they might differ between individuals (Figure 3).

Once more, I thank all the citizen scientists who contributed to this project. Without your contributions, one classification at a time, this would not have been possible.

# Raspberry pi camera v2: spectral response curve

Recently the raspberry pi foundation released a new iteration of their camera, version 2 (v2). This camera is based upon a different chipset compared to the previous version, namely Sony’s IMX219 (8MP). Luckily the specs were easier to find. So, once more I digitized the spectral response curves from the spec sheets.

You can find both spectral response curves of v1 and v2 pi cameras in my github repository. An example image of the response curves for the new Sony IMX219 chipset is shown below.

# Caffe hack: outputting the FC7 layer

The Caffe deep learning framework has a nice set of python scripts to help automate classification jobs. However, I found the standard classifier.py output rather limited. The script does not output other data than the predicted result.

Some applications could benefit from outputting the final FC7 classification weights. These weights together with a classification key can then be used to assign different labels (semantic interpretations) using the same classification run.  Below you find my new python (classifier.py) script which outputs the FC7 layer.

This new script allows me to assign classification labels using both the SUN database and the Places205 database in one pass using the MIT places convolutional neural network (CNN).

#!/usr/bin/env python
"""
Classifier is an image classifier specialization of Net.
"""

import numpy as np

import caffe

class Classifier(caffe.Net):
"""
Classifier extends Net for image class prediction
by scaling, center cropping, or oversampling.

Parameters
----------
image_dims : dimensions to scale input for cropping/sampling.
Default is to scale to net input size for whole-image crop.
mean, input_scale, raw_scale, channel_swap: params for
preprocessing options.
"""
def __init__(self, model_file, pretrained_file, image_dims=None,
mean=None, input_scale=None, raw_scale=None,
channel_swap=None):
caffe.Net.__init__(self, model_file, pretrained_file, caffe.TEST)

# configure pre-processing
in_ = self.inputs[0]
self.transformer = caffe.io.Transformer(
{in_: self.blobs[in_].data.shape})
self.transformer.set_transpose(in_, (2, 0, 1))
if mean is not None:
self.transformer.set_mean(in_, mean)
if input_scale is not None:
self.transformer.set_input_scale(in_, input_scale)
if raw_scale is not None:
self.transformer.set_raw_scale(in_, raw_scale)
if channel_swap is not None:
self.transformer.set_channel_swap(in_, channel_swap)

self.crop_dims = np.array(self.blobs[in_].data.shape[2:])
if not image_dims:
image_dims = self.crop_dims
self.image_dims = image_dims

def predict(self, inputs, oversample=True):
"""
Predict classification probabilities of inputs.

Parameters
----------
inputs : iterable of (H x W x K) input ndarrays.
oversample : boolean
average predictions across center, corners, and mirrors
when True (default). Center-only prediction when False.

Returns
-------
predictions: (N x C) ndarray of class probabilities for N images and C
classes.
"""
# Scale to standardize input dimensions.
input_ = np.zeros((len(inputs),
self.image_dims[0],
self.image_dims[1],
inputs[0].shape[2]),
dtype=np.float32)

for ix, in_ in enumerate(inputs):
input_[ix] = caffe.io.resize_image(in_, self.image_dims)

if oversample:
# Generate center, corner, and mirrored crops.
input_ = caffe.io.oversample(input_, self.crop_dims)
else:
# Take center crop.
center = np.array(self.image_dims) / 2.0
crop = np.tile(center, (1, 2))[0] + np.concatenate([
-self.crop_dims / 2.0,
self.crop_dims / 2.0
])
input_ = input_[:, crop[0]:crop[2], crop[1]:crop[3], :]

# Classify
caffe_in = np.zeros(np.array(input_.shape)[[0, 3, 1, 2]],
dtype=np.float32)
for ix, in_ in enumerate(input_):
caffe_in[ix] = self.transformer.preprocess(self.inputs[0], in_)
#out = self.forward_all(**{self.inputs[0]: caffe_in}) # original

# grab the FC7 layer in addition to the normal classification
# data and output it to a seperate variable
out = self.forward_all(**{self.inputs[0]: caffe_in, 'blobs': ['fc7']})
predictions = out[self.outputs[0]]
fc7 = self.blobs['fc7'].data

# For oversampling, average predictions across crops.
if oversample:
predictions = predictions.reshape((len(predictions) / 10, 10, -1))
predictions = predictions.mean(1) # column wise mean, rows = 0

fc7 = fc7.reshape((len(fc7) / 10, 10, -1))
fc7 = fc7.mean(1).reshape(-1)

# output both the classification as specified
# by the current classifier, as the fc7 feature
# to run on another feature matching set
return predictions, fc7