# Jungle Rhythms: time series accuracy and analysis

Today marks six months since Jungle Rhythms’ launch. During this period more than 8,500 citizen scientists, from both sides of the Atlantic, contributed to the project. Currently, ~60% (~182,000 classifications) of the total workload is finished and almost half of the tasks at hand are fully retired.

Next week I’ll present some of the first results at the Association of Tropical Biology and Conservation (ATBC) 2016 meeting. But, not to keep anyone waiting, I’ll describe some of the retrieved time series, together with initial estimates of classification accuracy, in this blog post. These analyses were done with a first iteration of my processing code and partial retrievals, so gaps and processing errors are present. I’ll focus on Millettia laurentii, commonly known as Wenge or faux ebony, an endangered tropical tree species.

Below you see figures marking the four annotated life cycle phases of tropical trees as present in the Jungle Rhythms data. From top to bottom I list flowering, fruit development, fruit dissemination (fr. ground) and leaf drop or senescence. In the figures, black bars mark the location of Jungle Rhythms-derived estimates of life cycle events, while red bars mark the location of independent validation data. Dashed vertical grey lines mark annual boundaries.

Overall, Jungle Rhythms classification results were highly accurate. Kappa values, a measure of agreement, range from a low of 0.56 (Figure 1) up to 0.97; values between 0.81 and 1 are considered to be in almost perfect agreement. Lower values of the Kappa index are mostly due to missing Jungle Rhythms data. Not all data have been processed yet, and these gaps haven’t been excluded from the validation statistics. For example, in Figure 1 the years 1947 and 1950 are missing. In more complete time series, accuracy rises to Kappa values of 0.85 and 0.97 (Figures 2 and 3, respectively).
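For reference, Cohen’s Kappa compares the observed agreement between two annotators to the agreement expected by chance. A minimal pure-Python sketch for two event series (illustrative only, not the actual validation code used here) looks like this:

```python
def cohens_kappa(a, b):
    """Cohen's Kappa for two equal-length label sequences."""
    n = len(a)
    # observed agreement: fraction of positions where both series match
    po = sum(x == y for x, y in zip(a, b)) / n
    # chance agreement: product of marginal class frequencies, summed over classes
    classes = set(a) | set(b)
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in classes)
    return (po - pe) / (1 - pe)

# e.g. two binary event series (1 = event observed in that time step)
# cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]) gives 0.5
```

Note that missing observations (as in the incomplete years above) count as disagreements under this scheme, which is why partially processed series drag the Kappa score down.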

On a life cycle event basis, similar performance is noted. However, there might be a slight bias for instances where events span longer time periods, a known processing error. Furthermore, some uncertainty is also due to an imperfect validation dataset. For example, the lack of validation data (red marks) in 1953 for senescence in Figure 2 is an error in the validation data, not in the Jungle Rhythms classification data. This further illustrates that error rates do exist in “expert” classified data.

Although no formal analysis has been executed, a quick visual comparison shows recurrent leaf drop and flowering at the peak of the dry season. While some trees show similar patterns (Figures 1 and 2), others do not (Figure 3, below). These differences in phenology across individuals show the great plasticity of tree phenology in the tropics, and a potential independence of both light and temperature cues in favour of water availability (e.g. proximity to water sources).

Summarizing, classification results of the Jungle Rhythms project are highly accurate. Furthermore, it’s highly likely that with proper post-processing all classification results will reach perfect agreement. Moreover, the retrieved data already illustrate some of the phenological patterns in Millettia laurentii (Wenge), how they correspond across years (Figures 1 and 2), and how they might differ between individuals (Figure 3).

Once more, I thank all the citizen scientists who contributed to this project. Without your contributions, one classification at a time, this would not have been possible.

# Raspberry pi camera v2: spectral response curve

Recently the Raspberry Pi Foundation released a new iteration of their camera, version 2 (v2). This camera is based upon a different chipset than the previous version, namely Sony’s IMX219 (8 MP). Luckily, the specs were easier to find this time. So, once more, I digitized the spectral response curves from the spec sheets.

You can find the spectral response curves of both the v1 and v2 Pi cameras in my GitHub repository. An example image of the response curves for the new Sony IMX219 chipset is shown below.

# Caffe hack: outputting the FC7 layer

The Caffe deep learning framework has a nice set of Python scripts to help automate classification jobs. However, I found the standard classifier.py output rather limited: the script outputs nothing but the predicted result.

Some applications could benefit from outputting the final FC7 classification weights. These weights, together with a classification key, can then be used to assign different labels (semantic interpretations) using the same classification run. Below you find my new Python (classifier.py) script, which outputs the FC7 layer.

This new script allows me to assign classification labels using both the SUN database and the Places205 database in one pass using the MIT places convolutional neural network (CNN).

#!/usr/bin/env python
"""
Classifier is an image classifier specialization of Net.
"""

import numpy as np

import caffe


class Classifier(caffe.Net):
    """
    Classifier extends Net for image class prediction
    by scaling, center cropping, or oversampling.

    Parameters
    ----------
    image_dims : dimensions to scale input for cropping/sampling.
        Default is to scale to net input size for whole-image crop.
    mean, input_scale, raw_scale, channel_swap : params for
        preprocessing options.
    """
    def __init__(self, model_file, pretrained_file, image_dims=None,
                 mean=None, input_scale=None, raw_scale=None,
                 channel_swap=None):
        caffe.Net.__init__(self, model_file, pretrained_file, caffe.TEST)

        # configure pre-processing
        in_ = self.inputs[0]
        self.transformer = caffe.io.Transformer(
            {in_: self.blobs[in_].data.shape})
        self.transformer.set_transpose(in_, (2, 0, 1))
        if mean is not None:
            self.transformer.set_mean(in_, mean)
        if input_scale is not None:
            self.transformer.set_input_scale(in_, input_scale)
        if raw_scale is not None:
            self.transformer.set_raw_scale(in_, raw_scale)
        if channel_swap is not None:
            self.transformer.set_channel_swap(in_, channel_swap)

        self.crop_dims = np.array(self.blobs[in_].data.shape[2:])
        if not image_dims:
            image_dims = self.crop_dims
        self.image_dims = image_dims

    def predict(self, inputs, oversample=True):
        """
        Predict classification probabilities of inputs.

        Parameters
        ----------
        inputs : iterable of (H x W x K) input ndarrays.
        oversample : boolean
            average predictions across center, corners, and mirrors
            when True (default). Center-only prediction when False.

        Returns
        -------
        predictions : (N x C) ndarray of class probabilities for N
            images and C classes.
        fc7 : flattened FC7 layer activations, averaged across crops
            when oversampling.
        """
        # Scale to standardize input dimensions.
        input_ = np.zeros((len(inputs),
                           self.image_dims[0],
                           self.image_dims[1],
                           inputs[0].shape[2]),
                          dtype=np.float32)
        for ix, in_ in enumerate(inputs):
            input_[ix] = caffe.io.resize_image(in_, self.image_dims)

        if oversample:
            # Generate center, corner, and mirrored crops.
            input_ = caffe.io.oversample(input_, self.crop_dims)
        else:
            # Take center crop.
            center = np.array(self.image_dims) / 2.0
            crop = np.tile(center, (1, 2))[0] + np.concatenate([
                -self.crop_dims / 2.0,
                self.crop_dims / 2.0
            ])
            crop = crop.astype(int)
            input_ = input_[:, crop[0]:crop[2], crop[1]:crop[3], :]

        # Classify
        caffe_in = np.zeros(np.array(input_.shape)[[0, 3, 1, 2]],
                            dtype=np.float32)
        for ix, in_ in enumerate(input_):
            caffe_in[ix] = self.transformer.preprocess(self.inputs[0], in_)

        # out = self.forward_all(**{self.inputs[0]: caffe_in})  # original

        # run the forward pass, asking Caffe to retain the fc7 blob
        # in addition to the normal classification output
        out = self.forward_all(blobs=['fc7'],
                               **{self.inputs[0]: caffe_in})
        predictions = out[self.outputs[0]]
        fc7 = self.blobs['fc7'].data

        # For oversampling, average predictions across crops.
        if oversample:
            predictions = predictions.reshape((len(predictions) // 10, 10, -1))
            predictions = predictions.mean(1)  # mean across the 10 crops

            fc7 = fc7.reshape((len(fc7) // 10, 10, -1))
            fc7 = fc7.mean(1).reshape(-1)

        # output both the classification as specified by the current
        # classifier and the fc7 features, so the latter can be matched
        # against a separate label set
        return predictions, fc7
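To illustrate the idea of reusing the FC7 features with more than one classification key, a linear readout per label set might look like the sketch below. All names here (`assign_labels`, the example weights and labels) are hypothetical placeholders, not part of the Caffe API or the MIT Places release:

```python
import numpy as np

def assign_labels(fc7, weights, labels):
    # hypothetical helper: score each label by a linear readout of
    # the FC7 feature vector and return the best-matching label
    scores = weights.dot(fc7)
    return labels[int(np.argmax(scores))]

# toy 3-dimensional "FC7" vector and two illustrative label keys
fc7 = np.array([0.2, 1.5, 0.1])
sun_weights = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])
sun_labels = ['beach', 'forest']
assign_labels(fc7, sun_weights, sun_labels)  # → 'forest'
```

In practice each label set (e.g. SUN, Places205) would bring its own readout matrix, applied to the single FC7 vector returned by `predict()`, so the expensive CNN forward pass runs only once.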


# Basic pattern matching: saturday morning hack

The Pattern Perception Zooniverse project asks Zooniversites to classify patterns based upon their (dis)similarity. Given the narrow scope of the problem (no rotations, the same output every time) it was well worth exploring how a simple covariance-based metric would perform. The input to the problem is a set of seven images: one reference image and six scenarios to compare it to. Below you see the general layout of the images as shown on the Zooniverse website (I’ll use a different image afterwards).

The basic test is to calculate a set of features for all maps, compare the six scenarios to the reference map, and record the covariance of each of these comparisons. I then rank and plot the images accordingly. I’m pretty certain that any greyscale covariance metric would perform well in this case (including one on the raw data). However, I added spatially explicit information based upon Grey Level Co-occurrence Matrix (GLCM) features. This ensures, in part, the inclusion of some spatial information such as the homogeneity of the images.
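Stripped of the GLCM step (which the R snippet further below adds), the ranking idea reduces to a few lines. This is a minimal sketch, assuming the maps are available as equally sized greyscale arrays:

```python
import numpy as np

def rank_by_covariance(reference, candidates):
    # flatten the reference map to a 1-D feature vector
    ref = reference.ravel()
    # covariance of each candidate map with the reference
    scores = [np.cov(ref, c.ravel())[0, 1] for c in candidates]
    # candidate indices sorted best match (highest covariance) first
    return list(np.argsort(scores)[::-1])

# toy example: an identical copy ranks first, a scaled copy second,
# an inverted copy last
ref = np.arange(4.0).reshape(2, 2)
rank_by_covariance(ref, [-ref, ref, 0.5 * ref])  # → [1, 2, 0]
```

Swapping the raw pixel values for a stack of texture features (as in the GLCM-based analysis) changes only what goes into the feature vectors, not the ranking logic.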

When performing this analysis on a set of images, this simple approach works rather well. The image below shows the ranking (from best to worst, top to bottom) of six images (left) compared to the reference image (right) (Fig. 1). This ranking is based upon the covariance of all GLCM metrics. In this analysis, map 3 does not seem to fall nicely in the sequence (to my human eye / mind). However, all GLCM features are weighted equally in this classification. When I only use the “homogeneity” GLCM feature, the resulting ranking of the images appears more pleasing to the eye (Fig. 2).

A few conclusions can be drawn from this:

1. Human vision seems to pick up high frequency features more than low frequency ones, in this particular case. However, in general things are a bit more complicated.
2. In this case, the distribution of GLCM features does not match human perception, and this unequal weighting relative to our perception sometimes produces surprising rankings.
3. Overall, the best-matched images remain rather stable throughout, suggesting that the approach works well and is relatively unbiased.

Further exploration of these patterns can be done with a principal component analysis (PCA) on the features as calculated for each pixel. The first PC score would indicate which pixels cause the majority of the variability across maps 1-6 relative to the reference (if differences are taken first). This indicates regions which are more stable or variable under the different model scenarios. Furthermore, the project design lends itself to a generalized mixed model approach, with more statistical power than a simple PCA. This could provide insights into potential drivers of this behaviour (either model structure errors or ecological / hydrological processes). A code snippet of the image analysis, written in R using the glcm package, is attached below (slow but functional).
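As a sketch of that PCA idea (assuming, hypothetically, that the six maps and the reference are equally sized greyscale arrays), per-pixel differences from the reference can be stacked and decomposed via SVD; the first principal axis then flags the pixels where the scenarios diverge most:

```python
import numpy as np

def first_pc_scores(reference, maps):
    # per-pixel differences from the reference, one row per scenario map
    diffs = np.stack([(m - reference).ravel() for m in maps])
    # centre each pixel column across the scenarios (pixels = variables)
    diffs = diffs - diffs.mean(axis=0)
    # SVD-based PCA; rows of vt are principal axes over the pixels
    u, s, vt = np.linalg.svd(diffs, full_matrices=False)
    # first principal axis, reshaped back to the map dimensions
    return vt[0].reshape(reference.shape)
```

Pixels with large absolute loadings on this first axis are the ones driving the variability between scenarios; near-zero loadings mark regions that stay stable across model runs.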

Fig 1. An image comparison based upon all GLCM features.

# load required libs
require(raster)
require(glcm)

# set timer
ptm <- proc.time()

# load the reference image and calculate the glcm
ref = raster('scenario_reference.tif',bands=1)
ref_glcm = glcm(ref) # $glcm_homogeneity to only select the homogeneity band

# create a mask to kick out values
# outside the true area of interest
mask = ref == 255
mask = as.vector(getValues(mask))

# convert glcm data to a long vector
x = as.vector(getValues(ref_glcm))

# list all maps to compare to
maps = list.files(".","scenario_map*")

# create a data frame to store the output
covariance_output = as.data.frame(matrix(NA,length(maps),2))

# loop over the maps and compare
# with the reference image
for (i in 1:length(maps)){

# print the map being processed
print(maps[i])

# load the map into memory and
# execute the glcm routine
map = glcm(raster(maps[i],bands=1)) # $glcm_homogeneity to only select the homogeneity band

# convert stacks of glcm features to vector
y = as.vector(getValues(map))

# merge into matrix
# mask out border data and
# drop NA values
mat = cbind(x,y)
mat = na.omit(mat)

# put the map number on file
covariance_output[i,1] = i

# save the x/y covariance
covariance_output[i,2] = cov(mat)[1,2]
}

# sort the output based upon the covariance
# in decreasing order (best match = left)
covariance_output = covariance_output[order(covariance_output[,2],decreasing = TRUE),]

# stop timer
print(proc.time() - ptm)

# loop over the covariance output to plot how the maps best
# compare
png("map_comparison.png",width=800,height=1600)
par(mfrow=c(6,2))
for (i in 1:dim(covariance_output)[1]){
rgb_img = brick(sprintf("scenario_map%s.tif",covariance_output[i,1]))
ref = brick("scenario_reference.tif")
plotRGB(rgb_img)
legend("top",legend=sprintf("Map %s",covariance_output[i,1]),bty='n',cex=3)
plotRGB(ref)
legend("top",legend="Reference",bty='n',cex=3)
}
dev.off()

# Jungle Rhythms user statistics: location

I needed to get a grasp on where different Jungle Rhythms users come from for a grant I’m writing. Zooniverse kindly provided me with some country-based summary statistics, giving me insight into user activity per region.

Currently, most users come from English-speaking countries, with the highest number of sessions from the US (47%) and the UK (17%). High session numbers are also noted for Canada and Australia (both ~3.5%). Surprisingly, Belgium scores really well, accounting for 4% of the user sessions (potentially due to my own activity when I’m in Belgium, and to ElizabethB, who has been very active on the project). The remaining top-10 countries, France, Germany, The Netherlands and Poland, together account for another 10% of all sessions. All EU countries combined account for ~31% of the sessions, indicating that the Jungle Rhythms user pool is largely split between the US and the EU (with a strong focus on the UK). Below you find a barplot representing these statistics.