Caffe hack: outputting the FC7 layer

The Caffe deep learning framework has a nice set of python scripts to help automate classification jobs. However, I found the standard output rather limited. The script does not output other data than the predicted result.

Some applications could benefit from outputting the final FC7 classification weights. These weights together with a classification key can then be used to assign different labels (semantic interpretations) using the same classification run.  Below you find my new python ( script which outputs the FC7 layer.

This new script allows me to assign classification labels using both the SUN database and the Places205 database in one pass using the MIT places convolutional neural network (CNN).

#!/usr/bin/env python
Classifier is an image classifier specialization of Net.

import numpy as np

import caffe

class Classifier(caffe.Net):
    Classifier extends Net for image class prediction
    by scaling, center cropping, or oversampling.

    image_dims : dimensions to scale input for cropping/sampling.
        Default is to scale to net input size for whole-image crop.
    mean, input_scale, raw_scale, channel_swap: params for
        preprocessing options.
    def __init__(self, model_file, pretrained_file, image_dims=None,
                 mean=None, input_scale=None, raw_scale=None,
        caffe.Net.__init__(self, model_file, pretrained_file, caffe.TEST)

        # configure pre-processing
        in_ = self.inputs[0]
        self.transformer =
            {in_: self.blobs[in_].data.shape})
        self.transformer.set_transpose(in_, (2, 0, 1))
        if mean is not None:
            self.transformer.set_mean(in_, mean)
        if input_scale is not None:
            self.transformer.set_input_scale(in_, input_scale)
        if raw_scale is not None:
            self.transformer.set_raw_scale(in_, raw_scale)
        if channel_swap is not None:
            self.transformer.set_channel_swap(in_, channel_swap)

        self.crop_dims = np.array(self.blobs[in_].data.shape[2:])
        if not image_dims:
            image_dims = self.crop_dims
        self.image_dims = image_dims

    def predict(self, inputs, oversample=True):
        Predict classification probabilities of inputs.

        inputs : iterable of (H x W x K) input ndarrays.
        oversample : boolean
            average predictions across center, corners, and mirrors
            when True (default). Center-only prediction when False.

        predictions: (N x C) ndarray of class probabilities for N images and C
        # Scale to standardize input dimensions.
        input_ = np.zeros((len(inputs),

        for ix, in_ in enumerate(inputs):
            input_[ix] =, self.image_dims)

        if oversample:
            # Generate center, corner, and mirrored crops.
            input_ =, self.crop_dims)
            # Take center crop.
            center = np.array(self.image_dims) / 2.0
            crop = np.tile(center, (1, 2))[0] + np.concatenate([
                -self.crop_dims / 2.0,
                self.crop_dims / 2.0
            input_ = input_[:, crop[0]:crop[2], crop[1]:crop[3], :]

        # Classify
        caffe_in = np.zeros(np.array(input_.shape)[[0, 3, 1, 2]],
        for ix, in_ in enumerate(input_):
            caffe_in[ix] = self.transformer.preprocess(self.inputs[0], in_)
	#out = self.forward_all(**{self.inputs[0]: caffe_in}) # original

	# grab the FC7 layer in addition to the normal classification
	# data and output it to a seperate variable
	out = self.forward_all(**{self.inputs[0]: caffe_in, 'blobs': ['fc7']})
        predictions = out[self.outputs[0]]
	fc7 = self.blobs['fc7'].data

        # For oversampling, average predictions across crops.
        if oversample:
            predictions = predictions.reshape((len(predictions) / 10, 10, -1))
            predictions = predictions.mean(1) # column wise mean, rows = 0

	    fc7 = fc7.reshape((len(fc7) / 10, 10, -1))
            fc7 = fc7.mean(1).reshape(-1)

	# output both the classification as specified
	# by the current classifier, as the fc7 feature
	# to run on another feature matching set
        return predictions, fc7


Basic pattern matching: saturday morning hack

The Pattern Perception Zooniverse project asks Zooniversites to classify patterns based upon their (dis)similarity. Yet given the narrow scope of the problem (no rotations, same output every time) it was well worth exploring how a simple covariance based metric would perform. The input to the problem is a set of 7 images, one reference image and six scenarios to compare it to. Below you see the general layout of the image as shown on the Zooniverse website (I’ll use a different image afterwards).


The basic test would be to calculate x features for all maps and compare the six scenarios to the reference map and record the covariance of each of these comparisons. I then rank and plot the images accordingly. Although I’m pretty certain that any greyscale covariance metric would perform well in this case (including the raw data). However, I added a spatially explicit information based upon the Grey Level Co-occurence Matrix (GLCM) features. This ensures in part the inclusion of some spatial information such as the homogeneity of the images.

When performing this analysis on a set of images this simple approach works rather well. The image below shows you the ranking (from good to worse - top to bottom) of six images (left) compared to the reference image (right) (Fig. 1). This ranking is based upon the covariance of all GLCM metrics. In this analysis map 3 seems not to fall nicely in the sequence (to my human eye / mind). However, all GLCM features are weighted equally in this classification. When I only use the “homogeneity” GLCM feature in a classification a ranking of the images appears more pleasing to the eye (Fig. 2).

A few conclusions can be drawn from this:

  1. Human vision seems to pick up high frequency features more than low frequency ones, in this particular case. However, in general things are a bit more complicated.
  2. In this case, the distribution of GLCM features does not match human perception and this unequal weighing relative to our perception sometimes provides surprising rankings.
  3. Overall the best matched images still remain rather stable throughout suggesting that overall the approach works well and is relatively unbiased.

Further exploration of these patterns can done with a principal component analysis (PCA) on the features as calculated for each pixel. The first PC-score would indicated which pixels cause the majority of the variability across maps 1-6 relative to the reference (if differences are taken first). This indicate regions which are more stable or variable under different model scenarios. Furthermore, the project design lends itself to a generalized mixed model approach, with more statistical power than a simple PCA. This could provide insights in potential drivers of this behaviour (either due to model structure errors or ecological / hydrological processes). A code snippet of the image analysis written in R using the GLCM package is attached below (slow but functional).

map_comparison Fig 1. An image comparison based upon all GLCM features.[/caption]


# load required libs

# set timer
ptm <- proc.time()

# load the reference image and calculate the glcm
ref = raster('scenario_reference.tif',bands=1)
ref_glcm = glcm(ref) # $glcm_homogeneity to only select the homogeneity band

# create a mask to kick out values
# outside the true area of interest
mask = ref == 255
mask = as.vector(getValues(mask))

# convert gclm data to a long vector
x = as.vector(getValues(ref_glcm))

# list all maps to compare to
maps = list.files(".","scenario_map*")

# create a data frame to store the output
covariance_output =,length(maps),2))

# loop over the maps and compare
# with the reference image
for (i in 1:length(maps)){

  # print the map being processed

  # load the map into memory and
  # execute the glcm routine
  map = glcm(raster(maps[i],bands=1)) # $glcm_homogeneity to only select the homogeneity band

  # convert stacks of glcm features to vector
  y = as.vector(getValues(map))

  # merge into matrix
  # mask out border data and
  # drop NA values
  mat = cbind(x,y)
  mat = mat[which(mask != 1),]
  mat = na.omit(mat)

  # put the map number on file
  covariance_output[i,1] = i

  # save the x/y covariance
  covariance_output[i,2] = cov(mat)[1,2]

# sort the output based upon the covariance
# in decreasing order (best match = left)
covariance_output = covariance_output[order(covariance_output[,2],decreasing = TRUE),]

# stop timer
print(proc.time() - ptm)

# loop over the covariance output to plot how the maps best
# compare
for (i in 1:dim(covariance_output)[1]){
  rgb_img = brick(sprintf("scenario_map%s.tif",covariance_output[i,1]))
  ref = brick("scenario_reference.tif")
  legend("top",legend=sprintf("Map %s",covariance_output[i,1]),bty='n',cex=3)

Jungle Rhythms user statistics: location

I needed to get a grasp on where different Jungle Rhythm users come from for a grant I’m writing. Zooniverse kindly provided me with some country based summary statistics providing me with some insight into the user activity per region.

Currently most of the users come from English speaking countries, with the highest number of sessions from the US (47%) and the UK (17%). High session numbers are also noted for Canada and Australia (both ~3.5%). Surprisingly Belgium scores really well accounting for 4% of the user sessions (potentially due to my own activity when I’m in Belgium and ElizabethB who has been very active on the project). The remaining top 10 countries, France, Germany, The Netherlands and Poland are combined good for another 10% of all sessions. All EU countries combined are good for ~31% of all the sessions, indicating that the Jungle Rhythms user pool is largely split between the US and the EU (with a strong focus on the UK). Below you find a barplot representing the described statistics.



MCD10A1: a combined MODIS aqua / terra snow cover product

For an ongoing project I had to calculate fractional snow cover transition dates across Alaska. These transition dates mark when snow levels drop below 5% coverage or when continuous accumulation of more than 5% starts. The period between these two dates can be considered snow free, and has the potential to support vegetation growth.

For this exercise I used my usual set of tools and data streams (R + local MODIS HDF tiles). Sadly, this processing chain took forever. Given the rather simple nature of the code I decided to turn to Google Earth Engine (GEE) and rewrote the analysis into GEE javascript. Below you find the resulting MCD10A1 product, a combined MODIS Aqua / Terra snow cover product. The original MODIS products (MOD10A1/MYD10A1) as processed by the National Snow and Ice Data Center have the tendency to be biased low. Hence, my solution fuses the two data streams using a maximum value approach across the two data streams. Furthermore, the script is written to loop over all years with sufficient data (2003 - current), and collects the above mentioned transitions dates. Finally, the script runs a simple regression analysis to mark trends in snow melt dates, answering the question if snow melt occurs later or earlier with each passing year!?

Below you see a snow cover trend map of the US and Canada, with earlier snow melt marked in red (-) and later snow melt marked in blue (+). Some patterns stand out, mainly mountain ranges in the West of the US have seen earlier and earlier snow melt dates. Alaska, doesn’t see very clear patterns, aside from earlier snow melt in the southern peninsula. Arctic Canada does see clear earlier snow melt throughout. Surprisingly, some later snow melt dates can be noted as well. From the middle of Alberta, Canada sweeping east to the great lakes a trend of later snow melt dates is noted. A word of caution is needed as these regions might be heavily influenced by the “polar vortex” in 2015, creating a long winter season, and potentially skewing the trend analysis. Furthermore, locations without data are places with a very short snow cover period, which were excluded for clarity (i.e. these are areas were snow cover has less of an influence on the growing season).

In short, as long as the analysis isn’t too complicated GEE is a quick way to get good and reproducible results. Sadly, the system does have it’s limitation and makes true time series analysis (with custom aggregation and curve fitting techniques) rather difficult if not impossible. I feel that these kinds of analysis are clearly outside the scope of the more GIS oriented GEE setup. I tip my hat to Google for providing this service, rather impressive I must say.


MODIS HDF data extraction in R

The Moderate-resolution imaging spectroradiometer (MODIS) sensors on both Aqua and Terra platforms provide a wealth of (environmental) data. However, manipulating the raw files can be challenging. Although Google Earth Engine provides an easier way to access these data, as most of the MODIS products are hosted, sometimes direct manipulation is still necessary. Here I quickly outline how to extract data from raw MODIS files using R.


To download extract data from a MODIS HDF file in R you need a few packages. Mainly:

After installing these packages make sure you load them to continue.


Finding your way on the globe

First you have to determine where your data is located within the tiled format MODIS data is distributed in. All tiles are denoted with a horizontal (h) and a vertical index (v). For the Land Products there are roughly ~350 or so tiles with varying degrees of actual land coverage. All MODIS data are distributed in a custom sinusoidal projection, hence the rather weird looking shape of the map.


In order to find the right tile we load a coordinate into R. In this case it’s the location of Harvard Yard, or 42.375028, -71.116493 latitude and longitude respectively.

# the location of Harvard Yard
harvard_yard = cbind(42.375028, -71.116493)

# define the projection system to use
# lat-long in this case
latlon = CRS('+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0')

# create a spatial object from the coordinate(s)
coordinates = SpatialPoints(harvard_yard, latlon)

# download the overview of all the tiles in KMZ
download.file("",destfile = "modis_sin.kmz")
tiles = readOGR("./modis_sin.kmz","Features")

# get the horizontal and vertical tile location for the coordinate(s)
horizontal = extract(tiles,coordinates)$h[1]
vertical = extract(tiles,coordinates)$v[1]

Once you have the horizontal and vertical tiles you can download only these tiles instead of the entire archive.

Extracting data

As previously mentioned, data is stored in HDF files. These files are readily readable by the GDAL library. However, given the structured nature of these files you have to know what you are looking for. These layers in structured HDF files are called scientific data sets (SDS). You need to specify one to read them individually.

# read in all SDS strings (layer descriptors)
# using the MODIS package
sds <- getSds(filename)

# grab layer 4 of a particular hdf file
my_hdf_layer = raster(readGDAL(sds$SDS4gdal[4], = TRUE))

# should the HDF file not contain any layers, or a singular one
# you can default to using raster() any additional arguments
my_hdf_file = raster(filename)

Now you have succesfully read in data. However, this data is projected using the sinusoidal projection, not the lat-long. In order to extract data at given locations we have to translate the coordinates of these extraction points to the MODIS sinusoidal projection.

# specify the sinusoidal projection
sinus = CRS("+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs")

# transform the coordinates
coordinates_sinus = spTransform(coordinates,sinus)

Finally extract the data at the new sinusoidal coordinates.

# extract data
my_data_point = extract(my_hdf_layer,coordinates_sinus)


© 2018. All rights reserved.

Powered by Hydejack v7.5.1