Deep learning snowy images

This past week I started to play with the Caffe deep learning framework. I initially planned on using the SegNet branch of the Caffe framework to classify snow in PhenoCam images. However, since this is a rather binary classification, I don’t need to segment the picture (I do not care where the snow in the image is, only whether it is present). As such, a more semantic, whole-scene approach could be used.

Luckily, people at MIT had already trained a classifier, the Places-CNN, which deals with exactly this problem: characterizing an image scene. So, instead of training my own classifier, I gave theirs a try. Depending on the image type, and mostly on the view angle, the results are very encouraging (even with their stock model).

For example, the image below got classified as: mountain snowy, ski slope, snowfield, valley, ski_resort. This all seems very reasonable indeed. Classifying a year’s worth of images at this site yielded an accuracy of 89% (compared to human observations).



However, when the vantage point changes so does the accuracy of the classification, mainly, I presume, due to the lack of images of this sort in the original training data set. The image below was classified as: rainforest, tree farm, snowy mountain, mountain, cultivated field. As expected, the classification accuracy dropped to a mere 13%. There is still room for improvement using PhenoCam based training data, and building upon the work by the group at MIT should make these improvements easier.
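The accuracy numbers above boil down to comparing a per-image snow flag with the human observation. Below is a minimal sketch of that bookkeeping in bash and awk. The input layout (image;labels;human flag), the toy rows, and the rule that any label containing "snow" or "ski" counts as snow are all assumptions made for illustration, not necessarily how the figures above were derived.

```bash
#!/usr/bin/env bash
# Print the percentage of images whose label-derived snow flag matches
# the human flag.
# Input (stdin), one image per line, fields separated by ';':
#   image ; top-5 labels ; human flag (1 = snow, 0 = no snow)
snow_accuracy() {
    awk -F';' '
        NF == 3 {
            # Assumed rule: any label mentioning snow or ski means "snow".
            predicted = ($2 ~ /snow|ski/) ? 1 : 0
            if (predicted == $3) hits++
            total++
        }
        END { if (total > 0) printf "%.1f\n", 100 * hits / total }
    '
}

# Toy input: file names and human flags are made up.
snow_accuracy <<'EOF'
img_001.jpg;mountain snowy,ski slope,snowfield,valley,ski_resort;1
img_002.jpg;rainforest,tree farm,snowy mountain,mountain,cultivated field;1
img_003.jpg;rainforest,tree farm,valley,mountain,cultivated field;0
img_004.jpg;rainforest,tree farm,valley,mountain,cultivated field;1
EOF
```

A keyword rule like this is crude: it will miss snowy scenes that the network labels with unrelated categories, which is exactly the failure seen at the second site.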

Odyssey caffe-SegNet installation instructions

Here I provide a simple set of bash commands and settings to get started with the caffe-SegNet tutorial on the Harvard Odyssey cluster and its NVIDIA CUDA capabilities. If the setup below works you can move on and start processing your own data.

First, add all the necessary module load commands to your ~/.bashrc file

# SegNet required modules and 
# interdependencies
module load python/2.7.6-fasrc01
module load gcc/4.8.2-fasrc01 
module load atlas/3.10.2-fasrc02
module load cudnn/6.5.48-fasrc01
module load protobuf/20151218-fasrc01
module load cmake/
module load gflags/2.1.2-fasrc01 
module load glog/0.3.4-fasrc01 
module load hdf5/1.8.12-fasrc08 
module load opencv/3.0.0-fasrc02 
module load lmdb/20160113-fasrc01 
module load cuda/6.5-fasrc01
module load leveldb/1.18-fasrc01 
module load boost/1.59.0-fasrc01
module load gmp/5.1.3-fasrc01 
module load mpfr/3.1.2-fasrc01 
module load mpc/1.0.1-fasrc01
module load libav/0.8.17-fasrc01
module load ffmpeg/2.3.2-fasrc01
module load libvpx/v1.3.0-fasrc01
module load xvidcore/1.3.3-fasrc01
module load libtheora/1.1.1-fasrc01
module load yasm/1.3.0-fasrc01
module load opus/1.0.3-fasrc01
module load fdk-aac/0.1.3-fasrc01
module load lame/3.99.5-fasrc01
module load x264/20140814-fasrc01
module load faac/1.28-fasrc01
module load libvpx/v1.3.0-fasrc01
module load opencore-amr/0.1.3-fasrc01
module load libass/0.11.2-fasrc01
module load fribidi/0.19.1-fasrc01
module load enca/1.15-fasrc01
module load libvorbis/1.3.4-fasrc01

Reload your .bashrc file

source ~/.bashrc
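After reloading, it can help to confirm that the build tools actually ended up on your PATH before requesting a GPU session (running `module list` also shows what is loaded). The helper below is a small sketch; the function name and the list of tools to check are assumptions, so adjust them to whatever your build needs.

```bash
#!/usr/bin/env bash
# Print every command from the argument list that is not on PATH,
# one per line. Returns 0 only if all commands were found.
missing_tools() {
    local status=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example tool list (an assumption, not an exhaustive requirement list).
missing=$(missing_tools gcc cmake nvcc python) || true
if [ -z "$missing" ]; then
    echo "all tools found"
else
    echo "not on PATH:" $missing
fi
```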

Start a GPU session on the cluster

srun --pty --gres=gpu -p gpu -t 600 --mem 8000 /bin/bash

Then download and compile all code

# clone the tutorial data and rename the directory
git clone
mv SegNet-Tutorial Segnet

# move into the new directory
cd Segnet

# clone the caffe-segnet code
git clone

# download the Odyssey specific cmake settings
wget -q -O ./caffe-segnet/cmake/Modules/FindAtlas.cmake

# create the build directory
mkdir ./caffe-segnet/build

# move into the build directory
cd ./caffe-segnet/build

# create compilation instructions

# compile and test all code
make all
make runtest
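The compile and test steps take a while and produce a lot of output, so it can be handy to keep a log and stop at the first failure. The wrapper below is one way to do that; `run_logged` and the `build.log` file name are hypothetical names chosen for this sketch.

```bash
#!/usr/bin/env bash
# Run a command, append its output to a log file, and report whether it
# succeeded. Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    local log=$1
    shift
    echo ">>> $*" >> "$log"
    if "$@" >> "$log" 2>&1; then
        echo "ok: $*"
    else
        local rc=$?
        echo "FAILED (exit $rc): $*" >&2
        return "$rc"
    fi
}

# From inside the build directory, the two steps above would become:
# run_logged build.log make all && run_logged build.log make runtest
```

Chaining the calls with `&&` means the tests only run when the compilation succeeded.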

Jungle Rhythms user statistics

Jungle Rhythms has been live for a little over a month and has accumulated an impressive 40,000 classifications. With a substantial amount of data classified, I’ll be transforming these classifications into actual dates of seasonal growth patterns (instead of lines on paper) in the next few weeks or so.

In the meantime, using the same classification data, I made a visualization of user contributions over the past month. In the graph below you see rectangles whose relative size is scaled to the number of contributions by each user (listed in the rectangle). Grey rectangles are contributions made by unregistered users.

Currently, ElisabethB is leading the pack with a hard to beat 11758 classifications. Rainbobrite is runner-up with 6467 classifications. Although a few large contributors account for more than 50% of the classifications, the remainder come from many more people contributing smaller numbers. For example, 7% of all classifications are made by unregistered users classifying ~3 images per session. This illustrates the power in numbers, which drives a lot of citizen science. All contributions matter, even the few classifications now and again!


The top 10 high rollers in numbers:

  1. ElisabethB (11758)
  2. Rainbobrite (6467)
  3. chrisotahal (2806)
  4. Ekima (2604)
  5. britishclimate (2306)
  6. Missybee35 (1123)
  7. Cuboctahedron (1109)
  8. Ravno (729)
  9. seachanged (577)
  10. khufkens (522)
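To put the list above in proportion, each user's share of all classifications can be computed with a few lines of awk. This is a sketch: the total of 40,000 is the rounded number quoted in the post, not an exact count, and `user_shares` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print "user share cumulative_share" (both in percent) for each input line.
# Input (stdin): "user count" per line. Argument: total classifications.
user_shares() {
    awk -v total="$1" '
        NF == 2 {
            cumulative += $2
            printf "%s %.1f %.1f\n", $1, 100 * $2 / total, 100 * cumulative / total
        }
    '
}

# Top three contributors against the approximate total of 40,000.
user_shares 40000 <<'EOF'
ElisabethB 11758
Rainbobrite 6467
chrisotahal 2806
EOF
```

With these numbers the cumulative share passes 50% at the third contributor, which matches the observation that a few large contributors make up more than half of the classifications.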


The Yangambi agricultural research station – a short history

The brief history of agricultural research in Congo starts after 1908, when the Belgian state took control of Congo, ending the rule of Leopold II following international outcry over the atrocities committed.

In subsequent years the Belgian state, under guidance of Edmond Leplae and informed by agricultural engineer Jean Claessens, created a government institution (Service de l’Agriculture) focussed on agricultural development, mirroring research facilities in other tropical colonies. Although policy was focused on boosting export crops, in part through an increased focus on research, the period up until Leplae’s retirement was dominated by his much hated policy of mandated cultivation (e.g. cotton and rubber).

Tree Plantation, Yangambi - State Archives

After 1930 policy shifted away from mandated cultivation and towards research-driven agricultural development, with a stronger focus on supporting local farmers. As such, the Institut National pour l’Étude Agronomique du Congo belge (INÉAC, the Institute for the study and agricultural development of Belgian Congo) was created in 1933, with headquarters in Yangambi.

All data collected and digitized in the Jungle Rhythms project were gathered during the latter period at the Yangambi research station. Although there was some ongoing research before this period, the INÉAC created a major shift towards basic research, in addition to the applied agricultural research. These basic research efforts were often well coordinated and documented, and covered topics such as plant diseases, botany and geology, but also genetics. Most surprisingly, INÉAC was run by scientists with minimal intervention from the Belgian administration (either local or afar). However, INÉAC received support from the government, which makes complete autonomy questionable, especially as WWII set part of the research agenda.

Soil sciences laboratory, Yangambi (Congo) - from the INEAC archives.

It is clear that the Congo agricultural research stations (and Yangambi in particular) have a long and winding history. On the eve of independence the research station had built up a solid international reputation, running large autonomous experiments and data collection throughout the Congo basin. The data on seasonal dynamics of tree species digitized within the Jungle Rhythms project are part of this historical research effort. Even after more than 70 years these data retain their scientific value and could contribute to solving some of today’s research questions and problems.

(abbreviated / edited version of a text found on the Center of Agricultural History site, written by Stephanie Kerckhofs, 2014)









A cup of deep learning coffee

Recently I started experimenting with deep learning, a set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers. The reason for this excursion into a more exotic classifier framework is the tricky issue of snow.

Snow on evergreen canopies artificially decreases the greenness value of an image, corrupting an otherwise rather smooth PhenoCam time series of greenness. Below you see a split image of a normal snow free canopy, and a snow covered canopy, visually showing this decrease in greenness.

These snowy days result in the dips in Gcc (greenness) as seen in the time series of image greenness below.
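For reference, Gcc (the green chromatic coordinate) is the green digital number divided by the sum of the red, green and blue digital numbers, usually averaged over a region of interest. The sketch below computes it for a single set of mean values; the function name and the example numbers are made up for illustration.

```bash
#!/usr/bin/env bash
# Green chromatic coordinate: Gcc = G / (R + G + B).
# Arguments: mean red, green and blue digital numbers of a region.
gcc_value() {
    awk -v r="$1" -v g="$2" -v b="$3" 'BEGIN {
        sum = r + g + b
        if (sum == 0) { print "NA"; exit }   # avoid division by zero
        printf "%.3f\n", g / sum
    }'
}

gcc_value 90 120 70     # greenish, snow-free canopy (made-up values)
gcc_value 200 205 210   # bright, nearly grey snowy canopy (made-up values)
```

Because snow pushes all three channels towards similar high values, the ratio drops towards one third, which is the dip seen in the time series.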

Within the lab we tried various techniques to spot snow on these canopies. These techniques were mostly colour metrics, based upon the distances in colour space (from either white or grey) of a particular pixel or region of interest. However, all efforts failed or were not generalizable across all sites, meaning that every site would need to be parameterized independently (which is a processing headache).
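To make the idea of a colour metric concrete, here is a minimal sketch of one such measure: the Euclidean distance in RGB space between a region's mean colour and pure white, with a cut-off below which the region is flagged as snow. The function names and the default threshold of 100 are hypothetical; this is not the exact metric used in the lab.

```bash
#!/usr/bin/env bash
# Euclidean distance in RGB space between a colour and white (255,255,255).
distance_from_white() {
    awk -v r="$1" -v g="$2" -v b="$3" 'BEGIN {
        printf "%.1f\n", sqrt((255 - r)^2 + (255 - g)^2 + (255 - b)^2)
    }'
}

# Succeeds (exit 0) when the colour lies closer to white than the threshold.
# Usage: is_snow R G B [THRESHOLD]   (threshold defaults to 100, an assumption)
is_snow() {
    local threshold=${4:-100}
    awk -v d="$(distance_from_white "$1" "$2" "$3")" -v t="$threshold" \
        'BEGIN { exit !(d < t) }'
}

# Made-up mean colour of a region of interest.
if is_snow 200 205 210; then
    echo "snow"
else
    echo "no snow"
fi
```

The weak spot is the threshold: a value that works under one camera's exposure and lighting will not necessarily work at another site, which is the per-site parameterization problem described above.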

In an effort to address this classification problem I installed the SegNet variety of the Caffe Deep Learning Framework (hence the blog post title). The SegNet framework allows for pixel based classification using a deep learning approach, originally designed to quickly (in a matter of milliseconds) classify street images to assist autonomous vehicles. However, I hope this approach might help solve the issue of classifying these snowy days, recognizing snowy canopies instead of pedestrians. Results will follow in the coming weeks.






© 2018. All rights reserved.
