This past week I started to play with the Caffe deep learning framework. I initially planned on using the SegNet branch of the Caffe framework to classify snow in PhenoCam images. However, since this is essentially a binary classification, I don't need to segment the picture: I do not care where in the image the snow is, only whether it is present. As such, a more semantic, scene-level approach can be used.
Luckily, people at MIT had already trained a classifier, the Places-CNN, which deals with exactly this problem: characterizing the scene in an image. So, instead of training my own classifier, I gave theirs a try. Depending on the image type, and mostly on the view angle, the results are very encouraging (even with their stock model).
For example, the image below was classified as: mountain snowy, ski slope, snowfield, valley, ski_resort. This all seems very reasonable indeed. Classifying a year's worth of images at this site yielded an accuracy of 89% (compared to human observations).
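How the five scene labels become a snow/no-snow call before scoring against human observations isn't specified here. One plausible rule, shown as a minimal bash sketch, flags an image as snowy when any of its top-5 labels falls in a small set of snow-related categories and then counts agreement with the human label. The tab-separated input layout, the `snow_accuracy` function and the `SNOW_LABELS` list are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: score top-5 Places-CNN labels against human snow observations.
# ASSUMPTIONS (hypothetical, not the original workflow):
#   - input is tab-separated: image <TAB> top-5 labels (comma-separated) <TAB> human label
#   - the human label is either "snow" or "no_snow"
#   - an image counts as snowy if any of its top-5 labels is in SNOW_LABELS
SNOW_LABELS="mountain_snowy,ski_slope,snowfield,ski_resort"

# snow_accuracy FILE: prints the fraction of images whose predicted label
# matches the human label, with two decimals ("NA" for an empty file).
snow_accuracy() {
  awk -F'\t' -v snow="$SNOW_LABELS" '
    BEGIN {
      n = split(snow, s, ",")
      for (i = 1; i <= n; i++) is_snow[s[i]] = 1
    }
    NF >= 3 {
      labs = $2
      gsub(/ /, "", labs)            # tolerate "a, b" as well as "a,b"
      m = split(labs, labels, ",")
      pred = "no_snow"
      for (i = 1; i <= m; i++) {
        lab = labels[i]
        if (lab in is_snow) pred = "snow"
      }
      total++
      if (pred == $3) correct++
    }
    END {
      if (total > 0) printf "%.2f\n", correct / total
      else print "NA"
    }
  ' "$1"
}
```

Running `snow_accuracy observations.tsv` on such a file prints the agreement as a fraction, so 0.89 would correspond to the 89% above. The label set is the part most worth revisiting: a scene like the one below, whose top-5 still contains a snowy category, shows how much the outcome depends on which labels count as "snow".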
However, when the vantage point changes, so does the accuracy of the classification, which I presume is mainly due to a lack of images of this sort in the original training data set. The image below was classified as: rainforest, tree farm, snowy mountain, mountain, cultivated field. As expected, the classification accuracy dropped to a mere 13%. There is still room for improvement by training on PhenoCam-based data, but building upon the work of the group at MIT should make these improvements easier.
Here I provide a simple set of bash commands and settings to get started with the caffe-SegNet tutorial on the Harvard Odyssey cluster and its NVIDIA CUDA capabilities. If the setup below works, you can move on and start processing your own data.
First, add all necessary modules to your ~/.bashrc file so they are loaded at login.
# clone the tutorial data and rename the directory
git clone https://github.com/alexgkendall/SegNet-Tutorial.git
mv SegNet-Tutorial Segnet
# move into the new directory
cd Segnet
# clone the caffe-segnet code
git clone https://github.com/alexgkendall/caffe-segnet.git
# download the Odyssey-specific cmake settings (quote the URL so the shell does not treat "?" as a glob)
wget -q "https://www.dropbox.com/s/hbhzl2bwm19vtd0/FindAtlas.cmake?dl=0" -O ./caffe-segnet/cmake/Modules/FindAtlas.cmake
# create the build directory
mkdir caffe-segnet/build
# move into the build directory
cd caffe-segnet/build
# create compilation instructions
cmake -DCMAKE_INCLUDE_PATH:STRING="$CUDNN_INCLUDE;$BOOST_INCLUDE;$GFLAGS_INCLUDE;$HDF5_INCLUDE;$GLOG_INCLUDE;$PROTOBUF_INCLUDE;$SNAPPY_INCLUDE;$LMDB_INCLUDE;$LEVELDB_INCLUDE;$ATLAS_INCLUDE;$PYTHON_INCLUDE" -DCMAKE_LIBRARY_PATH:STRING="$BOOST_LIB;$GFLAGS_LIB;$HDF5_LIB;$GLOG_LIB;$PROTOBUF_LIB;$SNAPPY_LIB;$LMDB_LIB;$LEVELDB_LIB;$ATLAS_LIB;$CUDNN_LIB;$PYTHON_LIB" ..
# compile and test all code
make all
make runtest
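The cmake call above relies on about twenty environment variables being set by the loaded modules; if one of them is empty, configuration may fail with unhelpful errors or pick up the wrong library. A small bash check, run just before the cmake step, lists anything missing. The `check_build_env` function and `BUILD_VARS` array are hypothetical helpers; the variable names are copied from the cmake command, and whether your modules export all of them is an assumption to verify.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which variables used in the cmake call are
# empty or unset. Names are copied from the cmake command; that the
# Odyssey modules export all of them is an assumption.
BUILD_VARS=(
  CUDNN_INCLUDE BOOST_INCLUDE GFLAGS_INCLUDE HDF5_INCLUDE GLOG_INCLUDE
  PROTOBUF_INCLUDE SNAPPY_INCLUDE LMDB_INCLUDE LEVELDB_INCLUDE ATLAS_INCLUDE
  PYTHON_INCLUDE
  BOOST_LIB GFLAGS_LIB HDF5_LIB GLOG_LIB PROTOBUF_LIB SNAPPY_LIB LMDB_LIB
  LEVELDB_LIB ATLAS_LIB CUDNN_LIB PYTHON_LIB
)

# check_build_env: returns 0 if every variable is non-empty, 1 otherwise,
# printing the name of each missing variable to stderr.
check_build_env() {
  local var missing=0
  for var in "${BUILD_VARS[@]}"; do
    # ${!var:-} is bash indirect expansion: the value of the variable named by $var
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Because it only reads the environment, the check is cheap to run from the build directory each time you re-source ~/.bashrc or switch module versions.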
Jungle Rhythms has been live for a little over a month and has accumulated an impressive 40,000 classifications. With a substantial amount of data classified, I'll be transforming these classifications into actual dates of seasonal growth patterns (instead of lines on paper) over the next few weeks.
In the meantime, using the same classification data, I made a visualization of user contributions over the past month. In the graph below, each rectangle represents one user (named in the rectangle), with its size scaled to that user's number of contributions. Grey rectangles are contributions made by non-registered users.
Currently, ElizabethB is leading the pack with a hard-to-beat 11,758 classifications. Rainbobrite is runner-up with 6,467 classifications. Although a few large contributors make up more than 50% of the classifications, the remaining classifications are made in smaller numbers by many more people. For example, 7% of all classifications are made by unregistered users classifying ~3 images per session. This illustrates the power of numbers, which drives a lot of citizen science. All contributions matter, even a few classifications now and again!
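The totals and shares behind a contribution treemap like this come down to counting classifications per user name. A minimal bash/awk sketch, assuming a hypothetical export with one classification per line and the user name in the first field, where anonymous sessions carry a `not-logged-in` prefix (an assumption about the export format), pools all unregistered sessions into a single row:

```bash
#!/usr/bin/env bash
# Sketch: per-user classification counts and percentage shares.
# ASSUMPTIONS (hypothetical): one classification per line, user name in the
# first whitespace-separated field, anonymous users prefixed "not-logged-in".
#
# contribution_shares FILE: prints "user<TAB>count<TAB>percent", largest first.
contribution_shares() {
  awk '
    NF == 0 { next }                          # skip blank lines
    {
      user = $1
      if (user ~ /^not-logged-in/) user = "(unregistered)"
      count[user]++
      total++
    }
    END {
      for (u in count) printf "%s\t%d\t%.1f\n", u, count[u], 100 * count[u] / total
    }
  ' "$1" | sort -t $'\t' -k2,2nr -k1,1        # by count (desc), then by name
}
```

The rectangle sizes in the graph are proportional to the count column; the layout itself is better left to a plotting tool.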
The brief history of agricultural research in Congo starts after 1908, when the Belgian state took control of Congo, ending the rule of Leopold II following international outcry over the atrocities committed.
In subsequent years the Belgian state, under the guidance of Edmond Leplae and informed by agricultural engineer Jean Claessens, created a government institution (Service de l’Agriculture) focused on agricultural development, mirroring research facilities in other tropical colonies. Although policy was focused on boosting export crops, in part through an increased focus on research, the period up until Leplae’s retirement was dominated by his much-hated policy of mandated cultivation (e.g. cotton and rubber).
After 1930, policy shifted away from mandated cultivation toward research-driven agricultural development, with a stronger focus on supporting local farmers. As such, the Institut National pour l’Étude Agronomique du Congo belge (INÉAC, the National Institute for the Agronomic Study of the Belgian Congo) was created in 1933, with headquarters in Yangambi.
All data collected and digitized in the Jungle Rhythms project were gathered during this latter period at the Yangambi research station. Although there was some ongoing research before this period, INÉAC created a major shift towards basic research, in addition to applied agricultural research. This basic research was often well coordinated and documented, and its topics included plant diseases, botany, geology, but also genetics. Most surprisingly, INÉAC was run by scientists with minimal intervention from the Belgian administration (whether local or distant). However, INÉAC received support from the government, which makes its complete autonomy questionable; in particular, WWII set part of the research agenda.
It is clear that the Congo agricultural research stations (and Yangambi in particular) have a long and winding history. On the eve of independence, the research station had built up a solid international reputation, running large autonomous experiments and data collection throughout the Congo basin. The data on seasonal dynamics of tree species digitized within the Jungle Rhythms project are part of this historical research effort. Even after more than 70 years, these data still retain their scientific value and could contribute to solving some of today's research questions and problems.
Recently I started experimenting with deep learning, a set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers. The reason for this excursion into a more exotic classifier framework is the tricky issue of snow.
Snow on evergreen canopies artificially decreases the greenness value of an image, corrupting an otherwise rather smooth PhenoCam time series of greenness. Below you see a split image of a normal, snow-free canopy and a snow-covered canopy, visually showing this decrease in greenness.
These snowy days cause the dips in Gcc (greenness) seen in the time series of image greenness below.
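For reference, Gcc is the green chromatic coordinate: the green share of total brightness, computed from the mean red, green and blue digital numbers over a region of interest. A worked example with made-up pixel values (illustrative only, not PhenoCam data) shows why snow drags it down. A neutral white or grey surface has R ≈ G ≈ B and therefore sits at Gcc ≈ 1/3, well below a green canopy:

```latex
\begin{align*}
G_{cc} &= \frac{G}{R + G + B} \\
\text{canopy } (R,G,B) = (60, 90, 50):\quad G_{cc} &= \frac{90}{200} = 0.45 \\
\text{snow } (R,G,B) = (230, 235, 240):\quad G_{cc} &= \frac{235}{705} = \tfrac{1}{3} \approx 0.333 \\
\text{ROI mean, half snow } (R,G,B) = (145, 162.5, 145):\quad G_{cc} &= \frac{162.5}{452.5} \approx 0.359
\end{align*}
```

Because snow is bright in all three channels, it raises the denominator about as much as the numerator, so even partial snow cover pulls the ROI value toward 1/3 and produces a dip.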
Within the lab we tried various techniques to spot snow on these canopies. These were mostly colour metrics, based on the distance in colour space (from either white or grey) of a particular pixel or region of interest. However, all efforts either failed or did not generalize across sites, meaning that every site would need to be parameterized independently (which is a processing headache).
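To make the colour-metric idea concrete, here is a minimal sketch of one such metric, not necessarily the one used in the lab: each pixel's Euclidean distance to pure white in RGB space, with pixels closer than a threshold counted as snow. The `snow_fraction` function, the plain-text `R G B` input format and the threshold values are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a distance-from-white snow metric (illustrative, hypothetical).
# Input: one pixel per line as "R G B" with values 0-255.
# A pixel counts as snow if its Euclidean distance to (255,255,255) is
# below THRESHOLD.
#
# snow_fraction THRESHOLD FILE: prints the fraction of snow pixels ("NA" if none).
snow_fraction() {
  local threshold="$1" file="$2"
  awk -v t="$threshold" '
    NF >= 3 {
      d = sqrt((255 - $1)^2 + (255 - $2)^2 + (255 - $3)^2)
      if (d < t) snow++
      n++
    }
    END {
      if (n > 0) printf "%.2f\n", snow / n
      else print "NA"
    }
  ' "$file"
}
```

The weak spot is the threshold: illumination, camera white balance and shading plausibly shift snow away from pure white by different amounts at each camera, so a value that works at one site needs retuning at the next, which matches the per-site parameterization problem described above.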
In an effort to address this classification problem, I installed the SegNet variant of the Caffe deep learning framework (hence the blog post title). SegNet allows for pixel-based classification using a deep learning approach and was originally designed to quickly (in a matter of milliseconds) classify street images to assist autonomous vehicles. I hope this approach might help solve the issue of classifying these snowy days, recognizing snowy canopies instead of pedestrians. Results will follow in the coming weeks.