in Research / Science / Software on Caffe, Code, Cuda, Deep learning, Gpu, Segnet
Here I provide a simple set of bash commands and settings to get started with the caffe-SegNet tutorial on the Harvard Odyssey cluster and its NVIDIA CUDA capabilities. If the setup below works, you can move on and start processing your own data.
First, add the module load commands for all necessary modules to your ~/.bashrc file.
# clone the tutorial data and rename the directory
git clone https://github.com/alexgkendall/SegNet-Tutorial.git
mv SegNet-Tutorial Segnet
# move into the new directory
cd Segnet
# clone the caffe-segnet code
git clone https://github.com/alexgkendall/caffe-segnet.git
# download the Odyssey specific cmake settings
wget -q "https://www.dropbox.com/s/hbhzl2bwm19vtd0/FindAtlas.cmake?dl=0" -O ./caffe-segnet/cmake/Modules/FindAtlas.cmake
# create the build directory
mkdir caffe-segnet/build
# move into the build directory
cd caffe-segnet/build
# create compilation instructions
cmake -DCMAKE_INCLUDE_PATH:STRING="$CUDNN_INCLUDE;$BOOST_INCLUDE;$GFLAGS_INCLUDE;$HDF5_INCLUDE;$GLOG_INCLUDE;$PROTOBUF_INCLUDE;$SNAPPY_INCLUDE;$LMDB_INCLUDE;$LEVELDB_INCLUDE;$ATLAS_INCLUDE;$PYTHON_INCLUDE" -DCMAKE_LIBRARY_PATH:STRING="$BOOST_LIB;$GFLAGS_LIB;$HDF5_LIB;$GLOG_LIB;$PROTOBUF_LIB;$SNAPPY_LIB;$LMDB_LIB;$LEVELDB_LIB;$ATLAS_LIB;$CUDNN_LIB;$PYTHON_LIB" ..
# compile and test all code
make all
make runtest
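Before running cmake, it can help to confirm that the loaded modules actually exported the variables the cmake command above relies on. The following is a minimal sketch, assuming the module files define every `*_INCLUDE` and `*_LIB` variable named in that command. `check_build_env` is a hypothetical helper, not part of the tutorial or of caffe-segnet.

```shell
# Hypothetical helper: list the *_INCLUDE / *_LIB variables used in the cmake
# call that are unset or empty. Returns 0 when all are set, 1 otherwise.
check_build_env() {
  missing=0
  for name in CUDNN BOOST GFLAGS HDF5 GLOG PROTOBUF SNAPPY LMDB LEVELDB ATLAS PYTHON; do
    for suffix in INCLUDE LIB; do
      var="${name}_${suffix}"
      # look up the value of the variable whose name is stored in $var
      eval "value=\${${var}:-}"
      if [ -z "$value" ]; then
        echo "missing: $var"
        missing=1
      fi
    done
  done
  return "$missing"
}
```

Calling `check_build_env && cmake ...` would then skip the configuration step when a module was not loaded, instead of passing empty paths to cmake.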
in Jungle rhythms / Research / Science on Citizen science, Jungle rhythms
Jungle Rhythms has been live for a little over a month and has accumulated an impressive 40,000 classifications. With a substantial amount of data classified, I’ll be transforming these classifications into actual dates of seasonal growth patterns (instead of lines on paper) over the next few weeks.
In the meantime, using the same classification data, I made a visualization of user contributions over the past month. In the graph below you see rectangles whose relative size is scaled to the number of contributions by each user (listed in the rectangle). Grey rectangles are contributions made by non-registered users.
Currently, ElizabethB is leading the pack with a hard-to-beat 11758 classifications. Rainbobrite is runner-up with 6467 classifications. Although a few large contributors make up more than 50% of the classifications, the remainder are made in lower numbers by many more people. For example, 7% of all classifications are made by unregistered users classifying ~3 images per session. This illustrates the power in numbers, which drives a lot of citizen science. All contributions matter, even the few classifications now and again!
in Jungle rhythms / Research on Citizen science, Dr congo, History, Jungle rhythms, Research
The brief history of agricultural research in Congo starts after 1908, when the Belgian state took control of Congo and ended the rule of Leopold II following international outcry over the atrocities committed.
In subsequent years the Belgian state, under the guidance of Edmond Leplae and informed by agricultural engineer Jean Claessens, created a government institution (Service de l’Agriculture) focussed on agricultural development, mirroring research facilities in other tropical colonies. Although policy was aimed at boosting export crops, in part through an increased focus on research, the period up until Leplae’s retirement was dominated by his much-hated policy of mandated cultivation (e.g. cotton and rubber).
Tree Plantation, Yangambi - State Archives
After 1930 policy shifted away from mandated cultivation towards research-driven agricultural development, with a stronger focus on supporting local farmers. As such, the Institut National pour l’Étude Agronomique du Congo belge (INÉAC, the National Institute for Agronomic Study of the Belgian Congo) was created in 1933, with headquarters in Yangambi.
All data collected and digitized in the Jungle Rhythms project were gathered during the latter period at the Yangambi research station. Although there was some ongoing research before this period, the INÉAC created a major shift towards basic research, in addition to the applied agricultural research. These basic research efforts were often well coordinated and documented. Topics included plant diseases, botany and geology, but also genetics. Most surprisingly, INÉAC was run by scientists with minimal intervention from the Belgian administration (either local or afar). However, INÉAC received support from the government, which makes complete autonomy questionable, especially as WWII set part of the research agenda.
Soil sciences laboratory, Yangambi (Congo) - from the INEAC archives.
It is clear that the Congo agricultural research stations (and Yangambi in particular) have a long and winding history. On the eve of independence the research station had built up a solid international reputation, running large autonomous experiments and data collection throughout the Congo basin. The data on seasonal dynamics of tree species digitized within the Jungle Rhythms project are part of this historical research effort. Even after more than 70 years these data retain their scientific value and could contribute to solving some of today’s research questions and problems.
in Research / Science / Software on Phenocam, Research, Science, Software
Recently I started experimenting with deep learning, a set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers. The reason for this excursion into a more exotic classifier framework is the tricky issue of snow.
Snow on evergreen canopies artificially decreases the greenness value of an image, corrupting an otherwise rather smooth PhenoCam time series of greenness. Below you see a split image of a normal snow-free canopy and a snow-covered canopy, visually showing this decrease in greenness.
These snowy days result in dips in Gcc (greenness), as seen in the time series of image greenness below.
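Gcc is commonly computed as the green chromatic coordinate, G / (R + G + B), from the mean red, green and blue values of a region of interest. Below is a minimal sketch of that calculation, assuming this standard definition is the one used here. The function name `gcc_value` and the pixel values in the usage note are made up for illustration.

```shell
# Hypothetical helper: print the green chromatic coordinate G / (R + G + B)
# for one set of mean R, G, B values, rounded to 4 decimals.
gcc_value() {
  awk -v r="$1" -v g="$2" -v b="$3" 'BEGIN {
    total = r + g + b
    # a fully black region has no defined chromatic coordinate
    if (total == 0) { print "NA"; exit }
    printf "%.4f\n", g / total
  }'
}
```

With illustrative values, `gcc_value 60 110 50` prints 0.5000 for a green canopy, while `gcc_value 200 200 200` prints 0.3333 for a white or grey (snow-like) region, which is the kind of drop that shows up as a dip in the time series.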
Within the lab we tried various techniques to spot snow on these canopies. These techniques were mostly colour metrics, based upon the distances in colour space (from either white or grey) of a particular pixel or region of interest. However, all efforts failed or were not generalizable across all sites, meaning that every site would need to be parameterized independently (which is a processing headache).
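To give an idea of what such a colour metric can look like, here is a minimal sketch of one possible variant: the Euclidean distance of a pixel from pure white in 8-bit RGB space. This is an assumption for illustration only, since the exact metrics tried in the lab are not described here, and `dist_from_white` is a hypothetical helper.

```shell
# Hypothetical helper: Euclidean distance of an 8-bit RGB pixel from
# pure white (255, 255, 255), rounded to 2 decimals.
dist_from_white() {
  awk -v r="$1" -v g="$2" -v b="$3" 'BEGIN {
    dr = 255 - r; dg = 255 - g; db = 255 - b
    printf "%.2f\n", sqrt(dr * dr + dg * dg + db * db)
  }'
}
```

A pixel would then be flagged as snow when its distance falls below some threshold, and that threshold is exactly the kind of parameter that would need tuning for every site.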
In an effort to address this classification problem I installed the SegNet variety of the Caffe Deep Learning Framework (hence the blog post title). The SegNet framework allows for pixel-based classification using a deep learning approach, and was originally designed to quickly (in a matter of milliseconds) classify street images to assist autonomous vehicles. I hope this approach might help solve the issue of classifying these snowy days, recognizing snowy canopies instead of pedestrians. Results will follow in the coming weeks.
in Jungle rhythms / Research / Science on Dr congo, Jungle rhythms, Phenology, Research, Science
A lot of the Jungle Rhythms project is shrouded in secrets, some of them more elusive than others. One of these secrets is the reason for the dashed appearance of some of the life cycle events in the handwritten notes, instead of the usual full lines or cross-hatched lines.
Dotted lines in a Jungle Rhythms yearly section
These dashed patterns are very difficult to transcribe. The question therefore remains: should they be transcribed to begin with? From a data retention point of view the answer is simple: yes. Any data not marked as written in the original are data lost. However, this might not be a convincing argument for most citizen scientists, as the patterns are a nuisance to mark, nor does it explain the underlying nature of this signal.
Browsing through some of the data I found evidence for mixed use of dashed / alternating patterns and full lines (marking a continuous process, see the right side of the example above). This suggests that the alternating pattern is a genuine signal, and not a different style of marking continuous life cycle events.
Tree displaying multiple life cycles
Thinking about the dotted line problem, I realized that this might have been the way to mark the occurrence of multiple asynchronous growth phases on the same tree. Partial blooming, fruit and leaf development are common among tropical species and obviously hard to classify as a single continuous or discrete process, since the tree doesn’t behave as one. A picture I took near the Congo river in Yangambi, shown above, captures a tree displaying three different leaf development stages. Here, the dull green leaves are the old ones, the bright green leaves are new ones, while the yellow / red ‘leaves’ are either very young leaves or fruit.
Until I find the protocol used in creating these tables I will not know for sure. However, cross-referencing some species with known life cycle behaviour in existing databases could confirm that dotted lines in the markings are those with partial blooming / leaf out. My search for answers continues.