Past week I started to play with the Caffe deep learning framework. Although I initially planned on using the SegNet branch of the Caffe framework to classify snow in PhenoCam images. However, given that it concerns a rather binary classification I don’t need to segment the picture (I do not care where the snow in the image is, only if it is present). As such, a more semantic approach could be used.
Luckily people at MIT had already trained a classifier, the Places-CNN, which deals with exactly this problem, characterizing an image scene. So, instead of training my own classifier I gave theirs a try. Depending on the image type, and mostly the view angle the results are very encouraging (even with their stock model).
For example, the below image got classified as: mountain snowy, ski slope, snowfield, valley, ski_resort. This all seems very reasonable indeed. Classifying a year worth of images at this site yielded an accuracy of 89% (compared to human observations).
However, when the vantage point changes so does the accuracy of the classification, mainly due to the lack of images of this sort in the original training data set I presume. The image below was classified as: rainforest, tree farm, snowy mountain, mountain, cultivated field. As expected, the classification accuracy dropped to a mere 13%. There is still room for improvement using PhenoCam based training data. But, building upon the work by the group at MIT should make these improvements easier.