Unsupervised pre-training

My understanding is that you do the unsupervised pre-training layer-by-layer.  i.e. you first train your lowest layer (with no other layers present).  Then you fix that layer's weights and train the next layer up, and so on.  (I'm not sure if you can do this unsupervised pre-training with cuda-convnet.)  If you were training on faces, then you'd get a lower layer which recognises features like this:

The second layer up would find features like this:

And the third layer would find features (concepts?) like this:

Spooky, eh?!

(I think you'd get very similar layer preferences if you did only supervised training; but I understand that unsupervised pre-training can speed up the training process, and it allows you to exploit the huge number of unlabelled images of plants you could gather from the internet.)

During unsupervised pre-training, the lower layers aren't really looking for classes; they're looking for features.  Here's an interesting animation showing how neurons in the lower layer come to form their preferences during training (they start with random preferences and gradually refine them):

(taken from here)

I think the more conventional route is to:

  1. Do your unsupervised pre-training.  The initial network parameters are set randomly.  Train layer-by-layer from the bottom upwards.
  2. Now "fine tune" the network with supervised training, using the network parameters you learned during pre-training as the initial parameters.  You don't need to manually inspect the network.  Instead you just train the entire net (using back-propagation) to map each training image to the supplied label.

If you don't use unsupervised pre-training then you just use random initial network parameters for your supervised training.  In other words, unsupervised pre-training is just a way to "prime" your network's initial parameters to reasonable values before you dive into supervised training.
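
To make that two-stage recipe concrete, here's a minimal numpy sketch of the flow.  It uses tiny fully-connected layers trained as plain autoencoders purely for illustration; a real system would pre-train convolutional layers, and the layer sizes, learning rate and random stand-in data below are placeholder assumptions, not values from any paper:

import numpy as np

rng = np.random.RandomState(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_layer(X, n_hidden, n_epochs=50, lr=0.1):
    # Train one layer as a simple autoencoder on (unlabelled) input X and
    # return its encoder weights plus the hidden representation of X.
    n_in = X.shape[1]
    W_enc = rng.normal(0, 0.1, (n_in, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0, 0.1, (n_hidden, n_in))
    b_dec = np.zeros(n_in)
    for _ in range(n_epochs):
        H = sigmoid(np.dot(X, W_enc) + b_enc)      # encode
        R = np.dot(H, W_dec) + b_dec               # reconstruct the input
        err = (R - X) / len(X)                     # gradient of the mean squared error
        dH = np.dot(err, W_dec.T) * H * (1 - H)    # back-prop the error into the encoder
        W_dec -= lr * np.dot(H.T, err)
        b_dec -= lr * err.sum(axis=0)
        W_enc -= lr * np.dot(X.T, dH)
        b_enc -= lr * dH.sum(axis=0)
    return W_enc, b_enc, sigmoid(np.dot(X, W_enc) + b_enc)

# Step 1: unsupervised pre-training, layer by layer, from the bottom up.
X_unlabelled = rng.rand(500, 64)                   # stand-in for unlabelled image data
pretrained, H = [], X_unlabelled
for n_hidden in [32, 16]:                          # two hidden layers; sizes are arbitrary
    W, b, H = pretrain_layer(H, n_hidden)
    pretrained.append((W, b))

# Step 2: supervised fine-tuning.  The pre-trained weights become the
# *initial* parameters; you add an output layer and train the whole stack
# with ordinary back-propagation on (image, label) pairs.  Only the
# forward pass is sketched here.
def forward(X, weights):
    H = X
    for W, b in weights:
        H = sigmoid(np.dot(H, W) + b)
    return H                                       # feed this into a softmax output layer

features = forward(rng.rand(100, 64), pretrained)  # labelled images would go here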

Things to tweak during your project

I'm not sure you'll need to modify the learning algorithm (back prop), but you probably will need to modify the network architecture, and you will almost certainly need to massage the training set used for supervised learning so that you have roughly equal numbers of training examples for each of the subclasses you're interested in.
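
As a rough illustration of the sort of massaging I mean, here's a small Python sketch that oversamples the rarer subclasses (by repeating randomly chosen images) until every subclass has about as many examples as the largest one.  The one-folder-per-subclass layout is just an assumption; adapt it to however you store your labelled images:

import os, random

def balanced_file_list(root_dir, seed=0):
    # Collect image paths grouped by subclass (one sub-folder per subclass).
    rng = random.Random(seed)
    by_class = {}
    for cls in os.listdir(root_dir):
        cls_dir = os.path.join(root_dir, cls)
        if os.path.isdir(cls_dir):
            by_class[cls] = [os.path.join(cls_dir, f) for f in os.listdir(cls_dir)]
    # Oversample each subclass up to the size of the largest one.
    target = max(len(files) for files in by_class.values())
    balanced = []
    for cls, files in by_class.items():
        extra = [rng.choice(files) for _ in range(target - len(files))]
        balanced.extend((path, cls) for path in files + extra)
    rng.shuffle(balanced)
    return balanced    # list of (image_path, label) pairs, roughly class-balanced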

Zeiler & Fergus's 2013 paper "Visualizing and Understanding Convolutional Networks" might be of interest when trying to tune your network architecture.

And: "Visualizing Higher-Layer Features of a Deep Network" by Erhan et al 2009.

Pre-processing training images

Deep neural nets like huge training datasets (both for unsupervised pre-training and for supervised fine-tuning).  You can artificially expand your training dataset by duplicating each training image and applying simple alterations to the image.  Alterations like flipping the image on its vertical axis (don't flip on the horizontal axis because gravity is really important), altering the exposure, zooming in a little, slightly rotating the image, adding a little noise, adding a blurred finger-over-the-lens in the corner of the image etc.  In this way, you can effectively produce an infinitely large training dataset.  Which makes deep neural nets happy!

It would be an excellent idea to write a script to apply different alterations to your training images.  I believe that Python has some excellent image manipulation libraries.

In terms of which alterations to use: I think the aim is to use physically plausible alterations.  i.e. alterations which result in images you might get from your users.  
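
Here's a rough sketch of the sort of alteration script I have in mind, using PIL (Pillow).  The alteration types follow the list above (flip, slight rotation, exposure change, a small zoom), but the parameter ranges are just guesses; tune them so the outputs still look like photos your users might plausibly take:

import os, random
from PIL import Image, ImageEnhance

def random_alteration(img, rng):
    if rng.random() < 0.5:                            # flip on the vertical axis
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    img = img.rotate(rng.uniform(-10, 10))            # slight rotation
    img = ImageEnhance.Brightness(img).enhance(rng.uniform(0.8, 1.2))  # alter exposure
    w, h = img.size                                   # zoom in a little via a random crop
    scale = rng.uniform(0.85, 1.0)
    cw, ch = int(w * scale), int(h * scale)
    left, top = rng.randint(0, w - cw), rng.randint(0, h - ch)
    return img.crop((left, top, left + cw, top + ch)).resize((w, h))

def expand_dataset(src_dir, dst_dir, copies_per_image=5, seed=0):
    rng = random.Random(seed)
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        img = Image.open(os.path.join(src_dir, name)).convert('RGB')
        for i in range(copies_per_image):
            random_alteration(img, rng).save(os.path.join(dst_dir, '%d_%s' % (i, name)))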

I'm not aware of any studies which methodically explore different image alterations during training (that would be a very interesting contribution!), but I haven't looked very hard!

One other random thought: if you had access to photorealistic 3D models of various plants then you could use them to render an unlimited number of extra training examples!  (But don't spend time on this unless you know of an off-the-shelf photorealistic 3D model of lots of plants!).

Pre-processing test images (i.e. images taken by your users)

My understanding is that, if you use deep learning, then you won't have to do much (if any) pre-processing.  The input layer of your net is literally the raw pixel values.  Instead the key will be to provide a training dataset which trains the net on a wide variety of scales and orientations for each category.  You can use tricks like processing the training images to alter their scale, flip them, rotate them etc. so your net gets trained to recognise, say, an "oak tree leaf" no matter what scale or orientation the leaf is at.  I could be wrong though!
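
For example, the "pre-processing" at test time might be little more than this (the 256x256 size and the [0, 1] scaling are illustrative; use whatever input size and scaling your chosen net expects):

import numpy as np
from PIL import Image

def image_to_input_array(path, size=(256, 256)):
    # Resize the user's photo to the net's fixed input size and hand over
    # the raw pixel values, scaled to [0, 1].
    img = Image.open(path).convert('RGB').resize(size)
    return np.asarray(img, dtype=np.float32) / 255.0   # shape: (256, 256, 3)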

One bit of "pre-processing" that you may want to explore is automatic segmentation of the images into leaves / trunk / seeds etc.  But I have a hunch that won't be necessary: I think you might "simply" be able to train a single net to map all "oak" images (whether it's an image of the whole tree, or of an acorn, or of a leaf) to "oak".

Recommendation for overall strategy

I would recommend that, at first, you forget about unsupervised pre-training.  Folks have gotten excellent performance without it.  First, try to get something working with pure supervised training (e.g. using labelled images from PlantCLEF and ImageNet thrown at cuda-convnet).  I'd suggest using the network parameters described in Krizhevsky, Sutskever & Hinton 2012 (who won the 2012 ImageNet "Large Scale Visual Recognition Challenge" using pure supervised training).  It would also be worth reading about the 2013 ImageNet competition winners (the winners have been announced but details of the algorithms won't be released until the LSVRC2013 workshop on the 7th Dec).  Duplicate and modify your training images to increase the size of your training set (Krizhevsky et al used tricks like this).
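
For reference, here's my rough summary of the Krizhevsky, Sutskever & Hinton 2012 network and training settings (from memory, so double-check against the paper and the cuda-convnet example configs before copying anything):

# Approximate layer list and training settings for the 2012 winning net.
alexnet_layers = [
    "input:  224x224x3 RGB image",
    "conv1:  96 filters, 11x11, stride 4, ReLU, then max-pool",
    "conv2:  256 filters, 5x5, ReLU, then max-pool",
    "conv3:  384 filters, 3x3, ReLU",
    "conv4:  384 filters, 3x3, ReLU",
    "conv5:  256 filters, 3x3, ReLU, then max-pool",
    "fc6:    4096 units, ReLU, dropout 0.5",
    "fc7:    4096 units, ReLU, dropout 0.5",
    "fc8:    softmax over the output classes",
]
training_settings = {"batch_size": 128, "momentum": 0.9,
                     "weight_decay": 0.0005, "initial_learning_rate": 0.01}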

Then, once that works, you have lots of options for improving your performance.  Some of which are:

Relevant services and machines supported by CSG

Information provided by the ever-helpful Lloyd ;)

For general info, please see:  http://www.doc.ic.ac.uk/csg/services/projects

Machines with high performance GPU

See the “Access to a fast GPU” section of the main spec.

Allowing access to an arbitrary port

“If you need to run a service that is accessible outside of the DoC network, ports 55000 to 56000 are open on DoC lab computers for this purpose. “
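
As a toy illustration (Python 3 standard library only; the port number is just an arbitrary pick from that range), a simple file server bound to one of those ports would be reachable from outside the DoC network:

import http.server
import socketserver

PORT = 55123    # any free port in the open 55000-56000 range

# Serve the current directory over HTTP on the chosen port.
httpd = socketserver.TCPServer(("", PORT), http.server.SimpleHTTPRequestHandler)
print("Serving on port", PORT)
httpd.serve_forever()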

Android SDK

“The Android SDK is not installed by default on DoC machines.

A group project might install it under /vol/project/2013/<course-id>/<group-id>

The Android SDK is problematic because it requires a very sloppy udev rule that makes attached Android devices world-readable and writable.

http://developer.android.com/tools/device.html

'SUBSYSTEM=="usb", ATTR{idVendor}=="0bb4", MODE="0666", GROUP="plugdev"'

This is fine for a bedroom-programming model but not for a multi-user environment like the lab.  Caveat Programmer!”

Apple XCode (for programming iOS)

“The most-modern Macintoshes (CIDER04 upwards, I think) in Huxley 210 have the initial XCode set-up, but individual users will have to apply for an Apple Developer account through CSG if they do not already have one.  Contact CSG <help@doc.ic.ac.uk> if you need iOS developer access for your academic work.  Apple limit the number of iOS sign-ups that we can do in a year.”

If you want to use XCode then it’s best to apply for a developer account AS EARLY AS POSSIBLE because it can take a while.

Timing of the PlantCLEF competition in relation to the MSc project timetable

The timing could work very well.  The PlantCLEF training data is due to be released in Jan, the test data is due to be released in March and the deadline for submitting results is May (which is when DoC require you to submit your final report and do your presentation).

The PlantCLEF schedule is on this page (scroll down to the bottom).

The MSc project schedule is on this page.

Updates: 5th Dec 2013

A few quick updates:

Just to let you know: it looks like I will be away for the last week of January.  This isn't ideal timing because it's when you guys will be working on your first report, but I'm sure that if we get things moving in the right direction before I leave then you'll be fine; and of course you'll have Will!

And now some good news...

A new version of Theano (0.6) has just been released.

And...

Mark Zuckerberg will be talking at the Deep Learning Workshop at NIPS2013.  This is actually pretty remarkable news.  It shows just how seriously some of the big companies are taking deep learning.  Over the past few months, Facebook, Amazon and Google have all had announcements about deep learning.  It's hot stuff!

And there's a new, fast convnet tool called Caffe, which is apparently faster than cuda-convnet on modern GPUs: https://plus.google.com/103174629363045094445/posts/HJVcmVfYgtR

Update 7th Dec 2013

Some interesting news from Mark Communs at NIPS about localising objects in images using deep neural networks; and some early information about how the 2013 ImageNet winner improved performance relative to the 2012 winner.

Update 10th Dec 2013

From The New Yorker: "What Facebook Wants with Artificial Intelligence".  FB have just hired Yann LeCun, deep learning hero.

Alternative image classification tasks:

classifying microscope images of blood cells as either "healthy" or "malaria-infected"

(A research group at UCLA recently built a "gamified" solution to this problem.)

This probably wouldn't be a smart phone app.  But it would be very worthwhile and should get a lot of attention if you get it to work ;)

Smart phone app to recognise animals

This would basically be identical to the plant recognition project, except for animals.  I haven't found a competition like PlantCLEF, but there is a "recognise dogs vs cats" Kaggle competition (though it ends on 1st Feb 2014).

Update 12th Dec 2013

Interesting new paper on object detection and segmentation: Erhan et al, December 2013, Scalable Object Detection using Deep Neural Networks.

And a fascinating recent blog post from Piotr Dollár, "A Seismic Shift in Object Detection", also talking about modern approaches to segmentation.