- The MNIST dataset covers the case of recognizing handwritten digits.
- The CIFAR-10 dataset addresses the case of recognizing objects in photos.
There are plenty of articles for understanding CNNs; here is simply my own understanding, through one example.
An image is a story of pixels
The decomposition of a digit image into a network of neurons can be represented like this:
In the above image, the numeral "8" is an image of 28 * 28 pixels. This equals 784 input features for the network. The simplest way to recognize the numeral is to use a simple Multilayer Perceptron (MLP) neural network, without convolution. Even with a simple MLP, you can achieve an error rate below 2%.
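To make the "784 input features" concrete, here is a minimal numpy sketch of the forward pass of such an MLP. The layer sizes (one hidden layer of 128 units) and the random weights are illustrative assumptions, not the exact network behind the 2% figure.

```python
import numpy as np

# A minimal sketch of an MLP forward pass on one MNIST-sized image
# (numpy only; layer sizes and weights are illustrative assumptions).
rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

# A 28 * 28 pixel image flattened into 784 input features
image = rng.random((28, 28))
x = image.reshape(784)

# One hidden layer of 128 units, 10 outputs (digits 0-9)
W1, b1 = rng.standard_normal((128, 784)) * 0.01, np.zeros(128)
W2, b2 = rng.standard_normal((10, 128)) * 0.01, np.zeros(10)

hidden = relu(W1 @ x + b1)
probs = softmax(W2 @ hidden + b2)
predicted_digit = int(probs.argmax())
```

In a real training run the weights would of course be learned from the 60,000 MNIST training images rather than drawn at random.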
Convolution
Understanding Convolutional Neural Networks (CNNs) is a challenge, but working with the MNIST dataset as an example helped me. Both Keras and MXNet (from the Apache framework) provide code examples.
Convolution is defined as a mathematical operation describing a rule for how to merge two sets of information. Does that help you? Me, not really :-(
In image processing with a CNN, it's like photography: when you apply different filters to the same picture, you get different results. In a CNN, you apply different filters and then pool the results together to produce your classification.
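The filter-then-pool idea can be sketched in a few lines of numpy. The toy image and the horizontal-edge kernel below are my own illustrative choices, not the filters from the post's figure.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution (no padding, stride 1)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """2x2 max pooling: the 'pooling' step mentioned above."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# Toy 6x6 "image" with a bright horizontal stripe
image = np.zeros((6, 6))
image[1, :] = 1.0

# A filter that responds to horizontal edges (a classic illustrative kernel)
horizontal_edge = np.array([[ 1.,  1.,  1.],
                            [ 0.,  0.,  0.],
                            [-1., -1., -1.]])

fmap = conv2d(image, horizontal_edge)   # strong response along the stripe
pooled = max_pool(fmap)                 # pooled feature map
```

The feature map lights up exactly where the stripe meets the filter, which is the "emphasizing part of the image" effect described below.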
In the above image, the filters help the processing by emphasizing parts of the image. For example, the first filter spots the upper part of the image by putting a specific weight on the first row of the matrix. There are other ways to enhance performance with "image augmentation" techniques, for example using the Keras Image Augmentation API. You can implement a concrete example with Jason Brownlee's book.
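To show what augmentation does without pulling in Keras, here is a numpy sketch of the kinds of random transforms an image-augmentation API applies (random flips and shifts); the specific transforms and parameters are my own illustrative choices.

```python
import numpy as np

# A numpy sketch of the kinds of transforms an image-augmentation API
# (such as the Keras one) applies to enlarge the training set.
rng = np.random.default_rng(42)

def augment(image):
    """Return a randomly flipped and shifted copy of a 2D image array."""
    out = image.copy()
    if rng.random() < 0.5:           # random horizontal flip
        out = out[:, ::-1]
    shift = rng.integers(-2, 3)      # random horizontal shift of up to 2 px
    out = np.roll(out, shift, axis=1)
    return out

image = rng.random((28, 28))
batch = np.stack([augment(image) for _ in range(8)])  # 8 augmented variants
```

Each variant shows the network a slightly different view of the same digit, which is what makes the trained model more robust.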
With a CNN on a pre-trained model, you get very good accuracy:
On the importance of pre-training the model
If you compare the two images above, you will notice that two types of networks were tried: ResNet20 and ResNet56. The CNN was run without pre-training in the first case and with pre-training in the second. The conclusion is that you get much better accuracy with pre-training. The importance of pre-training was demonstrated in this paper, which also explains the idea. Pre-training, quickly defined: you compute the weights of your CNN on a first set of data, and then you "transfer" this knowledge to a new set of data, which means you don't start from scratch for the second set.
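The "transfer the weights instead of starting from scratch" idea can be sketched with a toy model. Logistic regression stands in for the CNN and the datasets are synthetic; nothing here is the actual ResNet experiment from the screenshots.

```python
import numpy as np

# Toy sketch of pre-training / transfer: weights learned on a first
# dataset become the starting point for a second, related dataset.
rng = np.random.default_rng(0)
true_w = rng.standard_normal(20)

def make_data(n):
    X = rng.standard_normal((n, 20))
    return X, (X @ true_w > 0).astype(float)

def train(X, y, w, lr=0.1, steps=200):
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))    # sigmoid
        w = w - lr * X.T @ (p - y) / len(y)   # gradient step
    return w

def accuracy(X, y, w):
    return float(np.mean(((X @ w) > 0) == y))

X_a, y_a = make_data(2000)   # large first dataset (pre-training)
X_b, y_b = make_data(50)     # small, correlated second dataset
X_t, y_t = make_data(1000)   # held-out test set

w_pre = train(X_a, y_a, np.zeros(20))                 # pre-training phase
w_scratch = train(X_b, y_b, np.zeros(20), steps=5)    # from scratch
w_transfer = train(X_b, y_b, w_pre.copy(), steps=5)   # fine-tuning
```

With the same small budget of fine-tuning steps, the transferred weights reach much better test accuracy than training from scratch, which is the effect the two ResNet plots illustrate.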
During that day we worked on a CNN with "transfer learning": the CNN model was trained on different but correlated problems. We used Amazon SageMaker built-in algorithms to train our model incrementally.
Once the model was trained, we prepared it for inference using an Amazon SageMaker endpoint.
Training the model
In the above image you will notice that the MXNet autograd function is used.
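MXNet's autograd records the forward computation and replays it backward to produce gradients automatically. As a sketch of what it computes, here is a numpy check (no MXNet required) that for y = sum(x ** 2) the gradient is 2 * x, confirmed by finite differences:

```python
import numpy as np

# What autograd would return for y = sum(x ** 2) is the gradient 2 * x.
# We verify that analytic result against a finite-difference estimate.
x = np.array([1.0, 2.0, 3.0])

def f(x):
    return np.sum(x ** 2)

analytic_grad = 2 * x  # the gradient autograd computes for us

eps = 1e-6
numeric_grad = np.array([
    (f(x + eps * np.eye(3)[i]) - f(x - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])
```

In MXNet you would get the same vector by attaching a gradient buffer to `x`, running the computation inside `autograd.record()`, and calling `backward()`.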
Two paradigms: imperative programming vs declarative programming
- NumPy is a Python library that follows the imperative programming style.
- TensorFlow follows the declarative style (note the "compile" keyword).
This distinction is important for the construction of your neural network: in the first phase you declare the different layers, and in a second phase you compile, after adding them all.
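The declare-then-compile flow can be contrasted with the imperative style in a self-contained sketch. The `Network` class below is a toy stand-in for a declarative framework like Keras, not a real API:

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])

# --- Imperative style: each line executes immediately (numpy)
h = np.maximum(0.0, x)   # ReLU, computed right now
y_imperative = h.sum()   # result available immediately

# --- Declarative style: declare the layers first, then compile, then run
class Network:
    """Toy stand-in for a declarative framework (not a real API)."""
    def __init__(self):
        self.layers = []
        self.compiled = False
    def add(self, fn):
        self.layers.append(fn)   # declaration phase: nothing runs yet
    def compile(self):
        self.compiled = True     # freeze the stack of layers
    def run(self, x):
        assert self.compiled, "call compile() before running"
        for fn in self.layers:
            x = fn(x)
        return x

net = Network()
net.add(lambda x: np.maximum(0.0, x))  # declare a ReLU layer
net.add(np.sum)                        # declare a sum layer
net.compile()                          # second phase: compile after adding all
y_declarative = net.run(x)
```

Both styles compute the same value; the difference is *when* the work happens, which is exactly the point of the "compile" keyword.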
Inference and deployment
All the screenshots in this post were taken during a one-day training at AWS, and the models were executed and deployed on AWS EC2 instances, using Jupyter Notebook and Amazon Elastic Container Service.
Every application is now deployed in containers, the most important artefact being the container registry, which is operated with Amazon SageMaker. With AWS Lambda, you only pay for execution ($0.06 to $27 per hour for 64 cores). You upload your static website containing your model to Amazon S3 (Amazon Simple Storage Service).
With AWS Lambda you don't have to manage infrastructure; you manage services. It's serverless and provides fully managed, highly available services.
Amazon SageMaker Ground Truth is a capability that helps you by reducing the time needed to prepare the data.
Final result
The final result is a static website where you can upload your picture and the CNN detects the objects in the image for you.
The dog was the first candidate for classification. Then I added the photo of the cat, taken from the internet. For both the dog and the cat, you can see that the objects are well identified. Then I uploaded an image from my own photo library, a ladybird on a leaf, and here you can see that the classification is far from satisfactory.
My conclusion is that images from the internet are already known and trained on, but when it comes to discovering new images, the CNN has more difficulty. Hence the continued need to pre-train a CNN. And the conclusion of my conclusion: keep your private images safe.