Deep Learning For Computer Vision Jason Brownlee
Introduction
- The objective of this post is to summarise the book “Deep Learning for Computer Vision” by Jason Brownlee. I write this kind of post to record my own experience of the book, so that when I read it again in the future I can quickly recall the key concepts and ideas that struck me. As a supplementary benefit, it may also give the anonymous reader a point of view on the content of this excellent book.
Preamble
- Fit: adapting the model weights in response to a training dataset.
- #conda install -c anaconda graphviz
- The type of loss function depends on the type of prediction modelling problem (a Keras compile sketch follows this list):
- Regression: Mean Squared Error
- Binary classification: Binary Cross-Entropy
- Multi-class classification: Categorical Cross-Entropy
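- A minimal sketch, assuming the tensorflow.keras API, of how each problem type maps to a loss at compile time (the one-unit Dense model is purely illustrative):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Regression: a linear output unit with mean squared error
model = Sequential([Dense(1)])
model.compile(optimizer='adam', loss='mean_squared_error')

# Binary classification: a sigmoid output with binary cross-entropy
# model.compile(optimizer='adam', loss='binary_crossentropy')

# Multi-class classification: a softmax output with categorical cross-entropy
# model.compile(optimizer='adam', loss='categorical_crossentropy')
```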
- #conda install -c anaconda pillow
- Gist.github.com: facilitates publishing code snippets in a blog.
- Chapter 5: How to manually scale image pixel data
- Function to resize an image while preserving its aspect ratio: Pillow's thumbnail() (sketched below).
- Remove linear correlation from pixel data: PCA and ZCA whitening. ZCA is preferable for CNNs because it preserves the local structure of the image.
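- A minimal sketch of this chapter's manual approach, assuming a local file 'photo.jpg': resize with Pillow's thumbnail() and rescale the pixel values by hand:

```python
from PIL import Image
from numpy import asarray

image = Image.open('photo.jpg')
image.thumbnail((100, 100))                 # resizes in place, preserving aspect ratio
pixels = asarray(image).astype('float32')
pixels /= 255.0                             # normalise pixel values to the range [0, 1]
print(pixels.shape, pixels.min(), pixels.max())
```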
- Chapter 6: How to load and manipulate images with Keras.
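- A minimal sketch of the Keras image utilities this chapter covers, assuming the tensorflow.keras API and a local file 'photo.jpg':

```python
from tensorflow.keras.preprocessing.image import load_img, img_to_array, array_to_img

image = load_img('photo.jpg')     # loaded as a PIL image
pixels = img_to_array(image)      # NumPy array with shape [rows][cols][channels]
image2 = array_to_img(pixels)     # converted back to a PIL image
print(pixels.dtype, pixels.shape)
```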
- Chapter 7: How to scale image pixel data with Keras
- Normalisation: scale pixel values to the range 0-1
- Centering: subtracting the mean pixel value so the distribution is centred on zero
- Standardisation: mean of 0 and standard deviation of 1
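- A minimal sketch of the three options with Keras' ImageDataGenerator (feature-wise statistics must first be computed on the training set with datagen.fit()):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Normalisation: scale pixel values to [0, 1]
norm_gen = ImageDataGenerator(rescale=1.0/255.0)
# Centering: subtract the mean pixel value of the training set
center_gen = ImageDataGenerator(featurewise_center=True)
# Standardisation: zero mean and unit standard deviation
std_gen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True)
```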
- Chapter 8: How to load large datasets from directories with Keras
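- A minimal sketch, assuming images are organised one sub-directory per class as data/train/&lt;class&gt;/*.jpg:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0/255.0)
train_it = datagen.flow_from_directory('data/train/', class_mode='binary',
                                        batch_size=64, target_size=(224, 224))
images, labels = next(train_it)     # one batch loaded lazily from disk
print(images.shape, labels.shape)
```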
- Chapter 9: How to use image data augmentation in Keras
- #expand_dims(data,0) : adds a dimension to an array
- The dimensions of a single image can be expanded from [rows][cols][channels] to [samples][rows][cols][channels], where the number of samples is one for the single image. This transforms the array of the image into an array of samples containing one image.
- samples = expand_dims(image,0)
- See also numpy.moveaxis and numpy.reshape.
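- A minimal sketch of augmentation applied to a single image, assuming a local file 'photo.jpg':

```python
from numpy import expand_dims
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array

data = img_to_array(load_img('photo.jpg'))
samples = expand_dims(data, 0)      # [rows][cols][channels] -> [samples][rows][cols][channels]
datagen = ImageDataGenerator(horizontal_flip=True, rotation_range=90, zoom_range=[0.5, 1.0])
it = datagen.flow(samples, batch_size=1)
augmented = next(it)[0].astype('uint8')    # one randomly transformed copy of the image
```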
- Chapter 10: How to use colour ordering formats
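- A minimal sketch of converting between channels-last and channels-first ordering with NumPy:

```python
from numpy import moveaxis, zeros

channels_last = zeros((224, 224, 3))              # [rows][cols][channels]
channels_first = moveaxis(channels_last, 2, 0)    # -> [channels][rows][cols]
print(channels_first.shape)                       # (3, 224, 224)
```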
- Chapter 11: How Convolutional layers work
- “Using a filter smaller than the input is intentional as it allows the same filter (set of weights) to be multiplied by the input array multiple times at different locations on the input. This systematic application of the same filter across an image is a powerful idea.”
- “The innovation of using the convolution operation in a neural network is that the values of the filter are weights to be learned during the training of the network. The network will learn what types of features to extract from the input. Specifically, training under stochastic gradient descent, the network is forced to learn to extract features from the image that minimize the loss for the specific task the network is being trained to solve, e.g. extract features that are the most useful for classifying images as dogs or cats. In this context, you can see that this is a powerful idea.“
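- A minimal sketch in the spirit of this chapter: a single Conv2D layer whose 3x3 filter is hand-set as a vertical-line detector and applied to a small 8x8 single-channel image:

```python
from numpy import asarray
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

# 8x8 image with a vertical line, shaped [samples][rows][cols][channels]
data = asarray([[0, 0, 0, 1, 1, 0, 0, 0]] * 8).reshape(1, 8, 8, 1)
model = Sequential([Conv2D(1, (3, 3), input_shape=(8, 8, 1))])
# kernel shape is [rows][cols][channels][filters]; one bias per filter
detector = asarray([[[[0.0]], [[1.0]], [[0.0]]]] * 3)
model.set_weights([detector, asarray([0.0])])
print(model.predict(data).shape)    # (1, 6, 6, 1): the resulting feature map
```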
- Chapter 12: How to use filter size, padding and stride
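- A minimal sketch of how padding and stride change the size of the feature map:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

model = Sequential([
    Conv2D(1, (3, 3), padding='same', input_shape=(8, 8, 1)),   # 'same' keeps the 8x8 size
    Conv2D(1, (3, 3), padding='same', strides=(2, 2)),          # a stride of 2 halves it to 4x4
])
model.summary()
```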
- Chapter 13: How pooling layers work
- ReLU (Rectified Linear Activation): applying it after a convolutional layer is a best practice.
- “Pooling can be used to downsample the detection of features in features maps.”
- “Max pooling highlights the most present feature in the patch and works better in practice than average pooling for computer vision tasks like image classification”
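- A minimal sketch contrasting the two pooling options after a ReLU convolution:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, AveragePooling2D

model = Sequential([
    Conv2D(1, (3, 3), activation='relu', input_shape=(8, 8, 1)),  # -> 6x6 feature map
    MaxPooling2D(pool_size=(2, 2)),                               # -> 3x3, keeps the strongest activation per patch
])
model.summary()
# Swap in AveragePooling2D(pool_size=(2, 2)) to compare with average pooling.
```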
- Chapter 14: ImageNet, ILSVRC and milestones architectures
- ILSVRC: the ImageNet Large Scale Visual Recognition Challenge, an annual competition at the intersection of computer vision and deep learning.
- Chapter 15: How milestone model architectural innovations work
- The different winning architectural innovation models:
- LeNet-5 (Yann LeCun)
- AlexNet (University of Toronto)
- VGG (Visual Geometry Group)
- Inception (GoogLeNet)
- ResNet (Microsoft Research)
- Chapter 16: How to use 1x1 convolution to manage complexity
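- A minimal sketch of a 1x1 convolution used as a channel-wise projection, reducing 512 feature maps to 64 without touching the spatial dimensions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

model = Sequential([
    Conv2D(512, (3, 3), padding='same', activation='relu', input_shape=(256, 256, 3)),
    Conv2D(64, (1, 1), activation='relu'),   # projects 512 channels down to 64
])
model.summary()
```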
- Chapter 17: How to implement model architecture innovation
- Chapter 18: How to use pre-trained models and transfer learning
- “Deep convolutional neural network models may take days or even weeks to train on very large datasets. A way to short-cut this process is to re-use the model weights from pre-trained models that were developed for standard computer vision benchmark datasets, such as the ImageNet image recognition tasks.”
- “Transfer learning generally refers to a process where a model trained on one problem is used in some way on a second, related problem.”
- “Recall that convolutional layers closer to the input layer of the model learn low-level features such as lines, that layers in the middle of the network learn complex abstract features that combine the lower level features extracted from the input, and layers closer to the output interpret the extracted features in the context of a classification task.”
- Load the VGG-16 pre-trained model
- Load the inception V3 pre-trained model
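- A minimal sketch of loading both models through the Keras applications API (the ImageNet weights are downloaded on first use):

```python
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.applications.inception_v3 import InceptionV3

vgg = VGG16()
inception = InceptionV3()
vgg.summary()
```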
- Chapter 19: How to classify black and white photos of clothing
- Baseline model:
- Load dataset
- Prepare pixels
- Define model (a sketch follows this list)
- Evaluate model
- Plot diagnostic learning curve
- Summarise model performance
- Run test harness
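- A minimal sketch of the define-model step for the 28x28 grayscale clothing images (the exact hyperparameters used in the chapter may differ):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import SGD

def define_model():
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(100, activation='relu'),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    return model
```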
- “Some slight overfitting: this could be addressed with use of regularisation or training for fewer epochs.”
- Address the overfitting: dropout regularisation and data augmentation
- Dropout helps reduce overfitting by preventing a layer from seeing the exact same pattern twice (sketched below). Dropout and data augmentation tend to disrupt random correlations occurring in your data.
- Data augmentation: horizontal flips, vertical flips, rotation, zooms. “Data augmentation involves making copies of the examples in the training dataset with small random modifications.”
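- A minimal sketch of adding dropout to the baseline above (the dropout rates are illustrative, not the chapter's exact values):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Dropout(0.2),                       # randomly drops 20% of activations during training
    Flatten(),
    Dense(100, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax'),
])
```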
- “Fitting the model will require that the number of training epochs and batch size to be specified. We will use a generic 100 training epochs for now and a modest batch size of 64. It is better to use a separate validation dataset, e.g. by splitting the train dataset into train and validation sets.”
- Running the CIFAR-10 model on my iMac gave the following results:
- Chapter 22: How to label satellite photographs of the Amazon rain forest
- VGG: Visual Geometry Group. VGG-16 is a small and well-understood model, by Karen Simonyan and Andrew Zisserman.
- Cross-entropy loss = log loss; log loss penalises predictions that are confident and wrong.
- Given that we expect the rate of learning to be slowed, we give the model more time to learn by increasing the number of training epochs.
- Keras provides a range of pre-trained models that can be loaded and used wholly or partially via the Keras application API.
- Adam: adaptive learning rate
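- A minimal sketch of the transfer-learning approach for this multi-label problem as I understand it: a frozen VGG-16 feature extractor with a new sigmoid output (17 possible tags in this dataset) trained with binary cross-entropy:

```python
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Flatten, Dense

base = VGG16(include_top=False, input_shape=(128, 128, 3))
for layer in base.layers:
    layer.trainable = False                        # keep the pre-trained weights frozen
x = Flatten()(base.output)
x = Dense(128, activation='relu')(x)
output = Dense(17, activation='sigmoid')(x)        # one independent probability per tag
model = Model(inputs=base.input, outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy')
```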
- Chapter 23: Deep learning for object recognition
- “Image classification involves assigning a class label to an image, whereas object localization involves drawing a bounding box around one or more objects in an image. Object detection is more challenging and combines these two tasks and draws a bounding box around each object of interest in the image and assigns them a class label.”
- You Only Look Once = YOLO.
- “YOLO family of models are fast, much faster than R-CNN, achieving object detection in real-time.“
- “ The best-of-breed open source library implementation of the YOLOv3 for the Keras deep learning library.”
- Region-Based Convolutional Neural Networks, or R-CNNs
- Image classification
- Image localisation
- Object detection
- Chapter 25: How to perform object detection with Mask R-CNN
- “An extension of object detection involves marking the specific pixels in the image that belong to each detected object instead of using coarse bounding boxes during object localization. This harder version of the problem is generally referred to as object segmentation or semantic segmentation.”
- Chapter 26: How to develop a new object detection model
- Python provides the ElementTree API that can be used to load and parse an XML file and we can use the find() and findall() functions to perform the XPath queries on a loaded document.
- A very good example of extracting information from an XML file.
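- A minimal sketch of the idea, assuming a Pascal VOC-style annotation file 'annotation.xml' with the usual bndbox tags:

```python
from xml.etree import ElementTree

tree = ElementTree.parse('annotation.xml')
root = tree.getroot()
for box in root.findall('.//bndbox'):       # XPath query for every bounding box
    xmin = int(box.find('xmin').text)
    ymin = int(box.find('ymin').text)
    xmax = int(box.find('xmax').text)
    ymax = int(box.find('ymax').text)
    print(xmin, ymin, xmax, ymax)
```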
- The average or mean of the average precision (AP) across all of the images in a dataset is called the mean average precision, or mAP.
- Chapter 27: Deep Learning for face recognition
- “Deep learning methods are able to leverage very large datasets of faces and learn rich and compact representations of faces, allowing modern models to first perform as-well and later to outperform the face recognition capabilities of humans.”
- “The DeepID systems were among the first deep learning models to achieve better-than-human performance on the task, e.g. DeepID2 achieved 99.15% on the Labeled Faces in the Wild (LFW) dataset, which is better-than-human performance of 97.53%.”
- Video: “How to Run TensorFlow Lite on the Raspberry Pi for object detection”, by @EdjeElectronics.
- Chapter 28: How to detect faces in photographs
- Download a pre-trained model for frontal face detection from the OpenCV GitHub repository. OpenCV: Open Source Computer Vision Library.
- #conda install -c menpo opencv
- #opencv version: 3.4.2
- MTCNN: Multi-task Cascaded Convolutional Neural Network.
- I ran MTCNN on my own photograph of friends and was able to extract the seven faces of my friends from the photo with the sample code from Jason Brownlee. Excellent :-) See the post.
- #conda install -c conda-forge mtcnn
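- A minimal sketch of detecting faces with the mtcnn package, assuming a local photograph 'friends.jpg':

```python
from matplotlib import pyplot
from mtcnn.mtcnn import MTCNN

pixels = pyplot.imread('friends.jpg')
detector = MTCNN()
faces = detector.detect_faces(pixels)
for face in faces:
    x, y, width, height = face['box']       # bounding box of one detected face
    print(x, y, width, height, face['confidence'])
```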
- Chapter 29: How to perform face classification with FaceNet
- Face recognition: we will use the MTCNN model to detect faces, then develop a Linear Support Vector Machine (SVM) classifier to predict the identity of a given face.
- A face embedding is a vector that represents the features extracted from the face. This can then be compared with the vectors generated for other faces. The vectors are often compared to each other using a distance metric.
- It is common to use a Linear Support Vector Machine (SVM) when working with normalised face embedding inputs. This is because the method is very effective at separating the face embedding vectors.
- FaceNet: a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embedding as feature vectors.
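- A minimal sketch of the classification step; the random embeddings and labels below are stand-ins for the 128-dimensional FaceNet embeddings and identities the chapter produces:

```python
from numpy import random, array
from sklearn.preprocessing import Normalizer, LabelEncoder
from sklearn.svm import SVC

# stand-in data: in the chapter these come from FaceNet, one embedding per face
embeddings = random.rand(10, 128)
labels = array(['alice', 'bob'] * 5)

embeddings = Normalizer(norm='l2').fit_transform(embeddings)   # L2-normalise each face vector
classes = LabelEncoder().fit_transform(labels)
classifier = SVC(kernel='linear', probability=True)
classifier.fit(embeddings, classes)
print(classifier.predict(embeddings[:1]))    # predicted (encoded) identity for the first face
```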
- Chapter 30: How to perform face classification with FaceNet
- “FaceNet is a face recognition system developed in 2015 by researchers at Google that achieved then state-of-the-art results on a range of face recognition benchmark datasets.“
- “FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity.”
- FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.
- “The model is a deep convolutional neural network trained via a triplet loss function that encourages vectors for the same identity to become more similar (smaller distance), whereas vectors for different identities are expected to become less similar (larger distance). The focus on training a model to create embeddings directly (rather than extracting them from an intermediate layer of a model) was an important innovation in this work.“
Conclusion
- This book is very pragmatic in the sense that it entices you to use the provided code on your own photos, and in doing so you discover the power of convolutional networks for image classification and face recognition.