Tuesday, July 07, 2020

Deep Learning For Computer Vision Jason Brownlee


Introduction


  • The objective of this post is to write a summary of the book “Deep Learning for Computer Vision” by Jason Brownlee. I write this kind of post to record my own experience of the book, so that when I read it again in the future I can quickly recall the key concepts and ideas that I reacted to. As a supplementary benefit, it may also help the anonymous reader form a point of view about the content of this excellent book.

Preamble

  • Fit: adapting the model weights in response to a training dataset.
  • #conda install -c anaconda graphviz 
  • The type of loss function depends on the type of predictive modelling problem (see the sketch after this list):
    • Regression → Mean Squared Error
    • Binary Classification → Binary Cross-Entropy
    • Multi-class Classification → Categorical Cross-Entropy
  • #conda install -c anaconda pillow
  • Gist.github.com: facilitate the publishing of code in a blog
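  • As a reminder of how these pairings look in practice, here is a minimal sketch (my own, not code from the book) compiling three small Keras models, assuming TensorFlow's bundled Keras; the layer sizes and the Adam optimizer are only illustrative:

      from tensorflow.keras import Sequential
      from tensorflow.keras.layers import Dense

      # Regression: a linear output unit paired with mean squared error.
      regressor = Sequential([Dense(10, activation='relu', input_shape=(8,)), Dense(1)])
      regressor.compile(optimizer='adam', loss='mean_squared_error')

      # Binary classification: a sigmoid output unit paired with binary cross-entropy.
      binary_clf = Sequential([Dense(10, activation='relu', input_shape=(8,)), Dense(1, activation='sigmoid')])
      binary_clf.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

      # Multi-class classification: a softmax output layer paired with categorical cross-entropy.
      multi_clf = Sequential([Dense(10, activation='relu', input_shape=(8,)), Dense(3, activation='softmax')])
      multi_clf.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])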


  • Chapter 5: How to manually scale image pixel data

    • Function to scale an image while preserving its aspect ratio: Pillow's thumbnail() (see the sketch below).
    • Remove linear correlation from pixel data: PCA and ZCA whitening. ZCA is preferable for CNNs because it is local.
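    • A minimal sketch of the thumbnail() and manual rescaling steps, assuming Pillow and NumPy are installed; 'photo.jpg' is a placeholder filename:

      from PIL import Image
      from numpy import asarray

      # Load an image and shrink it in place; thumbnail() preserves the aspect ratio.
      image = Image.open('photo.jpg')
      image.thumbnail((100, 100))

      # Convert to a float array and manually normalise pixel values to the range 0-1.
      pixels = asarray(image).astype('float32')
      print('Before:', pixels.min(), pixels.max())
      pixels /= 255.0
      print('After:', pixels.min(), pixels.max())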

  • Chapter 6: How to load and manipulate images with Keras.

  • Chapter 7: How to scale image pixel data with Keras

    • Normalisation: scale pixel values to the range 0-1
    • Centering: subtracting the mean pixel value from each pixel.
    • Standardisation: scaling pixel values to have a mean of 0 and a standard deviation of 1 (see the sketch below).
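    • A sketch of these options with Keras' ImageDataGenerator (assuming TensorFlow's bundled Keras; train_images is a placeholder array of shape [samples, rows, cols, channels]):

      from tensorflow.keras.preprocessing.image import ImageDataGenerator

      # Normalisation: rescale pixel values to the range 0-1.
      norm_gen = ImageDataGenerator(rescale=1.0 / 255.0)

      # Centering and standardisation computed feature-wise over the whole dataset:
      # the generator must see the training data first to estimate the statistics.
      std_gen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True)
      # std_gen.fit(train_images)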

  • Chapter 8: How to load large datasets from directories with Keras
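    • A minimal sketch of loading images lazily from disk with flow_from_directory (the data/train/ layout with one sub-directory per class is an assumption, not the book's exact dataset):

      from tensorflow.keras.preprocessing.image import ImageDataGenerator

      # Expects a layout like data/train/<class_name>/*.jpg.
      datagen = ImageDataGenerator(rescale=1.0 / 255.0)
      train_it = datagen.flow_from_directory('data/train/', class_mode='binary',
                                             batch_size=64, target_size=(224, 224))

      # Each call yields one batch of images and labels, loaded on demand from disk.
      batch_images, batch_labels = next(train_it)
      print(batch_images.shape, batch_labels.shape)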

  • Chapter 9: How to use image data augmentation in Keras

    • #expand_dims(data, 0): adds a dimension to an array (see the sketch below).
    • The dimensions of a single image can be expanded from [rows][cols][channels] to [samples][rows][cols][channels], where the number of samples is one for the single image. This transforms the array of the image into an array of samples containing one image.
      • samples = expand_dims(image,0)
      • See also numpy.moveaxis and numpy.reshape.
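    • A small sketch of the expand_dims step followed by a few of the augmentations from this chapter ('bird.jpg' is a placeholder filename):

      from numpy import expand_dims
      from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array

      # Load one image and add a leading samples axis: [rows][cols][channels] -> [1][rows][cols][channels].
      image = img_to_array(load_img('bird.jpg'))
      samples = expand_dims(image, 0)

      # Random horizontal flips, rotations and zooms applied on the fly.
      datagen = ImageDataGenerator(horizontal_flip=True, rotation_range=90, zoom_range=[0.5, 1.0])
      it = datagen.flow(samples, batch_size=1)
      augmented = next(it)[0].astype('uint8')  # one randomly transformed copy of the image
      print(augmented.shape)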

  • Chapter 10: How to use colour ordering formats

  • Chapter 11: How Convolutional layers work

    • Using a filter smaller than the input is intentional as it allows the same filter (set of weights) to be multiplied by the input array multiple times at different locations on the input. This systematic application of the same filter across an image is a powerful idea.
    • The innovation of using the convolution operation in a neural network is that the values of the filter are weights to be learned during the training of the network. The network will learn what types of features to extract from the input. Specifically, training under stochastic gradient descent, the network is forced to learn to extract features from the image that minimize the loss for the specific task the network is being trained to solve, e.g. extract features that are the most useful for classifying images as dogs or cats. In this context, you can see that this is a powerful idea.
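    • To make the idea concrete, here is a sketch in the spirit of the chapter: a single 3x3 filter whose weights are set by hand to a vertical-line detector, then applied across an 8x8 input (normally those weights would be learned during training):

      from numpy import asarray
      from tensorflow.keras import Sequential
      from tensorflow.keras.layers import Conv2D

      # An 8x8 single-channel input with a vertical line down the middle.
      data = asarray([[0, 0, 0, 1, 1, 0, 0, 0]] * 8).reshape(1, 8, 8, 1)

      # One 3x3 filter; its weights have shape (rows, cols, in_channels, out_channels).
      model = Sequential([Conv2D(1, (3, 3), input_shape=(8, 8, 1))])
      detector = asarray([[[[0.]], [[1.]], [[0.]]],
                          [[[0.]], [[1.]], [[0.]]],
                          [[[0.]], [[1.]], [[0.]]]])
      model.set_weights([detector, asarray([0.0])])

      # The same set of weights is applied at every location of the input image.
      print(model.predict(data).squeeze())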

  • Chapter 12: How to use filter size, padding and stride

  • Chapter 13: How pooling layers work

    • ReLU (Rectified Linear Unit) activation: a best practice.
    • Pooling can be used to downsample the detection of features in feature maps.
    • Max pooling highlights the most present feature in the patch and works better in practice than average pooling for computer vision tasks like image classification (see the sketch below).
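    • A tiny sketch showing how a max pooling layer halves the spatial size of the feature maps produced by a convolutional layer (sizes are illustrative):

      from tensorflow.keras import Sequential
      from tensorflow.keras.layers import Conv2D, MaxPooling2D

      model = Sequential([
          Conv2D(1, (3, 3), activation='relu', input_shape=(8, 8, 1)),  # -> 6x6 feature map
          MaxPooling2D(pool_size=(2, 2)),  # -> 3x3: keeps the strongest activation in each 2x2 patch
      ])
      model.summary()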

  • Chapter 14: ImageNet, ILSVRC and milestones architectures

    • ILSVRC: the ImageNet Large Scale Visual Recognition Challenge is an annual competition at the intersection of computer vision and deep learning.

  • Chapter 15: How milestone model architectural innovations work

    • The different winning architectural innovation models:
      • LeNet-5 (Yann LeCun)
      • AlexNet (University of Toronto)
      • VGG (Visual Geometry Group)
      • Inception (GoogLeNet)
      • ResNet (Microsoft Research)

  • Chapter 16: How to use 1x1 convolution to manage complexity
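    • The core trick, sketched below: a 1x1 convolution acts as a per-pixel projection across channels, so it can shrink the number of feature maps (and therefore the cost of the following layers) without touching the spatial dimensions. The sizes here are illustrative:

      from tensorflow.keras import Sequential
      from tensorflow.keras.layers import Conv2D

      model = Sequential([
          Conv2D(512, (3, 3), padding='same', activation='relu', input_shape=(256, 256, 3)),
          Conv2D(64, (1, 1), activation='relu'),  # 512 feature maps projected down to 64
      ])
      model.summary()  # spatial size stays 256x256 throughout; only the channel count changes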

  • Chapter 17: How to implement model architecture innovation

  • Chapter 18: How to use pre-trained models and transfer learning

    • Deep convolutional neural network models may take days or even weeks to train on very large datasets. A way to short-cut this process is to re-use the model weights from pre-trained models that were developed for standard computer vision benchmark datasets, such as the ImageNet image recognition tasks.
    • Transfer learning generally refers to a process where a model trained on one problem is used in some way on a second, related problem.
    • Recall that convolutional layers closer to the input layer of the model learn low-level features such as lines, that layers in the middle of the network learn complex abstract features that combine the lower-level features extracted from the input, and that layers closer to the output interpret the extracted features in the context of a classification task.
    • Load the VGG-16 pre-trained model (see the sketch below).
    • Load the Inception V3 pre-trained model.
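    • A minimal transfer-learning sketch with the VGG-16 model (assuming TensorFlow's bundled Keras; the new classifier head, its sizes and the dogs-vs-cats framing are only illustrative):

      from tensorflow.keras.applications.vgg16 import VGG16
      from tensorflow.keras.layers import Dense, Flatten
      from tensorflow.keras.models import Model

      # Load VGG-16 with ImageNet weights and without its original classifier head.
      base = VGG16(include_top=False, weights='imagenet', input_shape=(224, 224, 3))

      # Freeze the convolutional base so only the new layers are trained on the target problem.
      for layer in base.layers:
          layer.trainable = False

      # New classifier head, e.g. for a binary dogs-vs-cats task.
      flat = Flatten()(base.output)
      dense = Dense(128, activation='relu')(flat)
      output = Dense(1, activation='sigmoid')(dense)

      model = Model(inputs=base.input, outputs=output)
      model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])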

  • Chapter 19: How to classify black and white photos of clothing

    • Baseline model:
      • Load dataset
      • Prepare pixels
      • Define model
      • Evaluate model
      • Plot diagnostic learning curve
      • Summarise model performance
      • Run test harness
    • Some slight overfitting: this could be addressed with regularisation or by training for fewer epochs.

    • Addressing the overfitting: dropout regularisation and data augmentation (see the sketch at the end of this chapter's notes).
      • Dropout helps reduce overfitting by preventing a layer from seeing the exact same pattern twice. Dropout and data augmentation both tend to disrupt random correlations occurring in your data.
      • Data augmentation: horizontal flips, vertical flips, rotations, zooms. “Data augmentation involves making copies of the examples in the training dataset with small random modifications.”
    • Fitting the model requires that the number of training epochs and the batch size be specified. We will use a generic 100 training epochs for now and a modest batch size of 64. It is better to use a separate validation dataset, e.g. by splitting the training set into train and validation sets.
    • The CIFAR-10 experiments on my iMac gave the following results:
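    • As a sketch of the dropout and data-augmentation fixes mentioned above (my own simplification; the layer sizes, dropout rates and augmentation ranges are illustrative rather than the book's exact configuration):

      from tensorflow.keras import Sequential
      from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
      from tensorflow.keras.preprocessing.image import ImageDataGenerator

      # A small CNN with dropout to fight overfitting.
      model = Sequential([
          Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)),
          MaxPooling2D((2, 2)),
          Dropout(0.2),  # randomly silence 20% of activations during training
          Flatten(),
          Dense(128, activation='relu'),
          Dropout(0.2),
          Dense(10, activation='softmax'),
      ])
      model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

      # Data augmentation: small random shifts and horizontal flips of the training images.
      datagen = ImageDataGenerator(width_shift_range=0.1, height_shift_range=0.1, horizontal_flip=True)
      # it_train = datagen.flow(trainX, trainY, batch_size=64)
      # model.fit(it_train, epochs=100, validation_data=(testX, testY))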


  • Chapter 22: How to label satellite photographs of the Amazon rain forest


    • VGG: Visual Geometry Group. VGG-16 is a small and well-understood model, from Karen Simonyan and Andrew Zisserman.
    • Cross-entropy loss = log loss; log loss penalises predictions that are confident and wrong.
    • Given that we expect the rate of learning to be slowed, we give the model more time to learn by increasing the number of training epochs.
    • Keras provides a range of pre-trained models that can be loaded and used wholly or partially via the Keras application API.
    • Adam: adaptive learning rate

  • Chapter 23: Deep learning for object recognition

    • Image classification involves assigning a class label to an image, whereas object localization involves drawing a bounding box around one or more objects in an image. Object detection is more challenging: it combines these two tasks, drawing a bounding box around each object of interest in the image and assigning it a class label.
    • You Only Look Once = YOLO.
      • The YOLO family of models is fast, much faster than R-CNN, achieving object detection in real time.
      • There is a best-of-breed open-source implementation of YOLOv3 for the Keras deep learning library.
    • Region-Based Convolutional Neural Networks, or R-CNNs
    • Image classification
    • Image localisation
    • Object detection

  • Chapter 25: How to perform object detection with Mask R-CNN

    • An extension of object detection involves marking the specific pixels in the image that belong to each detected object instead of using coarse bounding boxes during object localization. This harder version of the problem is generally referred to as object segmentation or semantic segmentation.

  • Chapter 26: How to develop a new object detection model

    • Python provides the ElementTree API, which can be used to load and parse an XML file, and we can use the find() and findall() functions to perform XPath queries on a loaded document (see the sketch below).
    • A very good example of extracting information from an XML file.
    • The average or mean of the average precision (AP) across all of the images in a dataset is called the mean average precision, or mAP.
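    • A sketch of that extraction for a Pascal VOC-style annotation file ('annotation.xml' and the tag layout are assumptions about the dataset format, not necessarily the book's exact files):

      from xml.etree import ElementTree

      tree = ElementTree.parse('annotation.xml')
      root = tree.getroot()

      # findall() runs an XPath query over the document: one <bndbox> element per annotated object.
      boxes = []
      for box in root.findall('.//bndbox'):
          xmin = int(box.find('xmin').text)
          ymin = int(box.find('ymin').text)
          xmax = int(box.find('xmax').text)
          ymax = int(box.find('ymax').text)
          boxes.append((xmin, ymin, xmax, ymax))

      # The image size is stored once per file under the <size> element.
      width = int(root.find('.//size/width').text)
      height = int(root.find('.//size/height').text)
      print(boxes, width, height)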

  • Chapter 27: Deep Learning for face recognition

    • Deep learning methods are able to leverage very large datasets of faces and learn rich and compact representations of faces, allowing modern models to first perform as well as, and later to outperform, the face recognition capabilities of humans.
    • The DeepID systems were among the first deep learning models to achieve better-than-human performance on the task, e.g. DeepID2 achieved 99.15% on the Labeled Faces in the Wild (LFW) dataset, which is better-than-human performance of 97.53%.
    • Video: “How to run TensorFlow Lite on Raspberry Pi for object detection”, @EdjeElectronics.

  • Chapter 28: How to detect faces in photographs

    • Download a pre-trained model for frontal face detection from the OpenCV GitHub repository. OpenCV: Open Source Computer Vision Library.
    • #conda install -c menpo opencv
    • #opencv version: 3.4.2
    • MTCNN: Multi-task Cascaded Convolutional Neural Network (see the sketch below).
      • I ran MTCNN on my own photograph of friends and was able to extract the seven faces of my friends from the photo with the sample code from Jason Brownlee. Excellent :-) See the post.
    • #conda install -c conda-forge mtcnn
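    • A minimal sketch of the MTCNN detection step ('friends.jpg' is a placeholder for my own photo):

      from matplotlib import pyplot
      from matplotlib.patches import Rectangle
      from mtcnn.mtcnn import MTCNN

      # Load the photograph and detect faces with the pre-trained MTCNN model.
      pixels = pyplot.imread('friends.jpg')
      detector = MTCNN()
      faces = detector.detect_faces(pixels)

      # Draw a red bounding box around each detected face.
      pyplot.imshow(pixels)
      ax = pyplot.gca()
      for face in faces:
          x, y, width, height = face['box']
          ax.add_patch(Rectangle((x, y), width, height, fill=False, color='red'))
      pyplot.show()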

  • Chapter 29: How to perform face classification with FaceNet

    • Face recognition: we will use the MTCNN model, then develop a Linear Support Vector Machine (SVM) classifier to predict the identity of a given face (see the sketch below).
    • A face embedding is a vector that represents the features extracted from the face. This can then be compared with the vectors generated for other faces. The vectors are often compared to each other using a distance metric.
    • It is common to use a Linear Support Vector Machine (SVM) when working with normalised face embedding inputs. This is because the method is very effective at separating the face embedding vectors.
    • FaceNet: a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embedding as feature vectors.
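    • A sketch of the normalise-then-SVM step with scikit-learn; the random arrays stand in for 128-dimensional face embeddings and their identity labels, just so the code runs:

      from numpy.random import rand, randint
      from sklearn.preprocessing import Normalizer
      from sklearn.svm import SVC

      # Stand-in data: 50 face embeddings of dimension 128, belonging to 5 identities.
      train_X = rand(50, 128)
      train_y = randint(0, 5, 50)

      # L2-normalise the embedding vectors, then fit a linear SVM to separate the identities.
      normalizer = Normalizer(norm='l2')
      train_X = normalizer.transform(train_X)
      model = SVC(kernel='linear', probability=True)
      model.fit(train_X, train_y)

      # Predict the identity of a new (here random) face embedding.
      sample = normalizer.transform(rand(1, 128))
      print(model.predict(sample))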

  • Chapter 30: How to perform face classification with FaceNet

    • FaceNet is a face recognition system developed in 2015 by researchers at Google that achieved then-state-of-the-art results on a range of face recognition benchmark datasets.
    • FaceNet directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity.
      • FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.
    • The model is a deep convolutional neural network trained via a triplet loss function that encourages vectors for the same identity to become more similar (smaller distance), whereas vectors for different identities are expected to become less similar (larger distance). The focus on training a model to create embeddings directly (rather than extracting them from an intermediate layer of a model) was an important innovation in this work.
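    • A toy NumPy sketch of the triplet loss idea (4-dimensional vectors instead of FaceNet's 128-dimensional embeddings; the 0.2 margin follows the paper but the values are otherwise illustrative):

      import numpy as np

      def triplet_loss(anchor, positive, negative, margin=0.2):
          # Push the anchor closer to the positive (same identity) than to the
          # negative (different identity) by at least the margin.
          pos_dist = np.sum(np.square(anchor - positive))
          neg_dist = np.sum(np.square(anchor - negative))
          return max(pos_dist - neg_dist + margin, 0.0)

      anchor   = np.array([1.0, 0.0, 0.0, 0.0])
      positive = np.array([0.9, 0.1, 0.0, 0.0])  # same identity: small distance
      negative = np.array([0.0, 1.0, 0.0, 0.0])  # different identity: large distance
      print(triplet_loss(anchor, positive, negative))  # 0.0: already separated by more than the margin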

Conclusion


  • This book is very pragmatic in the sense that it entices you to use the provided code on your own photos; you then discover the power of convolutional networks in the fields of image classification and face recognition.




1 comment:

Jason Brownlee said…

Great review!