Sunday, September 13, 2020

Generative Adversarial Networks with Python (Part I and Part II) - Jason Brownlee

 

Introduction

  • GANs are very promising, which is the reason why I bought this book by Jason Brownlee.
  • It is the seventh book by Jason Brownlee that I am reading and practicing with.
  • The way Jason Brownlee explains the concepts, and the fact that code examples are provided, are the key reasons for buying such books.
  • In the book you will find the concepts explained from different points of view in different chapters, or rephrased, so that in the end you have a good chance of remembering all these concepts.

Part 1: Foundations

Chapter 1: What are Generative Adversarial Networks

  • Generative modeling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate or output new examples that plausibly could have been drawn from the original dataset.
  • GANs are a clever way of training a generative model by framing the problem as a supervised learning problem with two sub-models: the generator model that we train to generate new examples, and the discriminator model that tries to classify examples as either real (from the domain) or fake (generated). The two models are trained together in an adversarial zero-sum game until the discriminator model is fooled about half of the time, meaning the generator model is generating plausible examples.
  • GANs are exciting for image-to-image translation tasks, such as translating photos of summer to winter or day to night, and for generating photorealistic photos of objects, scenes, and people that even humans cannot tell are fake. The example below is impressive:

  • Supervised: because there is a real expected outcome to which a prediction is compared. Examples of supervised learning problems include classification and regression, and examples of supervised learning algorithms include logistic regression and random forests.
  • The second main type of machine learning is the descriptive or unsupervised learning approach. Here we are only given inputs, and the goal is to find "interesting patterns" in the data. This is a much less well-defined problem, since we are not told what kind of patterns to look for, and there is no obvious error metric to use (unlike supervised learning, where we can compare our prediction of y for a given x to the observed value). 
Page 2, Machine Learning: A Probabilistic Perspective, 2012.
  • Examples of unsupervised learning algorithms are K-means and Generative Adversarial Networks.
  • Classification is also traditionally referred to as discriminative modeling.
  • Alternately, unsupervised models that summarize the distribution of input variables may be able to be used to create or generate new examples in the input distribution. As such, these types of models are referred to as generative models.
  • In fact, a really good generative model may be able to generate new examples that are not just plausible, but indistinguishable from real examples from the problem domain.
  • Examples of generative models: 
    • Naive Bayes
    • Latent Dirichlet Allocation (LDA)
    • Gaussian Mixture Model (GMM)
    • Restricted Boltzmann Machine (RBM)
    • Deep Belief Network (DBN)
    • Variational Autoencoder (VAE)
    • Generative Adversarial Network (GAN)
  • The GAN model architecture involves two sub-models: a generator model for generating new examples and a discriminator model for classifying whether generated examples are real (from the domain) or fake (generated by the generator model).
  • The two models, the generator and discriminator, are trained together.
  • Successful generative modeling provides an alternative and potentially more domain-specific approach for data augmentation. In fact, data augmentation is a simplified version of generative modeling, although it is rarely described this way.

Chapter 2: How to Develop Deep Learning Models With Keras

  • Activation functions that transform a summed signal from each neuron in a layer can be added to the Sequential as a layer-like object called the Activation class.
  • The most common optimization algorithm is stochastic gradient descent, or sgd.
  • Once the network is compiled, it can be fit, which means adapting the model weights in response to a training dataset.
  • The network is trained using the backpropagation algorithm and optimized according to the optimization algorithm and loss function specified when compiling the model.
  • Once fit, a history object is returned that provides a summary of the performance of the model during training. This includes both the loss and any additional metrics specified when compiling the model, recorded each epoch.
  • Keras model APIs (both are illustrated in the sketch after this list):
    • The Sequential API: allows you to create models layer-by-layer for most problems. It is limited in that it does not allow you to create models that share layers or have multiple input or output layers.
    • The functional API: is an alternate way of creating models that offers a lot more flexibility, including creating more complex models.
  • When input data is one-dimensional (rows of samples), such as for a Multilayer Perceptron, the shape must explicitly leave room for the mini-batch size used when splitting the data during training. Therefore the shape tuple is always defined with a hanging last dimension, e.g. (2,).
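
A minimal sketch of the two Keras APIs described above, assuming TensorFlow's bundled Keras; the layer sizes and the (2,) input shape are only illustrative:

    # Sequential API: layers are stacked one after the other.
    from tensorflow.keras.models import Sequential, Model
    from tensorflow.keras.layers import Dense, Input

    # The input shape (2,) leaves the mini-batch size unspecified.
    seq_model = Sequential()
    seq_model.add(Dense(10, activation='relu', input_shape=(2,)))
    seq_model.add(Dense(1, activation='sigmoid'))
    seq_model.compile(optimizer='sgd', loss='binary_crossentropy')

    # Functional API: layers are connected explicitly, which also allows
    # shared layers and multiple inputs or outputs.
    visible = Input(shape=(2,))
    hidden = Dense(10, activation='relu')(visible)
    output = Dense(1, activation='sigmoid')(hidden)
    func_model = Model(inputs=visible, outputs=output)
    func_model.compile(optimizer='sgd', loss='binary_crossentropy')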

Chapter 3: How to Upsample with Convolutional Neural Networks

  • The generator model is typically implemented using a deep convolutional neural network with specialized layers that learn to fill in features in an image rather than extract features from an image.
  • Two common types of layers can be used in the generator model: the upsampling layer, which simply doubles the dimensions of the input, and the transpose convolutional layer, which performs an inverse convolution operation.

UpSampling2D example

Conv2DTranspose example

  • The transpose convolutional layer is more complex than a simple upsampling layer. A simple way to think about it is that it both performs the upsample operation and interprets the coarse input data to fill in the detail while it is upsampling. It is like a layer that combines the UpSampling2D and Conv2D layers into one layer.
  • In fact, the transpose convolutional layer performs an inverse convolution operation. Specifically, the forward and backward passes of the convolutional layer are reversed (both layer types are demonstrated in the sketch below).
  • A deconvnet can be thought of as a convnet model that uses the same components (filtering, pooling) but in reverse, so instead of mapping pixels to features, it does the opposite.
  • GAN performance and skill is notoriously difficult to quantify.
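
A small sketch, along the lines of the book's demonstration, contrasting the two layers on a tiny 2x2 single-channel input (TensorFlow's Keras assumed; the 1x1 kernel and fixed weights are only there to make the output easy to read):

    from numpy import asarray
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import UpSampling2D, Conv2DTranspose

    # a 2x2 input image with one channel: shape = (samples, rows, cols, channels)
    X = asarray([[1, 2],
                 [3, 4]]).reshape((1, 2, 2, 1)).astype('float32')

    # UpSampling2D simply doubles the rows and columns by repeating pixels (no weights).
    up_model = Sequential([UpSampling2D(input_shape=(2, 2, 1))])
    print(up_model.predict(X).reshape((4, 4)))

    # Conv2DTranspose both upsamples (stride 2) and learns how to fill in the detail
    # via its kernel; the weights are fixed to 1 here so the effect is visible.
    tr_model = Sequential([Conv2DTranspose(1, (1, 1), strides=(2, 2), input_shape=(2, 2, 1))])
    tr_model.set_weights([asarray([[[[1.0]]]]), asarray([0.0])])
    print(tr_model.predict(X).reshape((4, 4)))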

Chapter 4: How to implement the GAN Training Algorithm

  • Latent variables are variables that are not directly observed but are instead inferred from other variables that are observed.
  • An epoch is defined as one cycle through a training dataset, where the samples in the training dataset are used to update the model weights in mini-batches.
  • The discriminator model must make predictions for the real and fake samples, and the weights of the discriminator must be updated proportional to how correct or incorrect those predictions were.
  • Next, the generator model must be updated. Again, a batch of random points from the latent space must be selected and passed to the generator to generate fake images, which are then passed to the discriminator to be classified (the full training loop is sketched after this list).
  • The discriminator is trained to correctly classify real and fake images.
  • "This is just the standard cross-entropy cost that is minimized when training a standard binary classifier with a sigmoid output. The only difference is that the classifier is trained on two minibatches of data; one coming from the dataset, where the label is 1 for all examples, and one coming from the generator, where the label is 0 for all examples." NIPS 2016 Tutorial: Generative Adversarial Networks, 2016

Chapter 5: How to Implement GAN Hacks to Train Stable Models

  • There are a number of heuristics or best practices called GAN hacks that can be used when configuring and training your GAN models. These heuristics have been hard won by practitioners testing and evaluating hundreds or thousands of combinations of configuration operations on a range of problems over many years.
  • GANs are difficult to train. The reason they are difficult to train is that both the generator model and the discriminator model are trained simultaneously in a game. This means that improvements to one model come at the expense of the other model. The goal of training two models involves finding a point of equilibrium between the two competing concerns.

  • Best practices for Deep Convolutional GANs (DCGANs); several of these hacks are pulled together in the code sketch after this list:
    • Downsample using strided convolutions (in the discriminator)
    • Upsample using strided transpose convolutions (in the generator)
    • Use Leaky ReLU:
      • The rectified linear activation unit, or ReLU for short, is a simple calculation that returns the value provided as input directly, or the value 0.0 if the input is 0.0 or less. It has become a best practice when developing deep convolutional neural networks generally. The Leaky ReLU variant allows small negative output values when the input is less than zero; a slope of 0.2 is the recommended default for GANs.
    • Use batch normalization:
      • Batch Normalization standardizes the activations from a prior layer to have a zero mean and unit variance. This has the effect of stabilizing the training process. Batch normalization is used after the activation of convolution and transpose convolutional layers in the discriminator and generator models respectively.
    • Use Gaussian weight initialization
      • Before a neural network can be trained, the model weights (parameters) must be initialized to small random values. The reported best practice for DCGAN models is to initialize all weights using a zero-centered Gaussian distribution (the normal or bell-shaped distribution) with a standard deviation of 0.02.
    • Use Adam Stochastic Gradient Descent:
      • Stochastic gradient descent, or SGD for short, is the standard algorithm used to optimize the weights of convolutional neural network models. The best practice for DCGANs is to use the Adam version of stochastic gradient descent with a learning rate of 0.0002 and a momentum (beta1) of 0.5.
    • Scale images to the range [-1, 1]:
      • It is recommended to use the hyperbolic tangent activation function as the output from the generator model. As such, it is also recommended that real images used to train the discriminator are scaled so that their pixel values are in the range [-1, 1]. This is so that the discriminator will always receive images as input, both real and fake, that have pixel values in the same range.
    • Use a Gaussian latent space:
      • The latent space defines the shape and distribution of the input to the generator model used to generate new images. The original DCGAN recommends sampling from a uniform distribution, meaning that the shape of the latent space is a hypercube. The more recent best practice is to sample from a standard Gaussian distribution, meaning that the shape of the latent space is a hypersphere, with a mean of zero and a standard deviation of one.
    • Separate batches of real and fake images:
      • The discriminator model is trained using stochastic gradient descent with mini-batches. The best practice is to update the discriminator with separate batches of real and fake images rather than combining real and fake images into a single batch.
    • Use label smoothing:
      • It is common to use the class label 1 to represent real images and class label 0 to represent fake images when training the discriminator model. These are called hard labels, as the label values are precise or crisp. It is a good practice to use soft labels, such as values slightly more or less than 1.0 or slightly more than 0.0 for real and fake images respectively, where the variation for each image is random. This is often referred to as label smoothing and can have a regularizing effect when training the model.
    • Use noisy labels:
      • The labels used when training the discriminator are always correct. This means that fake images are always labeled with class 0 and real images are always labeled with class 1. It is recommended to introduce some errors to these labels, where some fake images are marked as real and some real images are marked as fake.
    • Soumith Chintala's GAN hacks
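
A hedged sketch pulling several of these hacks into one place, assuming TensorFlow's Keras; the layer sizes, the 5% flip rate and the smoothing range are illustrative choices rather than prescriptions:

    from numpy.random import randn, uniform, randint
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, Flatten, Dense, LeakyReLU, BatchNormalization
    from tensorflow.keras.initializers import RandomNormal
    from tensorflow.keras.optimizers import Adam

    init = RandomNormal(mean=0.0, stddev=0.02)           # Gaussian weight initialization

    discriminator = Sequential([
        # downsample with a strided convolution instead of pooling
        Conv2D(64, (3, 3), strides=(2, 2), padding='same',
               kernel_initializer=init, input_shape=(28, 28, 1)),
        LeakyReLU(alpha=0.2),                             # Leaky ReLU with a slope of 0.2
        BatchNormalization(),                             # batch norm after the activation
        Flatten(),
        Dense(1, activation='sigmoid'),
    ])
    # Adam version of SGD with a learning rate of 0.0002 and momentum (beta1) of 0.5
    discriminator.compile(loss='binary_crossentropy',
                          optimizer=Adam(learning_rate=0.0002, beta_1=0.5))

    # scale real images from [0, 255] to [-1, 1] to match the generator's tanh output
    def scale_images(images):
        return (images.astype('float32') - 127.5) / 127.5

    # Gaussian latent space: sample latent points from a standard normal distribution
    def generate_latent_points(latent_dim, n_samples):
        return randn(n_samples, latent_dim)

    # label smoothing: replace hard 1.0 labels with soft values around 1.0
    def smooth_positive_labels(y):
        return y - 0.3 + uniform(0.0, 0.5, y.shape)

    # noisy labels: flip a small fraction (here 5%) of the labels at random
    def noisy_labels(y, p_flip=0.05):
        n_flip = int(p_flip * y.shape[0])
        flip_ix = randint(0, y.shape[0], n_flip)
        y[flip_ix] = 1 - y[flip_ix]
        return y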

Part II: GAN Basics

Chapter 6: How to Develop a 1D GAN from Scratch

  • A generator model is capable of generating new artificial samples that plausibly could have come from an existing distribution of samples.
  • Importantly, the performance of the discriminator model is used to update both the model weights of the discriminator itself and the generator model.
  • The weights in the generator model are updated based on the performance of the discriminator model. When the discriminator model is good at detecting fake samples, the generator is updated more (via a larger error gradient), and when the discriminator model is relatively poor or confused when detecting fake samples, the generator model is updated less. This defines the zero-sum or adversarial relationship between these two models.
  • The backpropagation process used to update the model weights will see this as a large error and will update the model weights (i.e. only the weights in the generator) to correct for this error, in turn making the generator better at generating plausible fake samples. A minimal sketch of such a 1D GAN is given at the end of this chapter's notes.
  • The weights in the discriminator are marked as not trainable.
My first GAN
1D GAN: the blue dots are the generated samples
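
Below is a hedged sketch of the kind of tiny 1D GAN built in this chapter, where the real samples are (x, x^2) pairs; TensorFlow's Keras is assumed and the layer sizes are illustrative rather than the book's exact code:

    from numpy import hstack
    from numpy.random import rand
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    # the discriminator: a small MLP for binary (real/fake) classification
    def define_discriminator(n_inputs=2):
        model = Sequential([
            Dense(25, activation='relu', input_dim=n_inputs),
            Dense(1, activation='sigmoid'),
        ])
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

    # the generator: maps latent points to 2D samples; it is not compiled
    # because it is only trained through the composite GAN model
    def define_generator(latent_dim, n_outputs=2):
        return Sequential([
            Dense(15, activation='relu', input_dim=latent_dim),
            Dense(n_outputs, activation='linear'),
        ])

    # real samples lie on the curve y = x^2 for x in [-0.5, 0.5]
    def generate_real_samples(n):
        x = rand(n, 1) - 0.5
        return hstack((x, x * x))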

Chapter 7: How to Develop a DCGAN for Grayscale Handwritten Digits


MNIST handwritten digit dataset extract

  • The discriminator:
    • the model is trained to minimize the binary cross-entropy loss function, appropriate for binary classification. We will use some best practices in defining the discriminator model, such as the use of LeakyReLU instead of ReLU, using Dropout, and using the Adam version of stochastic gradient descent with a learning rate of 0.0002 and a momentum of 0.5.
    • is just a normal neural network model for binary classification
    • needs to be trained; this involves repeatedly retrieving samples of real images and samples of generated images and updating the model for a fixed number of iterations.
  • The generator:
    • we want many parallel versions or interpretations of the input. This is a pattern in convolutional neural networks where we have many parallel filters resulting in multiple parallel activation maps, called feature maps, with different interpretations of the input.
    • the next major architectural innovation involves upsampling the low-resolution image to a higher resolution version of the image.
    • is not compiled and does not specify a loss function or optimization algorithm. This is because the generator is not trained directly.
  • Training the GAN model:
    • a new GAN model can be defined that stacks the generator and discriminator such that the generator receives as input random points in the latent space and generates samples that are fed into the discriminator model directly, classified, and the output of this larger model can be used to update the model weights of the generator.
    • therefore, we will mark all of the layers in the discriminator as not trainable when it is part of the GAN model so that they cannot be updated and overtrained on fake examples.
    • therefore when the generator is trained as part of the GAN model, we will mark the generated samples as real (class = 1)
    • making the discriminator not trainable when it is part of the GAN model is a clever trick in the Keras API (see the sketch below).
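
A hedged sketch of this composite-model trick, assuming TensorFlow's Keras; because the standalone discriminator was already compiled, freezing it here only affects the stacked GAN model, so training the composite model updates the generator alone:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.optimizers import Adam

    def define_gan(generator, discriminator):
        # freeze the discriminator's weights in this model only; the standalone
        # (already compiled) discriminator is still trained on its own batches
        discriminator.trainable = False
        model = Sequential()
        model.add(generator)        # latent points in ...
        model.add(discriminator)    # ... real/fake classification out
        model.compile(loss='binary_crossentropy',
                      optimizer=Adam(learning_rate=0.0002, beta_1=0.5))
        return model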

MNIST number generation after 100 epochs
performed on an iMac over nearly 24 hours of training


  • Comment on the above image: when viewing the discriminator model's accuracy in concert with the generated images, we can see that the accuracy on fake examples (98%) does not correlate well with the subjective quality of the images, but the accuracy on real examples (15%) may. It is a crude and possibly unreliable metric of GAN performance, along with loss.
  • The book then provides a code example that generates 25 handwritten digit images (the idea is sketched at the end of this chapter's notes):
25 GAN Generated MNIST handwritten images

  • I observe that most of the images are plausible.
  • There is also a piece of code that can produce a single digit, also very plausible:
A GAN-generated MNIST handwritten digit
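
A hedged sketch of the idea referenced above: load a saved generator and plot a grid of generated digits. The filename 'generator_model.h5' and the 5x5 grid are placeholders for whatever the training script actually saved, not necessarily the book's exact code:

    from numpy.random import randn
    from tensorflow.keras.models import load_model
    from matplotlib import pyplot

    model = load_model('generator_model.h5')     # hypothetical saved generator file
    latent_dim = 100
    latent_points = randn(25, latent_dim)        # 25 Gaussian points in the latent space
    X = model.predict(latent_points)             # generator outputs are in [-1, 1] (tanh)
    X = (X + 1) / 2.0                            # rescale to [0, 1] for plotting

    for i in range(25):                          # 5x5 grid of generated digits
        pyplot.subplot(5, 5, i + 1)
        pyplot.axis('off')
        pyplot.imshow(X[i, :, :, 0], cmap='gray_r')
    pyplot.show()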

Chapter 8: How to Develop a DCGAN for Small Color Photographs

  • Developing a GAN for generating images requires both a discriminator convolutional neural network model for classifying whether a given image is real or generated and a generator model that uses inverse convolutional layers to transform an input into a full two-dimensional image of pixel values.
  • CIFAR is an acronym that stands for the Canadian Institute For Advanced Research. The dataset comprises 60,000 32x32 pixel color photographs of objects from 10 classes, such as frogs, birds, cats, ships, and airplanes.
Plot of the first 49 small objects from CIFAR-10

  • The discriminator model has no pooling layers and a single node in the output layer with the sigmoid activation to predict whether the input sample is real or fake. The model is trained to minimize the binary cross-entropy loss function, appropriate for binary classification.
  • It helps to see that the discriminator is just a normal neural network model for binary classification.
  • The generator model will generate images with pixel values in the range [-1, 1] as it uses the Tanh activation function, a best practice.
  • We don't want just one low-resolution version of the image; we want many parallel versions or interpretations of the input. This is a pattern in convolutional neural networks where we have many parallel filters resulting in multiple parallel activation maps, called feature maps, with different interpretations of the input.
  • The generator model is not compiled and does not specify a loss function or optimization algorithm (a sketch of such a generator is given after the figures below).
CIFAR-10 images generated by the untrained generator


100 CIFAR images generated by a GAN

GAN-generated CIFAR image for a specific point in the latent space
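
A hedged sketch of a CIFAR-10-style generator along the lines described above: a dense layer provides many low-resolution feature maps, which strided transpose convolutions upsample to a 32x32x3 image (TensorFlow's Keras assumed; the filter counts are illustrative):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Reshape, Conv2D, Conv2DTranspose, LeakyReLU

    def define_generator(latent_dim=100):
        model = Sequential()
        # 256 parallel 4x4 feature maps: many low-resolution "interpretations" of the input
        model.add(Dense(256 * 4 * 4, input_dim=latent_dim))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Reshape((4, 4, 256)))
        # upsample 4x4 -> 8x8 -> 16x16 -> 32x32 with strided transpose convolutions
        model.add(Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same'))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same'))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same'))
        model.add(LeakyReLU(alpha=0.2))
        # output a 32x32x3 image with pixel values in [-1, 1] via tanh
        model.add(Conv2D(3, (3, 3), activation='tanh', padding='same'))
        return model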


Chapter 9: How to Explore the Latent Space When Generating Faces

  • The generative model in the GAN architecture learns to map points in the latent space to generated images.
  • "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks": the authors explored the latent space for GANs fit on a number of different training datasets, most notably a dataset of celebrity faces. They demonstrated two interesting aspects:
    • The first was vector arithmetic with faces. For example, the face of a smiling woman minus the face of a neutral woman plus the face of a neutral man resulted in the face of a smiling man:
      • smiling woman - neutral woman + neutral man = smiling man
    • The second demonstration was the transition between two generated faces, specifically by creating a linear path through the latent space between the points that generated two faces and then generating all of the faces for the points along the path.
  • When working with a GAN, it is easier to model a dataset if all of the images are small and square in shape. 
  • The pre-trained Multi-Task Cascaded Convolutional Neural Network (MTCNN) is used to detect and extract the faces from the photographs.
  • We need inputs for the generator model: these are random points from the latent space, specifically Gaussian distributed random variables.
  • In these cases, we have performed a linear interpolation, which assumes that the latent space is a uniformly distributed hypercube. Technically, our chosen latent space is a 100-dimensional hypersphere or multimodal Gaussian distribution. There is a mathematical function called spherical linear interpolation, or Slerp, that should be used when interpolating this space to ensure the curvature of the space is taken into account (both interpolation schemes are sketched at the end of this chapter's notes).
  • Sadly, my GAN example collapsed at epoch 6 after 10 hours of running on my iMac:

  • GAN collapsed on iMac after 10 hours
  • Since the AWS p3 EC2 instance is not part of the Amazon free tier, I decided not to run the face generation example on AWS. Maybe later.
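
A hedged sketch of the two interpolation schemes mentioned above, plain linear interpolation versus spherical linear interpolation (Slerp), applied to two random 100-dimensional Gaussian latent points; this is a generic implementation, not the book's exact code:

    from numpy import arccos, clip, dot, sin, linspace
    from numpy.linalg import norm
    from numpy.random import randn

    def lerp(p1, p2, t):
        # straight line between the two points (assumes a "flat" latent space)
        return (1.0 - t) * p1 + t * p2

    def slerp(p1, p2, t):
        # follow the arc of the hypersphere between the two points instead
        omega = arccos(clip(dot(p1 / norm(p1), p2 / norm(p2)), -1.0, 1.0))
        if omega == 0.0:
            return lerp(p1, p2, t)
        return (sin((1.0 - t) * omega) * p1 + sin(t * omega) * p2) / sin(omega)

    # ten evenly spaced interpolation steps between two Gaussian latent points
    z1, z2 = randn(100), randn(100)
    steps = [slerp(z1, z2, t) for t in linspace(0.0, 1.0, num=10)]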

Chapter 10: How to Identify and Diagnose GAN Failure modes

  • GANs are difficult to train. The reason they are difficult to train is that both the generator model and the discriminator model are trained simultaneously in a zero-sum game. This means that improvements to one model come at the expense of the other model.
  • The discriminator model is trained separately, and as such, the model weights are marked as not trainable in this larger GAN model to ensure that only the weights of the generator model are updated.
  • Stable GAN:
    • Discriminator loss on real and fake images is expected to sit around 0.5
    • Generator loss on fake images is expected to sit between 0.5 and perhaps 2.0
    • Discriminator accuracy on real and fake images is expected to sit around 80%
    • Variance of generator and discriminator loss is expected to remain modest
    • The generator is expected to produce its highest quality image during a period of stability
    • Training stability may degenerate into periods of high-variance loss and corresponding lower quality generated images.
  • There are two failure cases that are common to see when training GAN models on new problems; they are mode collapse and convergence failure.
  • I cannot reproduce the results shown in the book for the stable GAN and the collapsed GAN. Very rapidly, after 19 iterations, the discriminator reaches 100% accuracy on both real and fake images (a sketch of the logging and plotting helpers behind these numbers is given at the end of this chapter's notes):
    • >19, d1=0.059, d2=0.093 g=0.315, a1=100, a2=100
  • Analysis after 250 iterations (>260, d1=0.001, d2=0.001 g=0.001, a1=100, a2=100):
    • The loss of the discriminator decreases to a value close to zero.
    • The loss of the generator also decreases to a value close to zero.
    • The generated images are of very low quality:
      GAN Generated images are all the same in collapsed mode

These are the properties of a GAN convergence failure.
Line plots of loss and accuracy for a Generative Adversarial Network with a convergence failure
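
A hedged sketch of the kind of per-step logging and line plots used to produce the numbers and the figure above; the variable names follow the printout format shown earlier (d1/d2 = discriminator loss on real/fake, g = generator loss, a1/a2 = discriminator accuracy on real/fake):

    from matplotlib import pyplot

    def summarize_step(step, d1, d2, g, a1, a2):
        # matches the log format shown above, with accuracies printed as percentages
        print('>%d, d1=%.3f, d2=%.3f g=%.3f, a1=%d, a2=%d' %
              (step, d1, d2, g, int(100 * a1), int(100 * a2)))

    def plot_history(d1_hist, d2_hist, g_hist, a1_hist, a2_hist):
        # loss on top, discriminator accuracy below
        pyplot.subplot(2, 1, 1)
        pyplot.plot(d1_hist, label='d-real')
        pyplot.plot(d2_hist, label='d-fake')
        pyplot.plot(g_hist, label='gen')
        pyplot.legend()
        pyplot.subplot(2, 1, 2)
        pyplot.plot(a1_hist, label='acc-real')
        pyplot.plot(a2_hist, label='acc-fake')
        pyplot.legend()
        pyplot.savefig('plot_line_plot_loss.png')
        pyplot.close()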

Conclusion

  • I am stopping here, after reading the first two parts of the book, FOUNDATIONS and GAN BASICS. GANs are complex and need computing resources such as, for example, an Amazon Web Services EC2 instance with a Community AMI, which is not part of the free AWS service offer.
  • I cannot run the GAN examples provided in the book on my iMac. AWS EC2 is mandatory to run them, and the AWS free offer is not suitable.
  • The remaining parts of the book are:
    • GAN Evaluation
    • GAN Loss
    • Conditional GANs
    • Advanced GANs
  • I will go through these four remaining parts once I have decided to take on a real GAN project.
  • GANs are very promising.
  • Thanks to Jason Brownlee for providing such practical knowledge.



1 comment:

Jason Brownlee said…

A very impressive review, well done!