Preamble
- This post is an extract from the book "Better Deep Learning" by Jason Brownlee.
- The subject is "Decouple Layers with Dropout" described in chapter 16 of the book.
- The series of posts related to "Better Deep Learning" is a way for me to (i) memorize the knowledge and (ii) reuse it later on.
(Photo: Col des Montets)
Chapter 16: Decouple Layers with Dropout
- A single model can be used to simulate having a large number of different network architectures by randomly dropping out nodes during training. This is called dropout and offers a very computationally cheap and remarkably effective regularization method to reduce overfitting and generalization error in deep neural networks of all kinds.
- Large weights in a neural network are a sign of a more complex network that has overfit the training data.
- Probabilistically dropping out nodes in the network is a simple and effective regularization method.
- A large network with more training epochs and the use of a weight constraint are suggested when using dropout.
- Large neural nets trained on relatively small datasets can overfit the training data. This has the effect of the model learning the statistical noise in the training data, which results in poor performance when the model is evaluated on new data, e.g. a test dataset. Generalization error increases due to overfitting. One approach to reduce overfitting is to fit all possible different neural networks on the same dataset and to average the predictions for each model. This is not feasible in practice, and can be approximated using a small collection of different models, called an ensemble.
- Dropout is a regularization method that approximates training a large number of neural networks with different architectures in parallel. During training, some number of node outputs are randomly ignored or dropped out.
- Dropout is implemented per-layer in a neural network.
- Dropout is not used on the output layer.
- A new hyperparameter is introduced that specifies the probability at which outputs of the layer are dropped out, or inversely, the probability at which outputs of the layer are retained.
- Dropout works well in practice, perhaps replacing the need for weight regularization (e.g. weight decay) and activation regularization (e.g. representation sparsity).
- Tips for using dropout regularization (a minimal Keras sketch follows this list):
- Use with all network types
- Dropout rate: a good value for the retention probability in a hidden layer is between 0.5 and 0.8 (note that in Keras the Dropout rate argument is the probability of dropping an output, i.e. 1 minus the retention probability).
- Use a larger network: when using dropout regularization, it is possible to use larger networks with less risk of overfitting.
- Grid search parameters
- Use a weight constraint
- Use with smaller datasets
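- As a minimal Keras sketch of these tips combined (the layer size, dropout rates and max-norm value below are illustrative assumptions, not values prescribed by the book):

```python
# Sketch of the tips: dropout on the input and hidden layers (never the output layer),
# a larger hidden layer, and a max-norm constraint on the incoming weights.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.constraints import MaxNorm

model = Sequential()
# light dropout on the inputs (high retention probability)
model.add(Dropout(0.2, input_shape=(2,)))
# larger hidden layer: dropout allows bigger networks with less risk of overfitting
model.add(Dense(500, activation='relu', kernel_constraint=MaxNorm(3)))
# hidden-layer dropout; a retention probability of 0.5-0.8 corresponds
# to a Keras rate of 0.2-0.5 (a value worth grid searching)
model.add(Dropout(0.4))
# no dropout on the output layer
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```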
Dropout Case Study
- Jason Brownlee suggested a standard binary classification problem that defines two two-dimensional concentric circles of observations, one circle for each class (a sketch for generating it follows the figure):
Circles Dataset with Color Showing the Class Value of Each Sample
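- A minimal sketch for generating such a dataset with scikit-learn's make_circles (the sample count and noise level are assumptions for illustration):

```python
# Generate a noisy two-circles binary classification dataset and plot it by class.
from sklearn.datasets import make_circles
from matplotlib import pyplot

# 100 samples with a little noise; these values are illustrative
X, y = make_circles(n_samples=100, noise=0.1, random_state=1)

# scatter plot, colored by class value
for class_value in range(2):
    row_ix = y == class_value
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(class_value))
pyplot.legend()
pyplot.show()
```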
- Then we train a classical MLP on the circles dataset (a sketch of the model follows the figure):
Cross-Entropy Loss and Accuracy on Train and Test Datasets Showing an Overfit
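- A sketch of such an overfitting MLP (the layer size, train/test split and number of epochs are assumptions chosen to provoke overfitting, not necessarily the book's values):

```python
# Train a small MLP on a tiny train split so that it overfits, then plot the learning curves.
from sklearn.datasets import make_circles
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from matplotlib import pyplot

# small dataset, only 30 samples kept for training to encourage overfitting
X, y = make_circles(n_samples=100, noise=0.1, random_state=1)
trainX, testX = X[:30], X[30:]
trainy, testy = y[:30], y[30:]

# one large hidden layer, sigmoid output for the binary class
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# train for many epochs and keep the per-epoch train/test metrics
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=1000, verbose=0)

# train accuracy keeps climbing while test accuracy plateaus or degrades: the overfit
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()
```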
- Then we add dropout regularization to the model (a sketch follows the figure):
Cross-Entropy Loss and Accuracy on Train and Test Datasets While Training With Dropout Regularization
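- A sketch of the same model with a Dropout layer inserted between the hidden and output layers (the 0.4 rate is an assumption; training and plotting are identical to the previous sketch):

```python
# Same MLP as before, with dropout applied to the hidden layer's outputs during training.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
# randomly drop 40% of the hidden layer's outputs at each training update
model.add(Dropout(0.4))
# no dropout on the output layer
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit and plot exactly as in the previous sketch
```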
Conclusion
- This post is an example of using the dropout regularization available in the Keras library.
- Dropout regularization is a way to prevent overfitting in neural networks.
- Big thanks to Jason Brownlee for the book "Better Deep Learning" and the included code examples.
- It was interesting to learn how Geoffrey Hinton came up with the idea of dropout for neural networks, the "aha" moments:
- an analogy with the large size of the brain
- an analogy with the fact that bank tellers are rotated regularly, which makes it harder to conspire to defraud the bank
- an analogy with sexual reproduction, where mixing genes prevents fragile co-adaptations