Wednesday, November 04, 2020

"Better Deep Learning" - Jason Brownlee - Better Predictions -

 Preamble

  • This post is an extract of the book "Better Deep Learning" by Jason Brownlee. 
  • This post is related to "Better Predictions": how to make better predictions using model ensembles.
  • In this post, you get the results of the code examples that come with the book. The code examples are a great way to put the theory of neural networks into practice, which is a big advantage of the series of Deep Learning books written by Jason Brownlee.

Mer de glace 


Chapter 19: Reduce Model Variance with Ensemble Learning

  • Deep learning neural networks are nonlinear methods. This means that they can learn complex nonlinear relationships in the data.
  • A successful approach to reducing the variance of neural network models is to train multiple models instead of a single model and to combine the predictions from these models. This is called ensemble learning and not only reduces the variance of predictions but also can result in predictions that are better than any single model.
  • A solution to the high variance of neural networks is to train multiple models and combine their predictions.
  • Combining the predictions from multiple neural networks adds a bias that in turn counters the variance of a single trained neural network model.
  • Perhaps the oldest and still most commonly used ensembling approach for neural networks is called a committee of networks. A collection of networks with the same configuration and different initial random weights is trained on the same dataset. Each model is then used to make a prediction and the actual prediction is calculated as the average of the predictions.
  • Ensembles may be as small as three, five, or ten trained models.
  • Varying Training Data: a natural way to reduce the variance and hence increase the prediction accuracy of a statistical learning method is to take many training sets from the population, build a separate prediction model using each training set, and average the resulting predictions.
  • There are more sophisticated methods for stacking models, such as boosting, where ensemble members are added one at a time in order to correct the mistakes of prior models. A slightly different kind of combination is to average the weights of multiple neural networks with the same structure, hopefully resulting in a new single model that has better overall performance than any original model. This approach is called model weight averaging (see the sketch after this list).
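
The following is a minimal sketch of the weight-averaging mechanism in Keras, not the book's exact listing. Here the weights are averaged across snapshots taken at the end of a single training run, where they are similar enough for an element-wise mean to make sense; the dataset parameters, layer sizes and epoch counts are assumptions for illustration.

    # Sketch of the weight-averaging mechanism (illustrative assumptions, not the book's code).
    import numpy as np
    from sklearn.datasets import make_blobs
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.utils import to_categorical

    # Small multiclass problem used only to exercise the mechanism.
    X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)
    y_cat = to_categorical(y)

    model = Sequential([
        Dense(25, activation='relu', input_shape=(2,)),
        Dense(3, activation='softmax'),
    ])
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    # Train, then collect weight snapshots over the last few epochs.
    model.fit(X, y_cat, epochs=90, verbose=0)
    snapshots = []
    for _ in range(10):
        model.fit(X, y_cat, epochs=1, verbose=0)
        snapshots.append(model.get_weights())

    # Element-wise mean of each weight tensor across the snapshots.
    avg_weights = [np.mean(np.array(tensors), axis=0) for tensors in zip(*snapshots)]
    model.set_weights(avg_weights)
    _, acc = model.evaluate(X, y_cat, verbose=0)
    print('Accuracy with averaged weights: %.3f' % acc)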

Chapter 20: Combine Models From Multiple Runs with Model Averaging Ensemble

  • Model averaging is an ensemble learning technique that reduces the variance of a final neural network model, sacrificing spread in the performance of the model (and the possibility of occasionally better scores) for confidence in what performance to expect from the model.
  • Deep learning neural network models are nonlinear methods that learn via a stochastic training algorithm. This means that they are highly flexible, capable of learning complex relationships between variables and approximating any mapping function, given enough resources. A downside of this flexibility is that the models have a high variance. This means that the models are highly dependent on the specific training data used to train the model, on the initial conditions (random initial weights) and on serendipity during the training process. The result is a final model that makes different predictions each time the same model configuration is trained on the same dataset.
  • The high variance of the approach can be addressed by training multiple models for the problem and combining their predictions.

Case study

  • First, we define a multiclass classification problem; in a second step we will look for a model to address it. As it is a multiclass classification problem, we will use a softmax activation function on the output layer.
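
As a sketch of how such a problem can be set up, a small three-class dataset can be generated with scikit-learn and plotted by class value. The make_blobs parameters below are assumptions for illustration, not necessarily those used in the book.

    # Sketch: generate a small three-class classification problem and plot it by class value.
    from sklearn.datasets import make_blobs
    from matplotlib import pyplot

    # Three Gaussian blobs in two dimensions; parameters are illustrative.
    X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)

    # Scatter plot with points colored by their class value.
    for class_value in range(3):
        row_ix = (y == class_value)
        pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label='class ' + str(class_value))
    pyplot.legend()
    pyplot.show()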

Multiclass problem (3 classes) with points colored by class value
  • Secondly, we address the suggested multiclass classification problem with an MLP neural network, which learns the relationships between variables via a stochastic training algorithm.
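
A minimal sketch of such an MLP follows, assuming TensorFlow 2's bundled Keras; the layer sizes, the SGD settings and the 200-epoch budget are assumptions, and the book's exact listing may differ.

    # Sketch: fit a small MLP with a softmax output on the blobs problem and plot learning curves.
    from sklearn.datasets import make_blobs
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.optimizers import SGD
    from tensorflow.keras.utils import to_categorical
    from matplotlib import pyplot

    # Generate the dataset and split it into equal-sized train and test sets.
    X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)
    y_cat = to_categorical(y)
    trainX, testX = X[:500], X[500:]
    trainy, testy = y_cat[:500], y_cat[500:]

    # MLP with one hidden layer and a softmax output for the three classes.
    model = Sequential([
        Dense(25, activation='relu', input_shape=(2,)),
        Dense(3, activation='softmax'),
    ])
    model.compile(loss='categorical_crossentropy',
                  optimizer=SGD(learning_rate=0.01, momentum=0.9),
                  metrics=['accuracy'])

    # Stochastic gradient descent over 200 epochs.
    history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=200, verbose=0)

    # Learning curves: cross-entropy loss and accuracy on the train and test sets.
    pyplot.subplot(211)
    pyplot.plot(history.history['loss'], label='train')
    pyplot.plot(history.history['val_loss'], label='test')
    pyplot.legend()
    pyplot.subplot(212)
    pyplot.plot(history.history['accuracy'], label='train')
    pyplot.plot(history.history['val_accuracy'], label='test')
    pyplot.legend()
    pyplot.show()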

Cross-Entropy Loss and Accuracy for the training and test datasets over 200 epochs

  • Thirdly, we estimate the variance of the previous model. To do so, the code proposed by Jason Brownlee repeats the fit and evaluation of the same model configuration on the same dataset and summarizes the distribution of the resulting scores:
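
A sketch of such a repeated-evaluation loop is shown below. The evaluate_model helper, the model configuration and the 30-repeat budget are assumptions that follow the description above, not the book's exact code.

    # Sketch: estimate the variance of a single MLP by repeating fit-and-evaluate 30 times.
    import numpy as np
    from sklearn.datasets import make_blobs
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.utils import to_categorical
    from matplotlib import pyplot

    def evaluate_model(trainX, trainy, testX, testy):
        # Same configuration each time; only the random initial weights and batch order differ.
        model = Sequential([
            Dense(25, activation='relu', input_shape=(2,)),
            Dense(3, activation='softmax'),
        ])
        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        model.fit(trainX, trainy, epochs=200, verbose=0)
        _, test_acc = model.evaluate(testX, testy, verbose=0)
        return test_acc

    X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)
    y_cat = to_categorical(y)
    trainX, testX = X[:500], X[500:]
    trainy, testy = y_cat[:500], y_cat[500:]

    # Repeat the experiment and summarize the spread of test accuracies.
    scores = [evaluate_model(trainX, trainy, testX, testy) for _ in range(30)]
    print('Mean accuracy: %.3f, standard deviation: %.3f' % (np.mean(scores), np.std(scores)))

    # Box-and-whisker plot and histogram of the accuracy scores.
    pyplot.subplot(121)
    pyplot.boxplot(scores)
    pyplot.subplot(122)
    pyplot.hist(scores, bins=10)
    pyplot.show()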

Raw Data, Box and Whisker Plot, Histogram for accuracy over 30 repeats

  • In step 4, we use model averaging to both reduce the variance of the model and possibly reduce its generalization error. The code tries to find a suitable number of ensemble members:
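
A sketch of how the ensemble-size sweep can be organized is given below. The fit_model and ensemble_accuracy helpers, the pool of 10 members and the dataset parameters are illustrative assumptions.

    # Sketch: average the predicted probabilities of 1..N models and plot test accuracy per ensemble size.
    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.metrics import accuracy_score
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.utils import to_categorical
    from matplotlib import pyplot

    def fit_model(trainX, trainy):
        model = Sequential([
            Dense(25, activation='relu', input_shape=(2,)),
            Dense(3, activation='softmax'),
        ])
        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        model.fit(trainX, trainy, epochs=200, verbose=0)
        return model

    def ensemble_accuracy(members, testX, testy_labels):
        # Average the softmax probabilities across members, then take the argmax as the class.
        probs = np.mean([m.predict(testX, verbose=0) for m in members], axis=0)
        return accuracy_score(testy_labels, np.argmax(probs, axis=1))

    X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)
    trainX, testX, trainy, testy = X[:500], X[500:], y[:500], y[500:]
    trainy_cat = to_categorical(trainy)

    # Fit a pool of models, then evaluate ensembles of growing size.
    n_members = 10
    members = [fit_model(trainX, trainy_cat) for _ in range(n_members)]
    sizes = list(range(1, n_members + 1))
    scores = [ensemble_accuracy(members[:n], testX, testy) for n in sizes]
    for n, s in zip(sizes, scores):
        print('Ensemble of %d models: accuracy %.3f' % (n, s))

    # Line plot of ensemble size versus test accuracy.
    pyplot.plot(sizes, scores, marker='o')
    pyplot.xlabel('ensemble size')
    pyplot.ylabel('test accuracy')
    pyplot.show()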

Line Plot of Ensemble Size Versus Model Test Accuracy

  • For the final step 5, we repeat the evaluation experiment with an ensemble of five models instead of a single model and compare the distributions of scores. So five models are fit and evaluated, and this process is repeated 30 times:
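
Below is a sketch of that comparison under the same assumptions as the previous snippets (fit_model helper, dataset parameters, five members, 30 repeats); it scores one member on its own and the five-member average in each repeat, then compares the two distributions.

    # Sketch: repeat the experiment 30 times, comparing a single model with a five-model average ensemble.
    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.metrics import accuracy_score
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.utils import to_categorical
    from matplotlib import pyplot

    def fit_model(trainX, trainy):
        model = Sequential([
            Dense(25, activation='relu', input_shape=(2,)),
            Dense(3, activation='softmax'),
        ])
        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        model.fit(trainX, trainy, epochs=200, verbose=0)
        return model

    X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)
    trainX, testX, trainy, testy = X[:500], X[500:], y[:500], y[500:]
    trainy_cat = to_categorical(trainy)

    single_scores, ensemble_scores = [], []
    for _ in range(30):
        members = [fit_model(trainX, trainy_cat) for _ in range(5)]
        # Score one member on its own.
        single_probs = members[0].predict(testX, verbose=0)
        single_scores.append(accuracy_score(testy, np.argmax(single_probs, axis=1)))
        # Score the average of the five members' predicted probabilities.
        avg_probs = np.mean([m.predict(testX, verbose=0) for m in members], axis=0)
        ensemble_scores.append(accuracy_score(testy, np.argmax(avg_probs, axis=1)))

    print('Single model: mean %.3f, std %.3f' % (np.mean(single_scores), np.std(single_scores)))
    print('5-model ensemble: mean %.3f, std %.3f' % (np.mean(ensemble_scores), np.std(ensemble_scores)))

    # Side-by-side box plots of the two score distributions over the 30 repeats.
    pyplot.boxplot([single_scores, ensemble_scores], labels=['single', 'ensemble'])
    pyplot.show()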

Repeated Evaluation of a model average ensemble

  • This last experiment confirms the theory of ensemble averaging, which relies on two properties of artificial neural networks (illustrated by the small simulation after this list):
    • in any network, the bias can be reduced at the cost of increased variance
    • in a group of networks, the variance can be reduced at no cost to bias
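
To make the second property concrete, here is a small pure-NumPy simulation (an illustrative assumption, unrelated to the book's code): averaging several unbiased but noisy predictions shrinks the variance of the prediction without changing its expected value.

    # Sketch: averaging K unbiased, independent noisy predictions reduces variance roughly by a factor of K
    # while leaving the bias unchanged.
    import numpy as np

    rng = np.random.default_rng(1)
    true_value = 0.7          # the quantity each "network" tries to predict
    noise_std = 0.1           # per-network prediction noise
    n_trials, K = 100000, 5   # number of simulated experiments and ensemble size

    # Each row is one experiment: K independent unbiased predictions of the same target.
    predictions = true_value + rng.normal(0.0, noise_std, size=(n_trials, K))
    single = predictions[:, 0]           # one network alone
    averaged = predictions.mean(axis=1)  # committee average

    print('single  : mean %.4f, variance %.6f' % (single.mean(), single.var()))
    print('ensemble: mean %.4f, variance %.6f' % (averaged.mean(), averaged.var()))
    # The means agree (no added bias); the ensemble variance is close to noise_std**2 / K.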

Conclusion

  • Thanks to the code provided in Jason Brownlee's book, we were able to test model averaging.
  • Ensemble learning is a technique that can be used to reduce the variance of deep learning neural networks.

