Friday, November 13, 2020

Better Deep Learning - Jason Brownlee - Stacked Generalization Ensemble

 Preamble

  • This blog post is an extract from the book "Better Deep Learning" by Jason Brownlee. 
  • This blog post is related to "Better Predictions": how to combine predictions with Stacked Generalization Ensembles (chapter 25).

Chapter 25: Learn to combine Predictions with Stacked Generalization
  • Model averaging is an ensemble technique where multiple submodels contribute equally to a combined prediction. Model averaging can be improved by weighting the contribution of each submodel to the combined prediction by its expected performance. This can be extended further by training an entirely new model to learn how to best combine the contributions from each submodel. This approach is called stacked generalization, or stacking for short, and can result in better predictive performance than any single contributing model.
  • Stacked generalization is an ensemble method where a new model learns how to best combine the predictions from multiple existing models.
  • Stacked generalization (or stacking) (Wolpert, 1992) is a different way of combining multiple models, that introduces the concept of a meta learner. Although an attractive idea, it is less widely used than bagging and boosting. Unlike bagging and boosting, stacking may be (and normally is) used to combine models of different types. The procedure is as follows:

    1. Split the training set into two disjoint sets.
    2. Train several base learners on the first part.
    3. Test the base learners on the second part.
    4. Using the predictions from 3) as the inputs, and the correct responses as the outputs, train a higher-level learner.
  • Note that steps 1) to 3) are the same as cross-validation, but instead of using a winner-takes-all approach, we combine the base learners, possibly nonlinearly.
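  • Below is a minimal sketch of these four steps, assuming scikit-learn and a pair of illustrative base learners (the dataset, the model choices, and the 50/50 split are my own for illustration, not taken from the book):

```python
# Minimal sketch of the four stacking steps above (illustrative data and models).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=4, random_state=1)

# 1. Split the training set into two disjoint sets.
X_base, X_meta, y_base, y_meta = train_test_split(X, y, test_size=0.5, random_state=1)

# 2. Train several base learners on the first part.
base_learners = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=1)]
for model in base_learners:
    model.fit(X_base, y_base)

# 3. Test the base learners on the second part and keep their predictions.
meta_features = np.hstack([model.predict_proba(X_meta) for model in base_learners])

# 4. Train a higher-level learner on those predictions and the true labels.
meta_learner = LogisticRegression(max_iter=1000)
meta_learner.fit(meta_features, y_meta)
```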

Case Study

  • We are going to experiment with the technique of "Stacked Generalization Ensembles" on a multiclass classification problem. We will first solve this classification problem with a classical MLP to establish a baseline, then, in a second phase, we will experiment with Stacked Generalization Ensembles.
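  • As a starting point, here is a minimal sketch of the synthetic multiclass dataset used in the case study, assuming the three-class blobs problem used in the book (the exact make_blobs arguments are illustrative):

```python
# Sketch of the multiclass classification problem: 1100 two-dimensional points
# spread over 3 classes (the make_blobs arguments are assumptions, not from the book).
from sklearn.datasets import make_blobs
from matplotlib import pyplot

X, y = make_blobs(n_samples=1100, centers=3, n_features=2, cluster_std=2, random_state=2)

# Scatter plot of the samples, colored by class (the "Plot of samples" figure below).
for class_value in range(3):
    row_ix = (y == class_value)
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(class_value))
pyplot.legend()
pyplot.show()
```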

Plot of samples

  • Then we run our MLP on the samples. The author suggests creating a sample of 1100 data points: the model is trained only on the first 100 points, and the remaining 1000 points are held back as a test dataset.
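  • A minimal sketch of this baseline, assuming a small Keras MLP on the blobs data above (the layer sizes, number of epochs, and one-hot encoding follow the book's general style but should be treated as illustrative):

```python
# Baseline MLP: train on the first 100 points, hold back the remaining 1000 for testing.
from sklearn.datasets import make_blobs
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

X, y = make_blobs(n_samples=1100, centers=3, n_features=2, cluster_std=2, random_state=2)
y_cat = to_categorical(y)
trainX, testX = X[:100], X[100:]
trainy, testy = y_cat[:100], y_cat[100:]

# Simple MLP with one hidden layer and a softmax output for the 3 classes.
model = Sequential()
model.add(Dense(25, input_dim=2, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# The history object holds the accuracy curves shown in the "Learning curves" figure below.
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=500, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Test accuracy: %.3f' % test_acc)
```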

Learning curves of Model Accuracy on Train and Test Dataset

First experiment: a separate stacking model

  • We start by training multiple submodels and saving them to files for later use in our stacking ensembles.
  • We then train a meta-learner that will best combine the predictions from the submodels, and we will check whether the performance improves.
    • The author prepares a training dataset for the meta-learner by feeding examples from the test set to each of the submodels and collecting their predictions: the NumPy dstack() and reshape() functions are used to combine the arrays.
    • The meta-learner is trained with a simple logistic regression algorithm from the scikit-learn library. The LogisticRegression class supports multiclass classification (more than two classes) using a multinomial scheme.
  • Once fit, the stacked model is used to make predictions on new data (see the sketch below).
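  • A minimal sketch of this separate stacking workflow, assuming the submodels were previously saved with Keras model.save() under hypothetical names such as models/model_1.h5 (the file names and the use of class probabilities as meta-features are illustrative):

```python
# Separate stacking model: load the saved submodels, build meta-features with
# dstack()/reshape(), fit a logistic regression meta-learner, then predict on new data.
from numpy import dstack
from tensorflow.keras.models import load_model
from sklearn.linear_model import LogisticRegression

def load_all_models(n_models):
    # Assumes the submodels were saved as models/model_1.h5, models/model_2.h5, ...
    return [load_model('models/model_%d.h5' % (i + 1)) for i in range(n_models)]

def stacked_dataset(members, inputX):
    # Stack each submodel's class probabilities along a third axis:
    # the result has shape (rows, probabilities, members).
    stackX = None
    for model in members:
        yhat = model.predict(inputX, verbose=0)
        stackX = yhat if stackX is None else dstack((stackX, yhat))
    # Flatten to (rows, members * probabilities) as input for the meta-learner.
    return stackX.reshape((stackX.shape[0], stackX.shape[1] * stackX.shape[2]))

def fit_stacked_model(members, inputX, inputy):
    # inputy holds the integer class labels (not one-hot encoded).
    model = LogisticRegression(max_iter=1000)
    model.fit(stacked_dataset(members, inputX), inputy)
    return model

def stacked_prediction(members, stacked_model, inputX):
    # Once fit, the stacked model makes predictions on new data.
    return stacked_model.predict(stacked_dataset(members, inputX))
```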

Fitting a logistic regression stacking model

  • As shown in the image above, the result of the experiment is that the stacked model, with an accuracy of 83.4%, outperforms every single contributing model.

Second experiment: an integrated stacking model

  • When using neural networks as submodels, it may be desirable to use a neural network as a meta-learner. Specifically, the sub-networks can be embedded in a larger multi-headed neural network that then learns how to best combine the predictions from each input submodel.
  • The graph of the stacked model:

Stacked Generalization Ensemble of Neural Network Models
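
  • A minimal sketch of such an integrated stacking model, reusing the same hypothetically saved submodels as above (the layer renaming workaround, the hidden-layer size, and the training call are illustrative):

```python
# Integrated stacking: embed the frozen submodels in one multi-headed model whose
# new layers learn how to best combine their predictions.
from tensorflow.keras.models import load_model, Model
from tensorflow.keras.layers import Dense, concatenate

def define_stacked_model(members):
    # Freeze the submodels and rename their layers to avoid name clashes
    # (setting the private _name attribute is a common Keras workaround).
    for i, member in enumerate(members):
        for layer in member.layers:
            layer.trainable = False
            layer._name = 'ensemble_%d_%s' % (i + 1, layer.name)
    # One input head per submodel; their softmax outputs are merged.
    ensemble_visible = [member.input for member in members]
    ensemble_outputs = [member.output for member in members]
    merge = concatenate(ensemble_outputs)
    hidden = Dense(10, activation='relu')(merge)
    output = Dense(3, activation='softmax')(hidden)
    model = Model(inputs=ensemble_visible, outputs=output)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Hypothetical file names; here 5 submodels are assumed.
members = [load_model('models/model_%d.h5' % (i + 1)) for i in range(5)]
stacked_model = define_stacked_model(members)

# The graph of this model can be drawn with plot_model (requires pydot/graphviz):
# from tensorflow.keras.utils import plot_model
# plot_model(stacked_model, show_shapes=True, to_file='stacked_model.png')

# To train, feed the held-out data once per input head, e.g.:
# stacked_model.fit([X_meta for _ in members], y_meta_one_hot, epochs=300, verbose=0)
```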

  • The result of this integrated stacked model is better than that of the separate one: 83.8% accuracy compared to 83.4%.

Conclusion

  • Stacked generalization is an ensemble method where a new model learns how to best combine the predictions from multiple existing models.
  • In the next chapter, we will experiment with combining model parameters using an average model weights ensemble.
  • Big thanks to Jason Brownlee for helping me to understand these notions of Ensemble Learning.
