Preamble
- This post is an extract of the book "Better Deep Learning" by Jason Brownlee. It covers forcing small weights with weight constraints, as described in chapter 15 of the book.
Chapter 15: Force Small Weights with Weight Constraints
- Unlike weight regularization, a weight constraint is a trigger that checks the size or magnitude of the weights and scales them so that they are all below a pre-defined threshold. The constraint forces weights to be small and can be used instead of weight decay and in conjunction with more aggressive network configurations, such as very large learning rates.
- Weight penalties encourage but do not require neural networks to have small weights.
- Weight constraints, such as the L2 norm and maximum norm, can be used to force neural networks to have small weights during training.
- Weight constraints can improve generalization when used in conjunction with other regularization methods such as dropout.
- An alternate solution to using a penalty on the size of the network weights is to use a weight constraint. A weight constraint is an update to the network that checks the size of the weights (e.g. their vector norm) and, if the size exceeds a predefined limit, rescales the weights so that their size falls below the limit or within a range.
- Although dropout alone gives significant improvements, using dropout along with weight constraint regularization provides a significant boost over just using dropout.
- The use of a weight constraint allows you to be more aggressive during the training of the network. Specifically, a larger learning rate can be used, allowing the network to, in turn, make larger updates to the weights each update.
- Using a constraint rather than a penalty prevents weights from growing very large no matter how large the proposed weight-update is. This makes it possible to start with a very large learning rate which decays during learning, thus allowing a far more thorough search of the weight-space than methods that start with small weights and use a small learning rate.
- The Keras API supports weight constraints. The constraints are specified per-layer, but applied and enforced per-node within the layer. Using a constraint generally involves setting the kernel_constraint argument on the layer for the input weights and the bias_constraint for the bias weights.
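- As a minimal sketch of this API (the layer sizes and the max-norm threshold of 3 below are illustrative assumptions, not values from the book), constraints are attached per layer like this:

# Sketch: attaching weight constraints to Keras layers.
# Layer sizes and the max_norm threshold are illustrative assumptions.
from keras.models import Sequential
from keras.layers import Dense
from keras.constraints import max_norm, unit_norm

model = Sequential()
# Constrain the incoming weight vector of each hidden node to an L2 norm of at most 3,
# and apply the same constraint to the bias weights.
model.add(Dense(32, input_dim=2, activation='relu',
                kernel_constraint=max_norm(3), bias_constraint=max_norm(3)))
# Alternatively, force the weight vector of each node to have unit L2 norm.
model.add(Dense(1, activation='sigmoid', kernel_constraint=unit_norm()))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])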
Case study
- The example uses a standard binary classification problem that defines two semi-circles of observations, one semi-circle for each class. We first train an MLP that overfits this dataset and then apply a weight constraint to reduce the overfitting; a data-generation sketch follows the figure below.
Dataset showing the class value of each sample
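- The two-semi-circle dataset matches scikit-learn's make_moons generator; here is a sketch of generating and plotting 100 noisy samples (the noise level and random seed are assumptions, not values quoted from the book):

# Sketch: generate and plot the two-moons dataset; noise and seed are assumptions.
from sklearn.datasets import make_moons
from matplotlib import pyplot

X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# Plot each class with a different colour, as in the scatter plot above.
for class_value in (0, 1):
    row_ix = (y == class_value)
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(class_value))
pyplot.legend()
pyplot.show()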
- Then we train a standard MLP on the dataset of 100 samples (a sketch of the model follows the figure):
Line plots of accuracy on the train and test datasets during training, showing overfitting
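- A sketch of a deliberately over-parameterised baseline MLP that exposes the overfitting; the 500-node hidden layer matches the constrained layer shown below, while the train/test split, epoch count and optimizer settings are assumptions:

# Sketch: baseline MLP that overfits the small moons dataset.
# Split sizes, epoch count and optimizer settings are assumptions, not from the book.
from sklearn.datasets import make_moons
from keras.models import Sequential
from keras.layers import Dense
from matplotlib import pyplot

X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]

model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train long enough for the train and test accuracy curves to diverge (overfitting).
history = model.fit(trainX, trainy, validation_data=(testX, testy),
                    epochs=4000, verbose=0)
# Note: older Keras releases use the history keys 'acc'/'val_acc' instead.
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()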
- Finally, we apply the constraint on the weights by setting the kernel_constraint argument:
from keras.constraints import unit_norm
model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=unit_norm()))
Line plots of accuracy on the train and test datasets during training with a weight constraint
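- For completeness, a sketch of the constrained model that produces the comparison above; only the hidden layer changes relative to the baseline sketch, and the training settings remain the same assumptions as before:

# Sketch: same training setup as the baseline above; only the hidden layer changes.
# Reuses trainX, trainy, testX, testy from the baseline sketch.
from keras.models import Sequential
from keras.layers import Dense
from keras.constraints import unit_norm

model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=unit_norm()))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(trainX, trainy, validation_data=(testX, testy),
                    epochs=4000, verbose=0)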
Conclusion
- Thanks to Jason Brownlee, I was able to test weight constraint mechanisms such as the L2 norm and maximum norm and demonstrate their effect on overfitting.