Early Stopping: The Most Popular Regularization Technique in Machine Learning

It’s hard, that delicate balance where you can’t train your model for too long, nor for too little. The constant possibility of overfitting, of building something that doesn’t go anywhere. At least, it’s always been hard for me. And finding that perfect balance is crucial. It’s the only way we can build machine learning models that work. But how?

First, let’s get on the same page. Here is a dataset and a model that I built. This model is not good. It hasn’t learned the patterns in my data. See this shape here, and how the model misses it completely? We say that this model is underfitting. Here is another model. It’s not a good model either. Notice how this time the model has learned all of the noise in the dataset. We say that this model is overfitting. Finally, this is a good model. It doesn’t underfit, and it doesn’t overfit. This is what we want.

Every time we build a model, we are playing the same game. We don’t want the model to underfit. We don’t want the model to overfit. We want to do just enough, not too much, not too little, so the model generalizes well to data outside of our training dataset. And that’s the key insight I want you to remember for the rest of this video.

And here is the thing. A critical factor in training a good model is how long we train it. It’s not necessarily time, exactly, but something similar: the number of iterations, or epochs. Train a model for too many epochs and you’ll overfit. Train it for too few epochs and you’ll end up underfitting. But how do we find the perfect number of epochs to train a model?

Let’s play a dumb game here. This is a police car. It’s gonna start driving, and I want it to stop at the barriers. It cannot stop too early, and it cannot stop too late. It’s gotta stop exactly between the barriers. Now, police cars have brakes, so this should be pretty simple. The only thing the driver needs to do is hit the brakes at exactly the right time. And that’s exactly what we need to do with our model. The car knows when to brake because it sees the red barriers. But how do we know? How do we know when is the perfect time to stop our training process?

We need a way to evaluate our model as we train it. Train a little, evaluate the model, train a little bit more, evaluate it again. This way we can see how the model is doing in real time. To make this happen, we need two things. Number one, a separate holdout set that we call a validation set. That’s where we will evaluate our model. And number two, a metric that measures the performance of the model. That’s it. I’m sure you’ve done this a million times before.

With these two, we can start training our model. We’ll evaluate it after each epoch and plot our metric to see how the model is doing. Now look at the chart on the screen. Assume that we picked the validation loss as the metric to measure how the model is doing. The blue line is the training loss, while the orange line is the validation loss. I want you to notice what happens around this area here. Suddenly, the validation loss starts increasing, indicating that our model is getting worse. If you train for too long, that’s exactly what you’ll see. And that’s it. That’s the point where we want to hit the brakes and halt the training process.
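
To make this concrete, here is a minimal sketch of that train-a-little, evaluate-a-little loop. It isn’t the exact code from the video; it assumes a tiny PyTorch model on a synthetic dataset, and every specific in it (the sizes, the learning rate, the 100 epochs) is an illustrative choice:

```python
import torch
from torch import nn

# Toy stand-ins for a real dataset and model (names and sizes
# here are illustrative).
torch.manual_seed(0)
X = torch.randn(200, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(200, 1)
X_train, y_train = X[:150], y[:150]   # training set
X_val, y_val = X[150:], y[150:]       # separate holdout (validation) set

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()                # our metric: mean squared error

history = {"train": [], "val": []}
for epoch in range(100):
    # Train a little...
    model.train()
    optimizer.zero_grad()
    train_loss = loss_fn(model(X_train), y_train)
    train_loss.backward()
    optimizer.step()

    # ...then evaluate on the validation set.
    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()

    history["train"].append(train_loss.item())
    history["val"].append(val_loss)
    print(f"epoch {epoch:3d}  train={train_loss.item():.4f}  val={val_loss:.4f}")
```

Plotting `history["train"]` against `history["val"]` gives you exactly the chart described above: the training loss keeps falling, while the validation loss eventually turns around.
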
There is a problem with this approach, though. There is a lot of manual work in it. We have to train longer than necessary to find the inflection point. We have to draw a chart and look at it to decide when to stop the process. And then, when we do, we must retrain the model from the start. And most importantly, if we change anything at all about our model, we must restart the whole process from the beginning to find a good number of epochs again. Manual work? Not good. So let’s make it better.

First, no charts. We don’t want to be looking at anything to make decisions. We want that to happen automatically. But how do we do that? Well, think about the validation loss. For the best part of the training process, the validation loss decreases, until it hits that inflection point where it starts increasing. We can detect that automatically if we watch the loss value and notice that its direction changes for a few consecutive epochs. We can assume that’s the point of no return and halt the training process. If there is a lot of noise in the training process, we might see the validation loss oscillating up and down, so we need to give it time to show a consistent pattern over a few consecutive epochs. We call that patience.

Second, since we need to wait for a few consecutive epochs before stopping the process, by the time we make that decision, the model has been overfitting for a while. So we need a way to go back in time and grab the model from before it started overfitting. This is how we solve that problem: as we train the model, we store a copy of the weights whenever its validation loss is better than that of the previous copy we saved. That way, as soon as we stop the training process, we will have the best set of weights ready and waiting for us.

And that’s every ingredient we need. We call this process early stopping. These are the four components of early stopping. Number one, a validation set. Number two, a metric to measure the model’s performance. Number three, the ability to save the best copy of our model as we make progress in the training process. And finally, number four, a trigger that halts the training process as soon as we notice that the metric we picked has been going in the wrong direction for a few consecutive epochs.

That’s it! Every major library out there supports early stopping. You configure a few things, and the library will take care of everything for you. Early stopping is probably the easiest and one of the most effective ways to regularize your model and prevent it from overfitting.
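
Here is a minimal sketch of those four components wired together by hand. It continues the toy setup from the previous snippet (same `model`, `optimizer`, `loss_fn`, and data), and the patience of 5 epochs is an arbitrary choice:

```python
import copy

patience = 5                      # consecutive epochs we're willing to wait
best_val_loss = float("inf")
best_weights = None
epochs_without_improvement = 0

for epoch in range(1000):
    # Same train-then-evaluate step as in the previous snippet.
    model.train()
    optimizer.zero_grad()
    train_loss = loss_fn(model(X_train), y_train)
    train_loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()

    if val_loss < best_val_loss:
        # Component 3: save a copy of the best weights seen so far.
        best_val_loss = val_loss
        best_weights = copy.deepcopy(model.state_dict())
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        # Component 4: halt once the metric has gone in the wrong
        # direction for `patience` consecutive epochs.
        if epochs_without_improvement >= patience:
            print(f"Stopping early at epoch {epoch}")
            break

# "Go back in time": restore the best copy of the model.
if best_weights is not None:
    model.load_state_dict(best_weights)
```

In practice you rarely write this by hand. In Keras, for example, the built-in callback does the same job: pass `keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)` to `model.fit()` through its `callbacks` argument.
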
And while we’re talking about training models, you might want to take a look at this video here; it will show you a pretty cool technique to train a model without having to label all of your data first. And of course, as always, I’ll see you next week.