11-785 Spring 2023 Recitation 0L: Workflow of a Deep Learning HW
Hey everyone, my name is Vidan and I'm one of the TAs for 11-785 this semester. In this recitation, we're going to go over the basic structure and workflow of a deep learning project. If you're relatively new to the field and haven't completed or seen many projects like this, it can be pretty intimidating to open up a Jupyter notebook and just see all these different sections and components without really knowing how it all fits together. So that's basically what we're covering in this recitation: we're going to see the different sections and components, and what the purpose of each one is. When we're trying to understand the different components of a deep learning or ML project, it's really important to keep the bigger picture in mind of exactly what we're trying to accomplish with these models. We're using labeled data, which is data where we have the ground truth along with it, so that our model can learn to predict certain target variables on unlabeled data, which is data where we don't know the ground truth and want to find it out. There are three phases we need to go through to really ensure that our model accomplishes this. First, we need to train the model, so it actually learns to pick up the patterns in the data. We also need a way to evaluate how our model is performing on something other than the training data set. And finally, we need to test the model on data it's never seen before in any capacity, to really get an idea of how it performs and how it generalizes. Keeping this in mind, we need three different data sets. The first one is the training data set. For the training data, we have the ground truth: we have both the predictor variables and the target variables, and we use the predictor variables to predict the targets. This is really what we use to teach the machine learning model the different patterns and correlations in the data.
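As a quick sketch of the three-way split we just motivated, here's how it might look in PyTorch. The data here is synthetic stand-in data (not a real data set), and the 80/10/10 split sizes are just an illustrative choice:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in for a real labeled data set: 1000 samples, 10 features.
features = torch.randn(1000, 10)
labels = torch.randint(0, 2, (1000,))
full_dataset = TensorDataset(features, labels)

# Carve out the three data sets: train for learning the patterns,
# validation for evaluating during training, test for the final unseen check.
train_set, val_set, test_set = random_split(full_dataset, [800, 100, 100])

print(len(train_set), len(val_set), len(test_set))  # 800 100 100
```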
Second, we need a validation data set. This comes in because we need to be able to evaluate the model while we're training it, to see how it performs on something other than the training data set. And finally, we have our test data set, which is data that the model has never seen before. A common question or misunderstanding here is why we really need a validation data set, since we already have the test data set to evaluate on. The reason is that we use performance on the validation data set to tune our hyperparameters and to select the best-performing model. So in some ways, the performance on the validation data set is affecting the model, which means it isn't completely unseen by the model, so we can't really use it as a test data set. This is what motivates the need for three different data sets. This slide just summarizes, at a very high level, the four main phases we're going over. First, we have data loading and processing. In this stage, we take data in whatever form and file format we have, and we get it ready to feed into our neural network. In PyTorch, this involves creating Dataset and DataLoader classes, and this is also the stage where we'd apply preprocessing techniques or normalization. Then we have the training phase, which is usually the most involved phase with the most code. We iterate over the training data set and backpropagate our errors; this is where the model really learns to pick up those correlations. Then we have evaluation, where we write some code to see how the model performs on the validation data set, so that as we're training, we're getting an idea of how the model performs on data it hasn't been trained on. And finally, after we're done with this whole process, we test the model to see how it performs on data it's never seen before.
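The Dataset/DataLoader step mentioned above might be sketched like this. The class name, tensor shapes, and the mean/std normalization are all assumptions for illustration, not the notebook's actual code:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    """Hypothetical Dataset wrapping in-memory tensors."""
    def __init__(self, features, labels):
        # Simple normalization as an example preprocessing step.
        self.features = (features - features.mean()) / (features.std() + 1e-8)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

dataset = MyDataset(torch.randn(100, 4), torch.randint(0, 3, (100,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

# The DataLoader yields (features, labels) batches ready for the network.
batch_x, batch_y = next(iter(loader))
print(batch_x.shape)  # torch.Size([16, 4])
```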
This slide basically summarizes what we've been over, but goes into a little more detail on how this would be implemented in code. This will make a lot more sense once we actually go through the Jupyter notebook and you can see how it translates there. But before we do that, a quick aside on optimization, because a large part of what we're going to be doing for the homework part 2 is finding the trade-off between underfitting and overfitting. Underfitting is when the model isn't able to really learn the correlations and patterns in the data; in this case, it doesn't perform well even on the training data set. You can really see it in this picture down here, where the model isn't able to fit the data points at all. In contrast, we have another problem, which is called overfitting. This is where the model fits the training data almost too well, so it starts to fit the noise in the data. The problem with this is that when the model starts to do this, it doesn't generalize to unseen examples as well, because it's fitting the training data too specifically. This is characterized by good performance on the training data set but poor performance on the validation and test data sets. As you can see on the slides, some common causes of underfitting are that the model isn't complex enough (it's not wide enough or deep enough), or maybe it's over-regularized, so it isn't able to learn the patterns in the data. In contrast, when we have overfitting, it's because the model is too complex and isn't regularized enough. Now that we've seen what the workflow looks like in concept, let's see how it looks in an actual project. The purpose of this notebook is to classify MNIST digits using PyTorch. If you're not familiar with the MNIST data set, this is basically what it looks like.
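A rough way to turn the underfitting/overfitting discussion into code is to compare train and validation metrics. The thresholds below are arbitrary rule-of-thumb values, not anything from the slides:

```python
def diagnose(train_acc, val_acc, good=0.9, gap=0.1):
    """Hypothetical rule of thumb: low train accuracy suggests
    underfitting; a large train/validation gap suggests overfitting.
    The `good` and `gap` thresholds are assumed, problem-dependent values."""
    if train_acc < good:
        return "possible underfitting: try more capacity or less regularization"
    if train_acc - val_acc > gap:
        return "possible overfitting: try regularization or more data"
    return "reasonable fit"

print(diagnose(0.70, 0.68))  # underfitting branch: poor even on train data
print(diagnose(0.99, 0.80))  # overfitting branch: large train/val gap
print(diagnose(0.95, 0.93))  # reasonable fit
```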
It's a bunch of handwritten digits from 0 to 9, and we're going to use a basic neural network to predict what number each image is of. Going back to the slides, first we have the data loading and processing stage. For this particular data set, we can just load it straight from PyTorch. In some other homeworks and assignments, you'll see that this is usually a more involved process where you have to write a custom dataset class and a custom dataloader class, but for the purpose of this recitation, we're keeping it simple so you can see how it fits into the overall process. As you can see here, we download our data set as a PyTorch Dataset class already, and then we make a DataLoader using that PyTorch Dataset. The only transform we're doing here is the ToTensor transform: we're transforming the image into a tensor so it can be fed into our neural network. This is just a quick cell so we can visualize the data and see what it looks like. And now we get into our training and evaluation. I know in the slides I put them in sequence, like one happens after the other, but in reality, in our main training loop, training and evaluation happen together multiple times. We train our model for an epoch, and then we calculate its accuracy on the validation set, so for every training loop iteration, we also get an idea of the accuracy on the validation data set. The first thing we do is instantiate our model. This is a relatively simple model; we just initialize it here. It's a simple two-linear-layer model, and this lets us see a quick summary of it: we have our input layer to the hidden layer, and our hidden layer to the output, so it's just a one-hidden-layer model. Next, we need to initialize an optimizer and a criterion; here we're just initializing the things we need for the training process. Next, we're going to write helper functions for training and evaluation.
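The model, optimizer, and criterion setup described above might look like this. The input/output sizes match MNIST (28x28 images, 10 digit classes), but the hidden width of 128 and the SGD learning rate are assumed choices, not necessarily the notebook's:

```python
import torch
import torch.nn as nn

# A minimal sketch of a two-linear-layer (one hidden layer) model.
model = nn.Sequential(
    nn.Linear(28 * 28, 128),  # input layer -> hidden layer
    nn.ReLU(),
    nn.Linear(128, 10),       # hidden layer -> output (one logit per digit)
)

criterion = nn.CrossEntropyLoss()                      # classification loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One forward pass on a fake batch of 8 flattened images.
logits = model(torch.randn(8, 28 * 28))
print(logits.shape)  # torch.Size([8, 10])
```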
Now, for this recitation, you don't really need to understand exactly how they work, but essentially we're going to have a train_epoch helper function that trains the model for one epoch, and an eval function that calculates the accuracy on the validation data set. Then, after we've initialized everything, we have our main training loop. As you can see in the training loop, we call the train helper and the eval function, and this happens again and again, iteratively. So let's take a second to pause and see what we've done so far, because right now we've basically covered the bulk of the process. First, we initialized our data sets, created our data loaders, and applied the necessary transforms. That completes the data processing step, which is step one in the overall process, as you can see here. Next, for the training step, we created a train helper function, we initialized and instantiated our model class, optimizer, and loss function, and we wrote our complete training loop as well. For evaluation, we have an eval function ready to go, and we're using it to select the best model we have based on validation accuracy. And then finally, the only step we're left with is testing. Luckily, in this case, we can just use our eval function again, but notice that we're passing in the test loader instead of the validation loader, because the only thing we use the test data for is testing the model at the very end. An important thing to keep in mind here is how we use the scores on the validation data: we're using the validation accuracy to select the best model. This is what I was talking about earlier, about how, since we use it for model selection, it's not really unseen, so we can't use it as a test set as well. And that brings us to the end. This was just a simple run-through of the different stages that you'll find in common across ML and DL projects, regardless of complexity.
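Putting the helpers and the main loop together, a self-contained sketch might look like the following. To keep it runnable without downloading MNIST, it uses synthetic data and a tiny linear model; the helper names, epoch count, and best-model bookkeeping are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in data so the sketch runs without downloading MNIST.
def make_loader(n):
    x = torch.randn(n, 20)
    y = torch.randint(0, 2, (n,))
    return DataLoader(TensorDataset(x, y), batch_size=8)

train_loader, val_loader = make_loader(64), make_loader(32)
model = nn.Linear(20, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def train_epoch(model, loader):
    """Train the model for one pass over the training data."""
    model.train()
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()   # backpropagate the errors
        optimizer.step()

def evaluate(model, loader):
    """Compute accuracy on whichever loader we pass in (val or test)."""
    model.eval()
    correct = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
    return correct / len(loader.dataset)

# Main loop: training and evaluation happen together, every epoch.
best_val_acc, best_state = -1.0, None
for epoch in range(3):
    train_epoch(model, train_loader)
    val_acc = evaluate(model, val_loader)
    if val_acc > best_val_acc:   # model selection via validation accuracy
        best_val_acc = val_acc
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
```

Note that because validation accuracy drives the model selection here, the validation set is no longer truly unseen, which is exactly why a separate test loader is kept for the final evaluation.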
And yeah, hopefully this helps you understand the overall process a little better and how everything fits into place. Thank you.