11-785 Introduction to Deep Learning, 2022 Fall: Homework 5 Bootcamp

So hi guys, welcome to a short intro to GANs, and then we'll go on to the homework 5 bootcamp. To start with what GANs are, I'll share my screen and we'll be using the slides from the previous semesters. Can you see my screen? Yes. Okay. So just a refresher on what GANs are. GANs are a type of generative network. You must have seen VAEs; VAEs are good for generating new data. They use the reparameterization trick, where the random vector is drawn from a distribution. But one issue with VAEs is evaluation: how do you actually evaluate something generated by a VAE? You can't just eyeball every image or every data point that gets generated, right? For that, you can use GANs. GANs have two networks: one is the generator and the other is the discriminator. The generator can be a normal VAE or any kind of network that generates data. You take a random vector z from a probability distribution and feed it to the generator, which produces generated data X hat, and the role of the discriminator is to identify whether this data is fake or real. You have a set of real data from which you want to generate new data; that's the real data distribution. For example, in this slide, if we have a bunch of images, your task is to build a GAN that will generate a new image very similar to the data you have at hand. What GANs basically do is model the joint distribution P(X, Y) rather than the conditional distribution P(Y|X). The conditional distribution is what you model when you have data X and find a decision boundary so that different classes are separated.
That works for a classification problem; for generative models, you model the joint distribution. So that's the basic info about GANs, and you must have gone through the lectures for the quizzes. Going on to some other aspects of GANs: the generator is any kind of network that generates new data, and its goal is to generate data similar to data points from the actual distribution. The distribution of the generated data should match the distribution of the true data; that's the original goal of the generator. The discriminator's aim is to identify whether the input given to it is fake or real. The real data is the data you actually have for training, and the fake data is the data produced by the generator. That's how the discriminator should work, and there's some math here which is not required for now. The first step for any GAN, as you must know, is training the discriminator. You have real images from your dataset and fake images from the generator, so the labels for the real images will be true and the labels for the fake images will be false. Let's go to this slide; it will be a little clearer. The objective of your discriminator is to output one when the data given to it is the real data. X is the real data, and Z is a random vector, so G(Z) is the fake data. If real data is given to the discriminator, it has to output one, meaning the data is real. And if you give the generator's output G(Z) as the input to the discriminator, it should output zero.
That means the data given as input is fake. At the same time, the objective of the generator is that whatever image or data it generates should become close to real. It has to fool the discriminator: data generated by the generator, when given as input to the discriminator, should produce one. So the aim of the generator is to produce data in such a way that the discriminator is fooled. This is basically the optimization problem we'll be working with in pretty much every GAN. Before that, we'll go through an implementation example of GANs; I'll share the notebook with you guys. A little different from what was done in the previous semester, we'll be focusing on generating art with the help of GANs. I took this PyTorch DCGAN tutorial as a reference to model the GAN, so you can take a look at this link, and we'll be using data from this Kaggle competition, which has a lot of Monet images, that is, art images. These are the basic imports and Kaggle data download commands, which should be very familiar to you guys. Then there are some helper functions to unnormalize images: we'll be normalizing the images with a mean of 0.5 and a standard deviation of 0.5, and these functions just reverse that normalization. This function is to show the images, and here you have the global configuration. Then you have a dataset class here. Before going into that: the data you download from the competition looks like this. Of these four folders, we'll just be using monet_jpg, which has around 300 images of Monet art. We'll take this as the input data, and the aim of our model is to generate a new image that's very similar to how Monet would have drawn it. That's our goal.
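The discriminator and generator objectives described above combine into the standard GAN minimax game (from the original GAN formulation), which is worth keeping in mind while reading the training code:

```latex
\min_G \max_D \; V(D, G) =
\mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator maximizes this (push D(x) toward 1 and D(G(z)) toward 0), while the generator minimizes it (push D(G(z)) toward 1).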
You have a set of training data, the Monet images, and given a random vector, you want your GAN model to generate a new image similar to how Monet would have drawn. So that's the goal. You have the dataset class to wrap the data, and you can use a few transforms like this; I've used 0.5 for the mean and standard deviation, similar to the tutorial linked above. You create the dataset and dataloaders here, and this prints a bunch of training images. As you can see, these are good art images in the style of Monet, and our aim is to generate a new image like these. Going on to the GAN models, these are just weight-init functions. The GAN followed in the PyTorch tutorial is a DCGAN, which takes in a latent vector of 100 dimensions. As you know, the input to the generator is a random vector, and based on this random vector it generates new data, so we set this input size to 100; that's the constant they have used. And these are some parameters: ngf is the number of generator features, and you can play around with these parameters if you want. So this is the generator. At the same time, you have a discriminator as well, which takes in an image of shape 3x64x64 and outputs a single value that says whether the image is fake or real: fake images should get the output 0 and real images should get the output 1. That's the goal of the discriminator. This is similar to the DCGAN implementation given in the tutorial, and I have created this GAN class to make sure that the notebook I'm explaining to you is very similar to your homework 5 notebook. It's just a wrapper class, to make sure all the other functions look like what you'll be implementing in homework 5.
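The generator and discriminator shapes just described (100-dim latent vector in, 3x64x64 image out, and back down to a single score) can be sketched roughly as below. This follows the DCGAN pattern from the tutorial; the exact layer widths here are illustrative, not the homework's:

```python
import torch
import torch.nn as nn

nz, ngf, ndf, nc = 100, 64, 64, 3  # latent dim, feature widths, image channels

# Generator: upsample a (nz, 1, 1) noise vector to a 3x64x64 image.
netG = nn.Sequential(
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
    nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf), nn.ReLU(True),
    nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
    nn.Tanh(),  # outputs in [-1, 1], matching the 0.5/0.5 normalization
)

# Discriminator: downsample a 3x64x64 image to one real/fake score.
netD = nn.Sequential(
    nn.Conv2d(nc, ndf, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, True),
    nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, True),
    nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, True),
    nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2, True),
    nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
    nn.Sigmoid(),  # probability that the input image is real
)

z = torch.randn(2, nz, 1, 1)
fake = netG(z)          # (2, 3, 64, 64) generated images
score = netD(fake)      # (2, 1, 1, 1) real/fake probabilities
```

Each stride-2 transposed convolution doubles the spatial size (4 → 8 → 16 → 32 → 64), and the discriminator mirrors that path back down.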
Just to make things easier for you guys, those things are done. After this, I've also made the optimizer setup similar to homework 5: in the tutorial, I think they used two separate optimizers, not in the form of a dictionary, but in the homework 5 notebook you'll have them in this form, so we have defined it like that here. Then there's a fixed noise vector. The reason we use a fixed noise vector is this: you input a noise vector to the generator and you get an image. If you keep the same noise vector throughout all the epochs, you can actually see how the generator's output for that same input evolves. At every epoch, if you want to check how the generator is generating images, you need a constant input, and that's why we keep a fixed noise vector here, just for the sake of evaluation; you don't want to change it. And you set the real label to one and the fake label to zero; that's pretty much standard. Then this is the training step of the discriminator, which we'll do first. As I explained earlier, the aim of the discriminator is to maximize the log probability of the discriminator's output on the real data, plus the log of one minus the discriminator's output on the generated data. Going over this: you get the real data; the labels for the real data are ones, because the real label is one; you get the output of the discriminator and compute the error. This error is based on the real labels (ones) and the real data, which we get from the function, as you can see; that's the first log term. Now you want to train with the fake data: you create a noise vector and pass it through the generator to get fake data.
The labels for this fake data will be zero. We are training the discriminator, and the discriminator should be able to properly distinguish between the fake data generated by the generator and the real data given to it, so the labels are zeros here. With the zero labels and the output of the discriminator on the fake data, you get the error, and you backpropagate both errors like this. After that, you step the optimizer for the discriminator. So that's basically how the discriminator is trained: you get the loss for the real images, then the loss for the fake images, backpropagate both, and step the optimizer. The next step is to train the generator. The two networks train at different steps; you don't step both optimizers at the same time. You train the discriminator, then the generator, and so on alternately. For this generator step, you first call model.zero_grad(). I wanted to explain this: you can call either optimizer.zero_grad() or model.zero_grad(). Some people prefer model.zero_grad() because it makes sure that all the gradients inside the model are zeroed, but either is totally fine; it's just a different implementation, that's all. Now we are going to train the generator. As mentioned in the slides, the objective of the generator is that whatever it generates should be classified as one by the discriminator, so you set the labels to one here: you fill the label tensor with the real label, which is one. You take the fake images which we used for the discriminator step over here, pass them through the discriminator, and get the output.
For this step the labels are one, because from the generator's point of view its outputs are "real"; the generator aims to fool the discriminator. You calculate the loss based on this, backpropagate the loss, and step the optimizer. So these are basically the two steps in GAN training. This is the training code I've provided here, and this is how you can run it on your own; you don't have to modify anything. It's just simple code to help you guys understand GANs a little better. In the first epoch you get random images, which is basically the output of an untrained model. As the epochs pass, you get better images, like this one. And the reason we used a fixed noise vector here is that, if you analyze the training code, the images we get from the generator all come from this fixed noise vector, which is why we are able to observe how the same image changes over the epochs. As you can see, after a lot of epochs you get better quality images. So this is basically how you can use a GAN for art generation. Feel free to run this notebook, and the next thing you can try with GANs is to use the homework 2 data and generate new faces if you want. That's something you guys can try out. Do you have any doubts about GANs for now, just the basics of GANs? After this, we'll go on to homework 5. Okay, I'm guessing you guys don't have any issues with GANs for now, the basic art generation implementation. You can run this notebook and just play around with it. So, moving on to homework 5. Homework 5, as you know, is unsupervised speech recognition. First, let me explain what type of GAN you'll be using.
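The two alternating steps just described can be sketched as follows. The tiny linear generator and discriminator here are hypothetical stand-ins; the point is the label choices, the detach in the discriminator step, and which optimizer gets stepped when:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the generator and discriminator (shapes are illustrative).
netG = nn.Linear(10, 5)
netD = nn.Sequential(nn.Linear(5, 1), nn.Sigmoid())
optD = torch.optim.Adam(netD.parameters(), lr=2e-4)
optG = torch.optim.Adam(netG.parameters(), lr=2e-4)
criterion = nn.BCELoss()

real = torch.randn(8, 5)                         # a batch of "real" data
ones, zeros = torch.ones(8, 1), torch.zeros(8, 1)

# --- Step 1: train the discriminator ---
netD.zero_grad()
err_real = criterion(netD(real), ones)           # real batch, label 1
err_real.backward()
noise = torch.randn(8, 10)
fake = netG(noise)
# detach so this backward pass does not touch the generator's weights
err_fake = criterion(netD(fake.detach()), zeros) # fake batch, label 0
err_fake.backward()
optD.step()                                      # only D is updated here

# --- Step 2: train the generator ---
netG.zero_grad()
# label 1 here: the generator wants the discriminator to say "real"
errG = criterion(netD(fake), ones)               # no detach: gradients reach G
errG.backward()
optG.step()                                      # only G is updated here

errD = (err_real + err_fake).item()
```

Note the asymmetry: the same fake batch is scored twice, once detached (to update D) and once attached (to update G).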
You have this unlabeled speech audio, which is very similar to what you used in the other homeworks, and you have a corpus of text. You can't just directly pass the audio in. Before that, recall the previous example I showed you: you pass a random vector to the generator and it generates an image, right? That need not always be the case. In unsupervised speech recognition, what we do is pass in speech data, and the generator produces a probable phoneme sequence. So a GAN can be used not only for generation but also for a recognition purpose. You take the speech data and pass it through wav2vec; wav2vec gives per-timestep representations of the speech data. After getting the representations for each timestep, you do a k-means clustering, as shown here; similar features will be clustered together. Taking those clusters as different segments, you pass the segments into the generator. So basically you take the unlabeled speech audio, group it into segments, and pass it through the generator to get a probable phoneme sequence. That is, you get a good representation of the speech signal and then try to find a phoneme sequence. That's what you did for homework 3 also, right? You get a speech signal, and the output is a phoneme sequence. Now, how do you actually train the discriminator here? You have unlabeled text: this is a corpus of text that is phonemized, that is, converted into phonemes. It has no relationship with the audio data given here; there is no link between them.
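The cluster-then-segment idea can be illustrated with a toy example. This is not the homework's actual pipeline (which clusters real wav2vec features); it's a minimal k-means with a deterministic init, just to show how per-timestep features collapse into segments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-timestep wav2vec features: 12 frames of 4-dim vectors,
# the first 6 near one point in feature space and the last 6 near another.
feats = np.concatenate([rng.normal(0.0, 0.1, (6, 4)),
                        rng.normal(5.0, 0.1, (6, 4))])

def kmeans(x, k, iters=10):
    """Minimal k-means with a deterministic init, for illustration only."""
    centroids = x[:: len(x) // k][:k].copy()
    for _ in range(iters):
        # distance of every frame to every centroid, then nearest assignment
        dists = np.linalg.norm(x[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = x[labels == j].mean(axis=0)
    return labels

labels = kmeans(feats, k=2)        # one cluster id per frame

# Merge consecutive frames with the same cluster id into one segment,
# mimicking how runs of similar frames are pooled before the generator.
segments = [labels[0]]
for lab in labels[1:]:
    if lab != segments[-1]:
        segments.append(lab)
```

Here the 12 frames collapse into 2 segments, which is the kind of shortened sequence the generator maps to phonemes.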
What the discriminator needs to do is learn the distribution of this unlabeled text. The objective of the GAN is that the generator produces a phoneme sequence that fools the discriminator, so that the phoneme sequence looks as if it has come from that given distribution. That's basically unsupervised speech recognition. Moving on to the handout we have provided: can you see the handout? Yes. Okay. But before that, do you guys have any issues with whatever we have seen until now? Okay, I'll take that as a no. Then, moving on to the handout. This is the training notebook we have given. You're free to install other libraries for logging and such, and there are Kaggle commands for downloading the data. The first to-do is just to go to your directory, and for this cell you need to go to datasets, sorry, it's tasks: this unpaired audio part. Okay, it's already given here, so you just need to replace these paths with wherever your data is. That's what you need to do for this Python file. Then moving on to training. So you have set up the data now; the next step is to get the dataloaders and everything. This is very similar to how you do it for all the other homeworks. You're free to change all the parameters of the dataloaders; you don't have any to-dos here, so you don't have to worry about it. Next is the important part: the model. Let's get into a more detailed explanation. If you go through this wav2vec_u.py file, the first thing is the config class, which has all the specific configs of the wav2vec-U model.
You need to do a lot of ablations and document your results for the final report, so I suggest you guys try changing these configs; it's very easy to add things. There's something called spectral normalization, which defaults to False; you can try it with True. You can try changing the dropout values and so on. Use this for a set of the ablations you need to do for the final report; we score based on how thoroughly you have done your ablations as well. Moving on to the discriminator: in wav2vec-U the discriminator is basically a three-convolution-layer network which takes the phoneme sequence and tells whether it's fake or real, as shown in the previous diagram. You can try changing the number of layers for your ablations; you can try all sorts of things, and we encourage you to do that. You don't have any to-dos here per se, but you can do a lot of things to try out your ablations. The generator is basically a single Conv1d layer, so it's pretty simple, and you can also try adding more if you want. Then this is the final wav2vec-U model, where the discriminator and generator are called. The first to-do for you guys in this file is the gradient penalty: go through the paper, use this repository as a reference, try to understand gradient penalty, and complete it. That's the first to-do you'll have. Then there is another small to-do at the bottom, the smoothness penalty; you can again go through the paper, use the repository as a reference, and complete it. That's basically it for this section, and here you need to define the output. Yeah, sure. About the smoothness penalty:
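As a rough guide to those two to-dos, here is a generic WGAN-GP-style gradient penalty and a simple smoothness term. These are hedged sketches of the general techniques, not the homework's exact code: wav2vec-U applies its penalty to interpolations of real and generated phoneme distributions, and its smoothness term has its own weighting, so treat the shapes and details below as assumptions:

```python
import torch
import torch.nn as nn

netD = nn.Linear(6, 1)                  # stand-in discriminator

def gradient_penalty(netD, real, fake):
    """Penalize the discriminator's gradient norm at points interpolated
    between real and fake samples, pushing the norm toward 1 (WGAN-GP)."""
    alpha = torch.rand(real.size(0), 1)               # per-sample mix weight
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    d_out = netD(interp).sum()
    # create_graph=True so the penalty itself is differentiable w.r.t. D
    grads, = torch.autograd.grad(d_out, interp, create_graph=True)
    return ((grads.norm(2, dim=-1) - 1) ** 2).mean()

def smoothness_penalty(logits):
    """Encourage adjacent timestep outputs of the generator to be similar:
    mean squared difference between consecutive frames (B, T, C)."""
    return ((logits[:, 1:] - logits[:, :-1]) ** 2).mean()

real = torch.randn(4, 6)
fake = torch.randn(4, 6)
gp = gradient_penalty(netD, real, fake)       # non-negative scalar
sp = smoothness_penalty(torch.ones(2, 5, 3))  # zero for a constant sequence
```

Both terms get added to the relevant loss with weights taken from the config.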
In the original code there's another part of that calculation as well? Oh yeah, I think I replied to you on Slack itself, but that's not required, don't worry. Okay. We don't have to copy that here? No, you don't have to. Yeah, that's not required. Any other questions about wav2vec_u.py? Okay, cool. Next, you define the optimizers. You can define the optimizers based on these configs; just a minute, there's actually a config file over here, I'm trying to share it. So this is the config: you just need to find out which optimizer configuration they have used for the generator and use the same one; that should be more than enough for you guys. You can try using schedulers if you want, and try to use mixed precision training, which will actually speed up your training. Next comes the most important part of this notebook, the run model function, so we'll go over it in detail. First, what you can do is have a sanity-check loop where you print whatever is in this net_input, so you get an understanding of what's in there, what needs to be pushed to the GPU, and so on. Try to do that first. After that, in model training, first you need to train the discriminator and then you need to train the generator. We have noted here that you need to use discrim_step, right? So let's see what discrim_step actually is in wav2vec_u.py. If you can see this: discrim_step returns a Boolean value, true or false, based on the epoch number, or the update number if I'm not wrong.
If discrim_step is true, then there is another part in the forward, the model's forward, where this step flag is used. Based on the update number it returns true or false. If it returns true, that means it is the discriminator's step, so this part of the code runs, which trains the discriminator; the losses for the discriminator and generator are given here. If discrim_step is false, then the generator is run. So you can use the discrim_step function from the model, and based on it you have to zero-grad your optimizer: if discrim_step returns true, you zero-grad the discriminator's optimizer; if it returns false, you zero-grad the generator's optimizer. That needs to be done. That's how you use discrim_step, because I think a lot of people might have been confused by this. And you get loss stats as a dictionary, right? This is basically how the losses are returned: you have losses over here like this, and some might be None while some might not. You have to accumulate the loss: if a loss is not None, you just add it to the total loss. That's the aim of this to-do. You iterate through the loss stats dictionary, and if a loss is not None, you add it to the total loss; it's just a few lines of code. This also tells you which model is being trained at the moment, the generator or the discriminator.
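The accumulation to-do just described is a few lines. The key names in this dictionary are hypothetical placeholders, not the homework's actual keys; the point is only the skip-None pattern:

```python
# Hypothetical loss_stats dictionary, shaped like what the model's forward
# returns: entries for the network not being trained this step are None.
loss_stats = {
    "loss_a": None,
    "loss_b": 1.25,
    "loss_c": None,
    "loss_d": 0.5,
}

total_loss = 0.0
for name, value in loss_stats.items():
    if value is not None:          # skip losses that were not computed
        total_loss = total_loss + value
```

Which entries come back as None also tells you whether this update was a discriminator step or a generator step.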
Based on this group, you step the corresponding optimizer: either the discriminator's or the generator's. That's basically it for the core, and this code is just for you to accumulate the statistics; you can do it even without the stats. You just have to append the corresponding evaluation loss you want to monitor from the loss stats. For the validation step, val_task.valid_step returns validation loss stats, also a dictionary. It has a key for edit distance, and you can use that, because we'll be using the edit distance, the Levenshtein distance, for evaluation. Here I would suggest you save the model, because you need to run for at least 2000 epochs to get a good score. But don't worry: with mixed precision training, epochs will take 30 to 40 seconds at most on a Tesla T4 GPU. What I have observed is that your model will start at around 150 Levenshtein distance and will stay roughly the same, oscillating around 150, for around 500 or even 600 epochs, and only after that will it actually start to reduce. So don't worry, just give it time. I would suggest you complete all the other parts of the code as soon as possible and just let your model train. As I said earlier, convergence will take a long time; you won't get a proper Levenshtein distance even after 500 epochs, and it will be oscillating until 500 or 600 epochs, so just be patient about it. You can also log the validation and training stats to wandb if you want, or use a different platform. We have given the code for the test loader here, and here you have a test step function, which is basically very similar to the valid step function.
Where is it? Yeah, this valid step function, right. You just need to take a look at this function and modify it so that it works for test. Remember that you don't actually have labels for test, so you'll have to modify it accordingly: take it, modify it, and use it for the test step. All the other code is similar to the other homeworks: you create a CSV and make a submission. So that's basically it for the handout. Do you guys have any questions about the training file, the Python notebook, or the handout? Yeah, I have a question. Yeah, sure. So in the run model function, after if model.training we have the first to-do, where we have to zero-grad, and we can determine whether to zero-grad the discriminator's optimizer or the generator's based on discrim_step. But can we use the group variable that you define later, group = model.get...? Yeah, you should be able to; I think I replied to you on Slack about this too. I usually use discrim_step because it's very close to the reference implementation, but that should also work; if you get any issues with it, you can just go back to discrim_step, anything works. Okay, and do we have to import anything for the Levenshtein distance? No, you don't have to; everything is taken care of in the imports themselves. For example, I ran it in Colab for training and that was enough for me. If you're using a different instance on AWS or something, you might need to install a few packages, I'm not really sure, but fairseq should have everything that's required.
That's why we have a different version of fairseq, right; that should have everything that's required. Okay, any other questions? And you said in the test part we have to remove the label, but which variable is the label? You'll have to go through the valid step, understand it, and do it; the test code is very similar to it. I can help you on Slack if you want, going over your code. For now I can just say that the test step is very similar to the validation step. Any other questions? I don't have any. Thank you. Any more questions? No? Cool, so that's about it for the handout. Just be aware of the fact that this only accounts for about 25% of your grade: your Kaggle submission is just 25%, and the final report is around 50% of your grade. We are also deciding on whether to ask you to present a small video, very similar to how the project presentations are done; we'll give more information about that. But make sure to do a lot of ablations, because you need to explain all of them in your final report, and that carries a lot of marks. You need to do a lot of experiments with your teammate and explain all the experiments in the final report. Even though this part is just 25 marks, it will actually help you to get those 50 marks. So that's about it. I'll stop the recording for now. We'll see you next time.
