Top Kaggle Solution for Fall 2022 Semester

Welcome to Applications of Deep Neural Networks with Washington University. In this video I’m going to show you the Kaggle competition that we just completed. This was a Kaggle competition that I put together with all original data for time series forecasting, and we’re going to look at the presentation given by the winning team. The winning team is two of my students from the class; they’re both PhD students at Washington University. So let me go ahead and introduce them both and then we’ll take a look at the data. The first is Robert Jisserie, who is a doctoral student in computational and data sciences at Washington University. The second member of the team is Tom Ernest, who is a PhD student in the psychological and brain sciences group at Washington University. I’ll put links to both of their information in the description. So let’s go ahead and look at the data set and then we’ll have a look at their solution. I will put a link to this Kaggle community competition that we just completed in the description. I will also put a link to the data set, because I uploaded the entire data set to Kaggle, where you can make use of it and try it yourself.

This data set was created with a simulator, a multi-agent model, and it looks at demand forecasting: you’re trying to forecast the demand for restaurants in a location. I set it up like Fort Lauderdale, so it’s a beach area, and the seasonality corresponds to Fort Lauderdale, Florida, USA. All the images that I have here were generated through a generative neural network, Stable Diffusion to be exact. So here’s some information just describing it. The data range from January 1, 2019 through December 31, 2021. The dates do reflect seasonality. There are no major black swan events — the pandemic starts up right in the middle of this range, but that’s not even a factor, and there are no wars, that kind of thing. There are some events that are unpredictable that happen in here, where various products are created and discontinued and that sort of thing.

If you look at the data, I give you a number of files. There’s a sample submission, and we’re predicting the sales: I give you the test data for the sales, but you don’t have the actual item counts there that you’re trying to predict, and then I give you the training data, the items, and the restaurants. What’s unique about this data set is that there are really three ways you can deal with it. You can deal with just the pure pricing data. You can also do natural language processing on the titles of the items — you could probably feed in some embeddings or something to link similar types of items together — and then there are also images. We’ll take a look at all of those real quick.

So there are a number of restaurants here, in the restaurants CSV file. There’s really not too much data here; this is just really to link them. Some of the names do give indications that might be useful: Sweet Shack has mainly dessert-type items, and Surf’s Up and Beachfront Bar are obviously located on the beach, which is going to be important in a moment when you see some more information. So these are the items. You see there’s an ID that is just the primary key; you’ll see that in other tables that make reference to the item, namely the sales. You also have the store ID, which is the restaurant.
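As a rough sketch of how those keys tie the tables together — the file and column names below are assumptions for illustration, not necessarily the competition’s exact schema — the sales can be joined to the items and the items to the restaurants:

```python
import pandas as pd

# Assumed file and column names: sales rows reference items by item id,
# and item rows reference restaurants by store id.
sales = pd.read_csv("train.csv", parse_dates=["date"])
items = pd.read_csv("items.csv")
restaurants = pd.read_csv("restaurants.csv")

joined = (
    sales
    .merge(items, left_on="item_id", right_on="id", suffixes=("", "_item"))
    .merge(restaurants, left_on="store_id", right_on="id", suffixes=("", "_restaurant"))
)
print(joined.head())
```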
So those six restaurants that I showed — that store ID corresponds to their primary key. This is the natural language processing element of it: the items have descriptions, which might be useful to you if you attach embeddings to them. There are also the calories in each of these and the cost of each of these; the cost does not vary throughout the simulation. And then finally, this is the actual sales data. You have the dates here, the item number, the price of that item, and the item count sold that day. That item count is what you’re trying to predict for the final quarter of that date range in 2021. One important thing: some of the items don’t sell on certain days, and I do put the zeros in — the rows are not simply missing.

So that’s your tabular data. There are also images from a webcam located on the street. These are all simulated images: I had the numbers generated from the simulation, and then I created a rendering program that produced images like these. You can see a couple of them here; they’re not the highest resolution. The key things you’d want to count are how many people are on the street, and how many people are on the beach is probably also useful, because some of the restaurants appeal more to beach visitors. You can see from the examples that some certainly have more people on the beach than others. These are relatively low-resolution images — I believe they’re 1024 by, I forget the height offhand — but I’m sure they look pixelated blown up to 4K video.

So now we’re going to switch to the actual Zoom call where the two PhD students presented their winning solution. Just to also show you the leaderboard: DCDS Nerds was their team name, and you can see they did have a decently lower RMSE than the other teams.

Yeah, I’ll talk for a moment and then Robert will jump in for a bit. We’ll take you through what we did for the competition. I guess the reason we’re splitting the presentation up is that’s also how we worked on the modeling: we were each working on it somewhat independently. We basically had the idea that we wanted to do some sort of ensemble, so the simplest way was just to have each of us make a model and then create an average of those predictions at the end. What I was doing, which I’ll talk about first, is based on a convolutional neural network approach, where I was trying to have a single model for all the features at the same time. What Robert was doing was something more item specific, making a model for predicting the sales of each item, and he’ll talk about what he did more specifically.

I guess there’s not too much to say about how I produced the convolutional model; I was mostly following the example that Jeff provided in the naive code notebook. It’s using this sort of one-dimensional convolution fit to the time series. The main things I was trying to expand on were the number of features — in Jeff’s model in that notebook it was just the sales data being used to forecast future sales, so I was trying to include other things — and also the amount of time it was trained on, the window for looking back and forecasting. I ended up with just three features in the final model: the sales history (the item count), the price, and a single indicator for it being store one versus any of the other stores.
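A minimal sketch, with assumed layer sizes and variable names, of the kind of setup Tom describes — a 90-day lookback window of those three features feeding a 1D convolution, max pooling, and a flatten into a single-output regression head (he walks through the window size and layer order next). This is not the team’s exact code:

```python
import numpy as np
from tensorflow.keras import Input, Sequential, layers

LOOKBACK = 90  # the lookback window Tom settled on (described below)

def make_windows(sales, price, is_store_one, horizon):
    """Build (N, 90, 3) inputs and scalar targets from one item's daily series.
    `sales` and `price` are 1-D arrays over days; `horizon` is days ahead to predict."""
    X, y = [], []
    for t in range(LOOKBACK, len(sales) - horizon):
        X.append(np.stack([
            sales[t - LOOKBACK:t],             # sales history: varies across the window
            np.full(LOOKBACK, price[t]),       # price: effectively constant in the window
            np.full(LOOKBACK, is_store_one),   # store-one indicator
        ], axis=-1))
        y.append(sales[t + horizon])           # item count to predict
    return np.array(X), np.array(y)

model = Sequential([
    Input(shape=(LOOKBACK, 3)),
    layers.Conv1D(filters=64, kernel_size=7, activation="relu"),  # assumed sizes
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(1),  # single regression output: the predicted sales count
])
model.compile(optimizer="adam", loss="mse")
```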
The reason for that store indicator is that if you look at the sales — this is just plotting the best-selling item for each store — store one had way more sales than any of the other stores. Originally we had one-hot encoded all the stores, but just having that single indicator seemed to be enough to pick up the meaningful signal. In various other submissions I had tried other combinations of features, but through a sort of loose search process these three ended up being the ones that were the most informative. I had looked at some other things too, including the people counts derived from the images, which I think Robert ended up using, but I didn’t to the same level.

The other major change was that I increased the lookback window for the training, so I was looking at 90 days instead of the 10 days I think it was in the example, while still forecasting out about three months to capture all the test data. So the final shape of the training data was all the days being predicted, by 90 time steps into the past for each of those days, by the three features. Really only the sales were varying across the time steps, but it still seemed that including the price and the store indicator was helpful for increasing the model performance.

This is roughly what the model looks like, which is very similar to the original notebook that Jeff posted: a 1D convolution followed by max pooling and then flattening into a fully connected layer, which outputs a single regression point to predict the sales count. Here are some of the other parameters below. I’ll just say that this model alone was not performing as well as what we found by combining what I was doing with what Robert was doing, so it seemed that the averaging definitely helped, and it probably fought some of the overfitting a little bit. With that, I’ll let Robert talk about what he did.

Yeah, so when I started looking at the data, I noticed that the items were very distinct. Some of them had a spline-like nonlinear curve and some of them were very linear, so I thought an item-specific approach would add a little more accuracy or precision. That also left me with a different problem in terms of what type of features I could use: I could only really use the ones that varied across time, since it wasn’t that informative to include ones that were constant. I ended up trying all different variations of these eight features, but ultimately I just decided to keep the metrics highlighted in yellow, the number of people on the street and the number of people on the beach. If I had more time, I probably would have done more testing with a cross-validation method. The one really unique feature that I was onto but ultimately didn’t use was number five, the mean-centered deviation per quarter.
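One possible reading of that quarterly feature — an interpretation, not necessarily Robert’s exact definition — is each day’s deviation of an item’s sales from its mean within the calendar quarter:

```python
import pandas as pd

def quarter_mean_deviation(item_df: pd.DataFrame) -> pd.Series:
    """item_df: one item's daily history with columns 'date' and 'item_count'."""
    quarter = item_df["date"].dt.to_period("Q")
    return item_df["item_count"] - item_df.groupby(quarter)["item_count"].transform("mean")
```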
I saw this figure in Jeff’s notebooks showing that in quarter four there’s a lot more variance relative to the other quarters, and I was hoping that adding that feature would give these models an indication to increase the variance in their predictions, but it didn’t really work. Next slide.

For my initial attempt, I just used the Facebook Prophet model that Jeff had given us in the notebooks, and it did pretty well. You can see these forecasts here for each of the items; I think this was probably the best one that I ended up with. But ultimately my public score and my private score were really not that great, so I tried to do something a little more extravagant to try to increase the accuracy. Next slide.

One of the things I learned from that initial set of Prophet models was that I didn’t really have any optimization for the hyperparameters. There were mainly two — the seasonality prior and the changepoint prior — that greatly influence the behavior of the predictions and forecasts, and I was kind of just picking those at random; I didn’t really have a way to find the best set of numbers for those two parameters. Another issue was that when I looked at the predictions, the items that had low volume, where there were very few sales, did a really poor job. I went down a whole GitHub rabbit hole and found a developer who had ultimately decided not to use a negative binomial likelihood for Prophet; he said there were many other better models out there, but he didn’t really reference them, so that kind of left me at a dead end. And then one more: aside from the low-count items, what I started to notice when I was looking at these predictions is that the items with larger ranges were really the ones that had a lot of leverage on the RMSE. So I really wanted to focus on numbers 19 and 38 — two of the items that, if we got them wrong, would throw our RMSE really far off. There were also miniature black swans, which I tried to account for, when items would go from zero to a bunch of sales; that happened around this halfway point, around June 2021.

So I used the modeltime package to try to tackle those three challenges I had encountered. Modeltime was cool because it allows you to compare multiple forecasting algorithms on the same platform and workflow, and it made that very seamless. It was a little hard to get some of these other models running, but ultimately I decided to stick with it, because I did look at Jeff’s YouTube channel and found that the best presenters were the ones that used an ensemble approach, so I thought this was worthwhile. And — you can go to the next one — I evaluated these models using a cross-validation approach within the modeltime package, using either 10 or 20% of the training data as a validation set. I had multiple models here and could see how they were performing. The linear regression was underfitting, along with the ARIMA models, which I could not figure out — it was probably just me being naive with those models — and the multivariate adaptive regression splines were overfitting to the seasonality.
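For reference, the two priors Robert mentions map onto Prophet’s changepoint_prior_scale and seasonality_prior_scale arguments. A hedged sketch using the Python prophet package (Robert worked through the modeltime workflow, which is from the R ecosystem; the values and the placeholder history below are illustrative, not the winning settings):

```python
import pandas as pd
from prophet import Prophet

# Placeholder history standing in for one item's training data:
# 'ds' holds the date, 'y' holds the daily item count.
history_df = pd.DataFrame({"ds": pd.date_range("2019-01-01", "2021-09-30", freq="D")})
history_df["y"] = (history_df["ds"].dt.dayofyear % 30) + 5  # dummy values only

model = Prophet(
    changepoint_prior_scale=0.05,   # higher values allow a more flexible trend
    seasonality_prior_scale=10.0,   # higher values allow stronger seasonal effects
)
model.fit(history_df)
future = model.make_future_dataframe(periods=92)   # roughly the held-out final quarter
forecast = model.predict(future)[["ds", "yhat"]]
```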
I don’t really have a good picture here, but I ultimately decided to stick with these top five models. When I was doing the cross-validation, I noticed that sometimes the validation would perform really well and then the forecast would overfit to the seasonality when I tried to predict the test set, so I decided to look at the predictions for the test set more manually. Some of these models were not optimal either — you can see the ETS for this item has this large downtrend, which wasn’t really accurate — but when I averaged them together, it did look a little more reasonable. And then this is our final submission, the one that won the competition. Tom and I averaged our best models together based on the public score — actually, not based on my best public score. I had a lot of faith in this ensemble approach, even though I had a better public score with another set of models.

Okay, that looks really good, and obviously got very good results. I have a few questions myself, but I’ll let the class go first. Do you have any questions for the two presenters? Okay, I’ll go ahead and go on. So did each of you have a separate model for each of the individual items? I did not; I just had one model — it was modeling all the items the same. Right. Yeah, and then I had five models per item.

Okay. In the generator for the data set, there were definitely some items that — I mean, it’s kind of neat the way the software that I used for the multi-agent system worked — basically, some of the supply channels changed as the simulation ran, and that caused the individual merchants to decide certain items were not profitable anymore, so they pulled back in adjustment to those, I think, and in some cases they would raise prices. That’s what’s kind of cool about the multi-agent system: I set the parameters, but what actually happened, even having access to all of the variables, comes largely from the stochastic nature of the multi-agent system. And then I also decide which variables to make available to you. For example, the number of people on the streets — I simulated that; the images weren’t coming out of the multi-agent system, but the volumes were. So that’s how that worked.

Now, when you put it together, did you just do a simple 50/50 average of the two, or did you do any weighting between the two models? We did a 50/50 average. If we had more time, I think we definitely would have tried a more complex ensembling approach. Yeah. What has gotten teams in trouble in the past is using the leaderboard to guide the weighting: if you’re trying 50/50, you can really quickly just try plus 10, minus 10 on whatever your top model was and pick the highest of those. One team did try that, about a year ago, and it cost them moving from position one to position three. So it’s not always good to take a blind-guess approach on that.
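To make the blending step concrete, here is a small sketch with placeholder arrays: the plain 50/50 average the team used for the final submission, plus the holdout-based weight search that would be the safer alternative to probing the leaderboard (which comes up next). All names and values here are illustrative, not the team’s code:

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder arrays standing in for the two models' holdout predictions and truth.
preds_cnn = rng.poisson(20, 500).astype(float)
preds_prophet = rng.poisson(20, 500).astype(float)
y_holdout = rng.poisson(20, 500).astype(float)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# The plain 50/50 average that went into the final submission.
final_preds = 0.5 * preds_cnn + 0.5 * preds_prophet

# Holdout-based grid search over the blend weight (instead of leaderboard probing).
best_w = min(np.linspace(0.0, 1.0, 21),
             key=lambda w: rmse(y_holdout, w * preds_cnn + (1 - w) * preds_prophet))
```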
Yeah, I think we wanted to hold out some of the data and do a grid search on how to weight them. I guess you could carefully look at that, but it just took a lot more time to do. We were kind of crunched, so we didn’t end up looking at how to weight them — we just did half and half. Yeah, that makes sense. My models were not the best ones on the public score; I was surprised they were performing so poorly there, but I still went with them anyway. Yeah, sometimes you have to do that in the real world too: you have a public leaderboard until things go live, and then you really have to have a lot of faith in your cross-validation approach, whatever that ends up being.

Okay, Robert, I saw you used a number of different models. What did you think of Facebook Prophet overall? That’s kind of a new entry that is getting a lot of notice in the Kaggle world; I haven’t worked with it a lot myself — I threw together the quick demo and that was about it. Yeah, I was really hoping I was going to find a better model, but I think the DPR model was probably up there with it — it was really disappointing to go through all that work with modeltime and then still end up underperforming. I will say, though, I do think the Prophet predictions caught on to the seasonality and the long-term trends really well; they just didn’t have much variation in the weekly predictions, and I think that’s where Tom’s model really outshone it.

Okay. Well, clearly combining the two worked really well. Did you work largely independently, comparing every now and then to sync up and see how the other was doing before ensembling? Yeah, we did, I think. We were both very confused at the beginning about how to model the time series stuff, so we were figuring it out in our own ways and helping each other figure it out. Then we each tried working on what made sense to us, and we were trying to make a bunch of submissions during the last week, so that was helpful for having a more diverse set of things we were looking at. I’m glad we ended up combining them.

Okay, well, very good. Thank you for watching this video. If you want to follow along with this course, please subscribe to my YouTube channel and give the video a like if you think it was useful, and feel free to post any questions you might have in the comments.
