Exploratory Data Analysis (EDA) with Plotly for Timeseries Demand Forecasting

Let’s take a quick look at exploratory data analysis for demand forecasting using Plotly. First, let’s look at the time series demand forecasting data set that we’re going to use. This data set is hosted on Kaggle, and I created it. It’s essentially a simulation of the demand at six restaurants in a beachfront setting. You’ve got time series data, natural language processing data, and computer vision data. We’ll have a look at all three, and I have a link to the data set in the description.
This data set exhibits both seasonality and trend, so that’s something you have to be aware of as you try to forecast into the future. Seasonality is the fact that sales go up and down based on the month of the year, and if you zoom in further, you’ll see that there’s also seasonality by the week. Trend refers to the whole series increasing gradually over time, which is especially visible if you look at the peaks.
You have a bunch of different items whose sales you are trying to forecast, and you’ve got historical data for each of them that you need to project into the future. Viewed by week, you can see it a bit better. You can see this product was clearly discontinued there, and you need to think about what happens when a product is discontinued. What is that going to do to the rest? Will other items fill the gap, or will that demand simply go away?
The files are here. The primary one you’re going to deal with is called sales train, which holds your sales over time. You can see the date, the item (the items we were just looking at), the price it was sold for, and how many units of it were sold that day. These rows cover all of the days. The items are unique to each restaurant: you don’t have multiple restaurants selling the same item, although some items are very similar across restaurants. The actual items are in their own table, in tabular form, with some information about each one.
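Spotting discontinued items like this one can also be done programmatically. The sketch below is one possible approach, not the notebook’s method: it assumes hypothetical column names date, item_id, and item_count, uses made-up toy data, and treats an item as discontinued when its last sale falls more than an arbitrary 30 days before the end of the data.

```python
import pandas as pd


def last_sale_dates(sales: pd.DataFrame) -> pd.Series:
    """Last date on which each item sold at least one unit."""
    sold = sales[sales["item_count"] > 0]
    return sold.groupby("item_id")["date"].max()


def discontinued_items(sales: pd.DataFrame, gap_days: int = 30) -> list:
    """Items whose last sale falls more than `gap_days` days before the end of the data.

    gap_days is an arbitrary illustrative threshold, not a value from the data set.
    """
    last = last_sale_dates(sales)
    cutoff = sales["date"].max() - pd.Timedelta(days=gap_days)
    return sorted(last[last < cutoff].index.tolist())


# Toy data (hypothetical): item 1 sells every day, item 2 stops after 10 days.
dates = pd.date_range("2021-01-01", periods=100, freq="D")
toy = pd.concat(
    [
        pd.DataFrame({"date": dates, "item_id": 1, "item_count": 5}),
        pd.DataFrame({"date": dates, "item_id": 2, "item_count": [3] * 10 + [0] * 90}),
    ],
    ignore_index=True,
)
print(discontinued_items(toy))  # [2]
```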
Each item is sold by a particular restaurant, identified by its store ID, and the restaurants are listed in their own table. For natural language processing, I recommend not doing much with the restaurant names, since there are not that many of them, but you could certainly use natural language processing on the item names to extract some further information. There’s also computer vision data: these pictures were taken of the street where the five restaurants are located, showing the number of people there. You could use something like YOLO or other deep learning and computer vision packages to count how many people are on the beach and on the street, because those two counts tell you different things.
I did run a Kaggle competition with this data set with some of my students at WashU. You can see the root mean square errors that these teams were able to achieve, and some of their code is in the Code tab. I’ll also put a link to the Kaggle Community Competition that I ran.
One of the examples that I provided with this data set, and you’ve got the complete notebook here, does some basic exploratory data analysis. You can run this notebook entirely within Kaggle and look at the results in real time. We’re really only looking at the sales and tabular data here, not the computer vision aspect, the images from the webcam that looks at the four restaurants.
First we load the data sets with normal pandas operations; nothing too unusual there. I convert the date column on the sales table to an actual date so that we can do some date-specific things with it. I’m also going to add the weekday. To do this I repeat the same conversion, which is slightly redundant: I convert sales.date to an actual date and get the day name, which gives values like Sunday, Monday, Tuesday, and so on.
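The date conversion and the weekday column can be produced in a single pass, which avoids the redundant second conversion. A minimal sketch, assuming the sales table has date and item_count columns and that the file is called sales_train.csv (both assumptions; check the actual Kaggle files):

```python
import pandas as pd


def add_weekday(sales: pd.DataFrame) -> pd.DataFrame:
    """Parse the date column once and add the day name (Monday, Tuesday, ...)."""
    sales = sales.copy()  # leave the caller's frame untouched
    sales["date"] = pd.to_datetime(sales["date"])
    sales["weekday"] = sales["date"].dt.day_name()
    return sales


def load_sales(path: str = "sales_train.csv") -> pd.DataFrame:
    """Read the sales table (hypothetical file name) and add the weekday column."""
    return add_weekday(pd.read_csv(path))


# Toy rows (hypothetical); 2023-01-01 fell on a Sunday.
toy = pd.DataFrame({"date": ["2023-01-01", "2023-01-02"], "item_count": [4, 2]})
print(add_weekday(toy)["weekday"].tolist())  # ['Sunday', 'Monday']
```

Passing parse_dates=["date"] to pd.read_csv is an equivalent way to convert the column at load time.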
Then I’m going to get the week in which each sale occurred, because there’s some seasonality by week. This is a week number from 0 up through 51 for the 52 weeks in the year.
We begin by producing a basic line chart with Plotly. One of the things I really like about Plotly in Jupyter notebooks is the set of controls available at the top of the chart. You can zoom in and see that there’s additional seasonality going on by week, not just by month and part of the year. Here we’re using Plotly Express, which is a very quick way to plot pandas data frames, and for most basic charts Plotly Express is probably what you want to be using. For some of my more advanced visualizations, I use Plotly at a lower level: without Express, using the graph objects that it provides. I’ll probably do another video on those; I’m really starting to like Plotly.
What we do here is extract just the columns that we’re reporting on: the date and the item count, that is, how many items we sold on that particular date. We group by the date and sum. There are many different items being sold each day, but we don’t care about that here; we take the sum total across all of them because we just want to know total sales over time. You can see the seasonality very clearly: it’s bouncing up and down, almost like a sine wave, but not quite. It’s also exhibiting a trend: the peaks get higher and higher. There might be multiple trends in a series, but one individual trend is monotonic; it is simply going up or simply going down. Often you’ll handle a trend with linear regression, because it’s quite linear. If you drew a line through this, it would look something like that, so you would probably want to figure out its slope and intercept, and that could be part of how you deal with the trend.
We’ll deal with the seasonality using other techniques that we’ll see in future videos. Next, I’m going to make a bar chart. I want to see, by day of the week, whether there is weekly seasonality, and there absolutely is: people like to buy a lot of stuff on the weekend. On Monday, they kind of take the day off from buying things.
The day order is quite important, because you want the days of the week in the natural order in which they occur, not the biggest one first or anything like that. This is how we tell Plotly the order the days should go in. We do another group by, very similar to the one before, except that we use weekday instead of date: we take the weekday and the item count, group on the weekday, and take the total sum. We also reset the index so that the result is flat and easy to work with after the group by. To plot it, we create a special pandas data frame just for the plot, built from the weekday groups, so there are just seven rows in it, and we order them by the day order; that’s where the day order comes in. We reset the index again so that it’s flat. Then we produce the actual graph, where x is the weekday and y is the item count, and there you see it. If you didn’t order by the days, these bars would come out in alphabetical order, because pandas sorts the group keys by default, rather than in calendar order.
We can also look at a single year. Here I’m just looking at 2020. This is basically a pandas query for that particular year. Then we create a data frame just to plot with, which is a very common technique with Plotly: we take the date and the item count for that year, grouped by date, and plot it just like the previous line graph. Now you’ve got just one year.
So you can see, seasonality-wise, that summer gets hot in terms of sales, and in terms of temperature too.
We can also pull out the most popular items. First we get the unique item IDs, which simply range from 1 to 100 in no particular order. Then we build a table that shows each item ID and its item count. To do this, we create another data frame with the item ID and the item count, group by the item, sum, and reset the index so that we can display it as a single level. We sort the values by item count in descending order so that the biggest sellers come first. Then we keep just the columns we want and rename them. We also merge with the items table, which gets us the actual name of each of these items.
Next, we’d like to look at some plots of the individual items. Here we create another data frame for plotting with the date, the item ID, and the item count. I group it by the date and the item ID, sum it, and reset the index to flatten it. So that I can display the names, I also merge in the items table. Then I plot a bar chart, and you can see the results here: the individual items varying over time. It might look a bit better if we take the weekly seasonality out, because it creates a lot of noise in what you’re seeing. To do that, we use the week number we added earlier, 0 through 51 for the 52 weeks in a year. We take the item count, group by the week number and the item ID, and merge in the items so that we can display the names. Now we plot it, and this looks a lot better. This is by week, not by the actual day.
So a lot of the noise from the weekly seasonality is now gone, and you can really see some things that happen. Here, for example, the frozen milky smoothie gets discontinued. It takes a little bit of time, but the green one here, the sweet frozen soft drink, which sounds similar, picks up the slack and fills the void.
You can do a similar plot by day of the week. We put in the days of the week in the order we want them; again, that’s important. Now we’re just introducing the item ID, specifying the order by day of the week, and merging in the items so that we can put the item names on the chart. Then we plot a bar chart, and you can see it there.
So this is some basic exploratory data analysis (EDA) that you might do for demand forecasting. There’ll be more videos in this series, so please subscribe if you’re interested, and give this video a like if it was useful to you.
