#10 Machine Learning Specialization [Course 1, Week 1, Lesson 3]
Let’s look in this video at the process of how supervised learning works. Supposed learning algorithm would input a dataset and then what exactly does it do and what does it output that’s find out in this video. Recall that a training set in supervised learning includes both the input features such as the size of the holes and also the output targets such as the price of the holes. The output targets are the right answers to the model we’ll learn from. To train the model you feed the training set, both the input features and the output targets to your learning algorithm. Then your supervised learning algorithm will produce some function. We’ll write this function as lowercase f where f stands for function. Historically this function used to be called a hypothesis, but I’m just going to call a function f in this class. And the job of f is to take a new input x and output an estimate or prediction which I’m going to call y hat and is written like the variable y with this little hat symbol on top. In machine learning the convention is that y hat is the estimate or the prediction for y. The function f is called the model. x is called the input or the input feature and the output of the model is to prediction y hat. The model’s prediction is the estimated value of y. When the symbol is just a little y, then that refers to the target which is the actual true value in the training set. In contrast, y hat is an estimate. It may or may not be the actual true value. Well if you’re helping your client to sell their hulls, well the true price of the hulls is unknown until they sell it. So your model f given the size, outputs a price which is the estimated that is the prediction of what the true price will be. Now when we design a learning algorithm, a key question is how are we going to represent the function f? Or in other words, what is the math formula we’re going to use to compute f? For now let’s stick with f being a straight line. So your function can be written as f subscript w comma b of x equals, I’m going to use w times x plus b. I’ll define w and b soon but for now just know that w and b are numbers and the values chosen for w and b will determine the prediction y hat based on the input feature x. So this f w b of x means f is a function that takes x’s input and depending on the values of w and b, f will output some value of a prediction y hat. As an alternative to writing this f w comma b of x, I’ll sometimes just write f of x without explicitly including w and b in the subscript. It’s just a simple notation but it means exactly the same thing as f w b of x. Let’s plot the trading set on the graph where the input feature x is on the horizontal axis and the output target y is on the vertical axis. Remember the average variance from this data and generates a best fit line like maybe this one here. This straight line is the linear function f w b of x equals w times x plus b or more simply we can drop w and b and just write f of x equals w x plus b. Here’s what this function is doing is making predictions for the value of y using a straight line function of x. So you may ask why are we choosing a linear function where linear function is just a fancy term for a straight line instead of some non-linear function like a curve or a parabola. Well, sometimes you want to fit more complex non-linear functions as well like a curve like this. But since this linear function is relatively simple and easy to work with, let’s use a line as a foundation that will eventually help you to get to more complex models that are non-linear. This particular model has a name, it’s called linear regression. More specifically this is linear regression with one variable where the phrase one variable means that there’s a single input variable or feature x namely the size of the horse. Another name for a linear model with one input variable is uni-variate linear regression where uni means one Latin and where variate means variable. So uni-variate is just a fancy way of saying one variable. In a later video you also see a variation of regression where you want to make a prediction based not just on the size of a horse but on a bunch of other things that you may know about a horse such as number of bedrooms and other features. And by the way when you’re done with this video there is another optional lab you don’t need to write any code. Just review it, run the code and see what it does that will show you how to define in Python a straight line function. And the lab will let you choose the values of W and B to try to fit the training data. You don’t have to do the lab if you don’t want to but I hope you play of it when you’re done watching this video. So that’s linear regression. In order for you to make this work one of the most important things you have to do is construct a cost function. The idea of a cost function is one of those universal and important ideas in machine learning and is used in both linear regression and in training many of the most advanced AI models in the world. So let’s go on to the next video and take a look at how you can construct a cost function.