Stanford CS221 AI Lecture 1: Overview


All right, let's get started. Please try to find a seat, and let's get the show on the road. Welcome everyone to CS221; this is Artificial Intelligence. If you're new to Stanford, welcome to Stanford. First, some introductions. I'm Percy, and I'm going to be one of your instructors for this class. Dorsa, do you want to say hi? Hi, Dorsa. "Hi everyone, I'll be co-teaching this class with Percy, and I'm super excited about it." Great, so we're going to be trading off throughout the quarter. We also have a wonderful teaching team. These are your CAs, and I'll give each of them a chance to say a few words about what they're interested in. [The CAs introduce themselves: a mix of PhD and Master's students interested in areas such as natural language processing, machine learning and data mining, computer vision, and related areas.] Great, they're all on the slide. As you can see, we have a very diverse team, so when you're thinking about final projects later in the quarter, you can tap into this incredible resource. Three quick announcements. There's going to be a section every week, which will cover both review topics and advanced topics. This Thursday there will be an overview section, so if you're rusty on Python or rusty on probability, come to that and we'll get you up to speed. The first homework is out; it's posted on the website and it's due next Tuesday at 11 p.m., so remember the time. All submissions will be done on Gradescope, and there's a Gradescope code that will be posted on Piazza, so look out for that. Okay, so now let's begin. When I first started teaching this class seven years ago, I used to have to motivate why AI was important and why, if you study it, you'll have a lot of impact in the world. I feel like I don't really need to do that anymore; it's now inescapable that you pick up the news in the morning and hear something about AI. And indeed we've seen a lot of success stories: AIs that can play Jeopardy, Go, Dota 2, even poker, all these games at superhuman levels of performance. AI can also read documents and answer questions, do speech recognition, face recognition, even medical imaging, and you read about how successful these technologies have been. And if you look outside the technical circles, there are a lot of people in policy trying to ask what is going on with AI.
And you hear these very broad claims about how transformative AI will be for the future of work and society, some even bordering on pretty catastrophic consequences. So what's going to happen in the future? No one knows. But it is fair to say that AI will be transformative. How did we get here? To answer that, I want to take a step back to the summer of 1956. The place was Dartmouth College. John McCarthy, who was then at MIT and who afterwards founded the Stanford AI Lab, organized a workshop at Dartmouth with some of the best and brightest minds of the time: Marvin Minsky, Claude Shannon, and so on. And they had this not-so-modest goal, the conjecture that every aspect of learning or any other feature of intelligence could be described so precisely that a machine could be made to simulate it. They were after the big question: how do you solve AI? Now, they didn't make that much progress over the summer, but a lot of programs and interesting artifacts came out of that time. There were programs that could play checkers or prove theorems, sometimes producing proofs better than what a human's proof would look like. And there was a lot of optimism; people were really, really excited, and you can see quotes from all these excited people proclaiming that AI would be solved in a matter of years. But we know that didn't really happen. There's this folklore example: people were trying to do machine translation. You take an English sentence like "The spirit is willing but the flesh is weak," you translate it into Russian, which was the language of choice for the US government at that time, and then you translate it back into English, and this is what you get: "The vodka is good, but the meat is rotten." The government didn't think that was too funny, so they cut off the funding, and that became the first AI winter: a period where AI research was not very active and not well funded. So what went wrong here? These were really smart people; they just got a little ahead of themselves. Two problems. One is that the compute was simply not there; it was orders of magnitude less than what we have right now. And also, the way they formulated the problems intrinsically relied on exponential search, and no matter how much compute you have, you're never going to win that race. They also had limited information, and this is maybe a more subtle point: even if I gave you infinite compute and asked you to translate, I don't think you would be able to figure it out, because it's not a computation problem. You need to learn the language; you need to experience all the subtleties of language to be able to translate. On the other hand, even though AI wasn't solved, a lot of interesting contributions to computer science came out of this era: Lisp, which had a lot of the ideas that underlie the high-level programming languages we have today; garbage collection; time-sharing, which allows multiple people to use one computer at the same time, something we take for granted; and also this paradigm of separating what you want to compute, which is modeling, from how you do it, which is inference, which we'll get to a little bit later. Okay, so people forget quickly, and in the 70s and 80s there was a renewed generation of people getting excited about AI again. This time it was all about knowledge: knowledge is power. There were a lot of expert systems being created.
And the idea was that if you could encode expert knowledge about the world, then you could do amazing things. At the time, that knowledge was generally encoded as a set of rules, and a lot of programs were written this way. Notice that the scope is much narrower now: the goal was not to solve all of AI, but to focus on some choice problems, like diagnosing diseases or converting customer orders into configurations of parts. And this was the first time that AI really had a real impact on industry; people were actually able to make useful products out of it. Knowledge did actually play a key role in curbing the exponential growth of search that people were worried about. But of course, it didn't last long. Knowledge as deterministic rules was simply not rich enough to capture all the nuances of the world, and it required a lot of manual effort to maintain. And again, the pattern of overpromising and under-delivering that seems to plague AI led to the collapse of the field and the second AI winter. Okay, so that's not the end of the story, but actually it's not really the beginning either. I'm going to step back further in time, to 1943. What happened in 1943? A neuroscientist, McCulloch, and a logician, Pitts, were marveling at how the human brain is able to do all these complicated things, and they wanted to formulate a theory of how this could all happen. So they developed a theory of artificial neural networks, which you can think of as the root of deep learning in some sense. What was interesting is that they looked at neurons and logic, two things that you might not necessarily associate with each other, and showed how they were connected mathematically. A lot of the early work on artificial neural networks was about studying them from a mathematical perspective, because at that time the compute wasn't there; you couldn't really train any models. Then in 1969 something interesting happened: the book by Minsky and Papert called Perceptrons. This book did a lot of mathematical analysis, and one of its many results showed that linear classifiers couldn't solve the XOR problem. One way to think about that problem is: given two inputs, can you tell whether they're the same or different? It shouldn't be a hard problem, but linear classifiers couldn't do it. And for some reason, which I don't quite understand, this killed off neural net research, even though the book said nothing about what a deeper network could do. It's often cited as the thing that swung the field away from people interested in neural networks and toward AI being very symbolic and logic driven. But there was always this minority group who were really invested in and believed in the power of neural networks, and they thought it was just a matter of time. So in the 80s there was renewed interest: people discovered, or rediscovered, the backpropagation algorithm, which gave a generic way to train these multi-layer neural networks, because a single layer, remember, was insufficient to do a lot of things.
Then one of the early success stories: Yann LeCun in 1989 applied a convolutional neural network to recognize handwritten digits, and this actually got deployed by the USPS and was reading zip codes. So this was great, but it wasn't until this decade that this area of neural networks really took off, under the moniker of deep learning. AlexNet in 2012 was a huge turning point: it showed big gains on the ImageNet benchmark and overnight transformed the computer vision community. Then AlphaGo, as many of you know, and the rest is history. Okay, so there are these two intellectual traditions. The name AI has always been associated with John McCarthy and the logical tradition; that's where it started. But as you can see, there's also this neuroscience-inspired tradition of AI. The two really had some deep philosophical differences and fought with each other quite a bit over the decades. But I want to pause for a moment, because maybe there are actually deeper connections here. Remember McCulloch and Pitts: they were studying artificial neural networks, but the connection was to logic. So even at the very beginning there was a synergy that people can often overlook. And take AlphaGo: the game of Go, like many games, is mathematically well defined; you can write down the rules of Go in logic in just a few lines, so it's a logic puzzle in some sense. But somehow the power of neural networks allows you to develop models that actually play Go really, really well. This is one of the deep mysteries, and I think one of the open challenges, in AI. As with any story, this isn't the full picture, and I want to point out on this slide that AI has drawn from a lot of different fields. Many of the techniques we're going to look at, for example maximum likelihood, came from statistics; games came from economics; optimization and gradient descent came from the 50s, developed completely unrelated to AI. These techniques were developed in different contexts, and AI is kind of like New York City: it's a melting pot where a lot of these techniques are unified and applied to interesting problems. And that's what makes it really interesting, because of the new avenues that are opened up by unique combinations of existing techniques. Okay, so that was a really brief history of how we got here. Now I want to pause for a moment and think about what the goal is. What are AI people trying to do? There are two ways to think about this, and sometimes the conflation of them causes a lot of confusion. I like to think about it as AI as agents and AI as tools. The first view asks the scientific question of how we can create, or recreate, intelligence. The second asks how we can use this technology to benefit society. These two are obviously very related and have a lot of shared technical overlap, but philosophically they're different. So let me explain this a little bit.
So the idea with AI agents is, and this is a lot of what gets associated with AI, and the science-fiction portrayal certainly encourages this view: you look at human beings and you say, wow, that's a really smart kind of creature. And you think, okay, what can humans do that is so amazing? Well, they can see and perceive the world and recognize objects. They can grasp cups and drink water without spilling. They can communicate using language, as I'm doing to you right now. We know facts about the world: declarative knowledge, such as what the capital of France is, and procedural knowledge, such as how to ride a bike. We can reason with this knowledge, and maybe ride a bike to the capital of France. And then, really importantly, we're not born with all of this. We're born with basically none of these capabilities, but we're born with the capacity and potential to acquire them over time through experience. Learning seems to be the critical ingredient that drives a lot of the success in AI today, and also with human intelligence it's clear that learning plays a central role in getting us to the level we operate at. Each of these areas has spawned entire subfields, and the people in them are wondering how you can make artificial systems that have the language, motor, or visual-perceptual capabilities that humans have. But are we there yet? I would like to think that we are very far. If you look at where machines are successful, it's all on a narrow set of tasks with millions or billions of examples: you crunch a lot of computation and you can really optimize any single task you can come up with. Humans operate in a very different regime. They don't necessarily do any one thing especially well, but they have such a diverse set of experiences, can solve a diverse set of tasks, and learn each individual task from very few examples. It's still a grand challenge, from a scientific perspective, to build systems with the level of capability that humans have. The other view is AI as tools. Basically you would say, okay, it's cool to think about how we can recreate intelligence, but we don't really care about making more things like humans; we already have a way of doing that, and it's called babies. What we'd really like to do instead is not make something that's like a human, but make systems that help humans, because, after all, we're humans; I guess it's a little bit selfish, but we're in charge right now. And this view, and a lot of the success stories in AI, are really different from what you'd expect a humanoid robot that comes to your house to do. Here's an example: this is a project from Stefano Ermon's group. There's a lot of poverty in the world, and part of the problem is just understanding what's going on. They had this idea of using computer vision on satellite imagery to predict things like GDP and other economic indicators. This is obviously not a task that our ancestors were getting really good at, but nonetheless it uses convolutional networks, a technique that was inspired by the brain. And so that's kind of interesting.
Another application is saving energy by figuring out when to cool data centers. And as AI is deployed in more mission-critical situations, such as self-driving cars or authentication, a few new issues come up. For example, there's this phenomenon called adversarial examples, where you can take these cool-looking glasses, put them on your face, and fool a state-of-the-art face recognition system into thinking that you're someone else. Or you can put stickers on a stop sign and get a state-of-the-art system to think it's a speed limit sign. These are clearly big problems if we think about the widespread deployment of AI. There are also issues that are less catastrophic but still pretty upsetting, namely the biases that many of you have probably read about in the news. For example, if you take a language that doesn't distinguish in its written form between "he" and "she" and feed a sentence to Google Translate, you see things like "she works as a nurse, but he works as a programmer," which is encoding certain societal biases into the actual models. One important point I want to bring up is: how does machine learning work today? Well, society exists; society generates a lot of data; we train on this data, fit it, and try to mimic what it's doing; and then we make predictions with it. What could possibly go wrong? So a lot of people have been thinking about how these biases creep in, and it's an open, active area of research. Something a bit more sensitive: these systems are being deployed on people, whether they want it or not, and this touches on people's livelihoods and impacts their lives in a serious way. Northpointe was a company that developed software called COMPAS, which tries to predict a criminal risk score: essentially, how risky someone is. And ProPublica looked at this and pointed out that, among defendants who did not re-offend, black defendants were roughly twice as likely to be incorrectly classified as high risk compared to white defendants. That seems pretty problematic. Then Northpointe came back and said, actually, we think we're being fair: given a risk score of seven, about 60% of white defendants re-offended and about 60% of black defendants re-offended, so the scores mean the same thing for both groups. The point here is that there's actually no clean solution in some sense. People have formulated different notions of fairness, relating how you predict across different groups, and you can have different notions of fairness that all seem reasonable from first principles but are mathematically incompatible with each other. So this is, again, an open area of research, where we're trying to figure out, as a society, how to deal with machine learning being used in these kinds of critical situations. Okay, so to summarize: there's the agents view, where we're dreaming about how to get capabilities that humans have, like learning from very few examples, into machines, hopefully opening up a different set of technical capabilities.
But at the same time, we really need to be thinking about how these AI systems affect the real world, where things like security, biases, and fairness all show up. It's also interesting to note that a lot of the challenges in deploying AI systems don't necessarily have anything to do with humans at all. Humans are incredibly biased, but that doesn't mean we want to build systems that mimic humans and inherit all the flaws that humans have. Okay, any questions about this? Let me pause for a moment. So let's go on. What I want to do next is give an overview of the different topics in the course. The way to think about all of this is that in AI we're trying to solve really complex problems; the world is really complicated; and at the end of the day we want to produce some software, or maybe some hardware, that actually runs and does stuff. There's a very considerable gap between those two things. So how do you even approach something like self-driving cars or diagnosing diseases? You probably shouldn't just sit down and start typing, because then there's no overarching structure. What this class is going to give you is one example of a structure that will hopefully help you approach hard problems and think about how to solve them in a more principled way. This is the paradigm that I call modeling, inference, and learning. The idea is that there are three pillars, which I'll explain in a bit, and we can focus on each one of them in turn. The first pillar is modeling. What is modeling? Modeling is taking the real world, which is really complicated, and building a model of it. A model is a simplification that is mathematically precise, so that you can do something with it on a computer. Necessarily, modeling has to simplify things and throw away information, so part of the art is figuring out what information to pay attention to and what to throw away. This is going to be important, for example, when you work on your final projects and you have a real-world problem: you can't model everything, and you have to figure out judiciously how to manage your resources. Here's an example. If you want to build a system that can find the best way to get from point A to point B in a city, you can formulate the model as a graph, where nodes are points in the city and edges represent the ability to go between those points, with some sort of cost on the edges. Okay, so once you have your model, you can do inference, and inference means asking questions about your model. Given this model, you can ask, for example: what is the shortest path from this point to that point? Because you now have a model, it's a mathematically well-defined problem, and it's within the realm of developing algorithms to solve it. Most of inference is about being able to do these computations really efficiently. And finally, learning addresses the question: where does this model come from? In any realistic setting, the model might have a lot of parameters, maybe millions of parameters, and if it's going to be faithful to the real world, how do you get all that information in there?
Manually encoding this information turns out not to be a good idea; that's, in some sense, what AI from the 80s was trying to do. So the learning paradigm is as follows. We specify a model without parameters; think of it as a skeleton. In this case, we have a graph, but we don't know what the edge costs are. Then we have some data: maybe people tried to go from X to Y and it took them 10 minutes, or an hour, and so on. And from this data we can learn to fit the parameters of the model; we can assign costs to the edges that are representative of what the data is telling us. So in this way, we write down a model without parameters, feed in the data, apply a generic learning algorithm, and get a model with parameters. And now we can go back and do inference and ask questions of it. That's the paradigm. I want to really emphasize that learning, as I've presented it, is not about any one particular algorithm like nearest neighbors or neural networks. It's really a philosophy of how you approach problems: by defining a model and not having to specify all the details, but filling them in later from data. Okay, so here is the plan for the course. We're going to go from low-level intelligence to high-level intelligence, and this is the intelligence of the models we'll be talking about. First, we're going to talk about machine learning. As I've alluded to, machine learning is such an important building block that it can be applied to any of the models we develop. The central tenet of machine learning is: you have data, and you go to a model. It's the main driver of a lot of the successes in AI, because it allows you, in software engineering terms, to move complexity from code to data. You have a lot of data, which is collected in a more natural way, and a smaller amount of code that operates on that data, and this paradigm has been really powerful. One thing to think about with machine learning is that it requires a leap of faith. You can go through the mechanics of downloading some machine learning code and training a model, but fundamentally it's about generalization. You have your data and you fit a model, but you don't care about how it performs on that data; you care about how it performs on new experiences. That leap of faith is what gives machine learning its power, but it's also, at first glance, perhaps a bit magical. It turns out you can formalize a lot of this using probability theory and statistics, but that's a topic for another time. After machine learning, we're going to go back and talk about the simplest of models: reflex models. Here's a quiz: what is this animal? Zebra. How did you get it so fast? Well, it's a reflex: your human visual system is so good at doing these things without thinking. Reflex models are models that just require a fixed set of computations. Examples are linear classifiers and deep neural networks, and most of the models people use in machine learning are of this type; "model" is almost synonymous with "reflex" in machine learning. The important thing is that there's no looking ahead: you get your input, bam bam bam, and here's your output. That's great because it's fast, but some problems require a little bit more than that.
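To make the modeling–inference–learning picture above a bit more concrete, here is a toy sketch in Python of the route-finding example: the model is a graph with a cost on each edge, learning fits those costs from observed trip times, and inference asks the model for the cheapest route. Everything here (the edge names, the data, the averaging rule) is made up for illustration; it is not the lecture's code.

```python
from collections import defaultdict

# Data: people traveled along an edge and it took some number of minutes.
observed_trips = [
    (("A", "B"), 10), (("A", "B"), 14),
    (("B", "C"), 5),
    (("A", "C"), 30),
]

def learn_costs(trips):
    """Learning: fill in the model's parameters (edge costs) from data,
    here simply as the average observed travel time per edge."""
    totals, counts = defaultdict(float), defaultdict(int)
    for edge, minutes in trips:
        totals[edge] += minutes
        counts[edge] += 1
    return {edge: totals[edge] / counts[edge] for edge in totals}

costs = learn_costs(observed_trips)

# Inference: ask a question about the model, e.g. the cheapest way from A to C.
# With only two candidate routes in this tiny graph, brute force is enough.
direct = costs[("A", "C")]
via_b = costs[("A", "B")] + costs[("B", "C")]
print("best route:", "A -> C" if direct < via_b else "A -> B -> C")
```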
Right, so for example, here's another problem. Quick: white to move, where should you go? There are probably a few of you who are chess geniuses, but for the rest of us, we have no idea, maybe not even who's moving. In these kinds of situations we need something a little more powerful than a reflex; we need agents that can plan and think ahead. The idea behind state-based models is that we model the world as a set of states, which capture any given situation, like a position in a game, and actions that take us between states, which correspond to things you can do. A lot of game applications fall into this category, as do robotics, motion planning, and navigation. Also some things you might not think of as planning, such as text generation in natural language or generating images, can be cast this way as well. There are three types of state-based models, each of which we'll cover in a few weeks' time. Search problems are the classic case: you control everything, and you're just trying to find the optimal path. Then there are cases with randomness: for example, if you're trying to go from point A to point B, maybe there's traffic you don't know about, or in a game there might be dice being rolled. And there's a third category, adversarial games, where you're playing an opponent who's actively trying to destroy you, so what are you going to do about it? One of the games we'll talk about is Pac-Man, and one of the assignments is actually building a Pac-Man agent like this. While you're looking at this, think about what the states are, what the actions are, and how you would go about devising a strategy for Pac-Man to eat all the dots and avoid all the ghosts. That's something to look forward to. There's also going to be a competition, so we'll see who ends up on top. Okay, so state-based models are very powerful in that you have foresight, but some problems are not most naturally cast as state-based models. For example, how many of you have played Sudoku? The goal of Sudoku is to fill in the blanks with numbers so that every row, every column, and every 3x3 sub-block has the digits 1 through 9; so it's a bunch of constraints, and there's no sense in which you have to fill them in in a certain order, whereas the order in which you move in chess is pretty important. These types of problems are captured by variable-based models, where you think about a solution to the problem as an assignment to individual variables subject to some constraints. Constraint satisfaction problems, which we'll spend a week on, have hard constraints; for example, a person can't be in two places at once. And there are also Bayesian networks, which we'll talk about, which are variable-based models with soft dependencies. For example, if you're trying to track a car over time, these variables represent the positions of the car, and these E's represent the sensor readings at those positions, and inference looks like trying to figure out where the car was, given all these noisy sensor readings. That's also going to be another assignment you'll work on.
Okay, so finally we get to high-level intelligence, and I put logic here for a reason that will become clear. Yes, is there a question? "Can you explain why Sudoku is not a state-based model?" So the question is why the Sudoku problem is not a state-based model. You actually can formulate it as a state-based model, by thinking about a sequence of assignments, but it turns out you can formulate it in a more natural way as a variable-based model, which lets you take advantage of more efficient algorithms to solve it. Think about these models with an analogy to programming languages: yes, you could write everything in C++, but sometimes writing some things in Python or SQL is just easier. Another question: how do you categorize state-based models where there's both randomness and an adversary? We're going to talk about those as well; I would classify them as adversarial games, but with a random component you have to deal with. Another question, about whether some of these models are continuous and some are discrete: I don't necessarily think of it that way. A lot of the reflex models can actually work in continuous state spaces, for example on images. It's almost the opposite: the logic-based models are in some sense more discrete, but you can have continuous elements in the others as well. In this class we're mostly going to focus on discrete objects, because they're just simpler to work with. Okay, so what is this logic part about? The motivation is: suppose you wanted a little companion who you could boss around, or, to say it a better way, who could help you do things. You'd like to be able to tell it some information, and then later ask it questions and have the system be able to reply. How would you go about doing this? One way to think about it is building a system that you can actually talk to using natural language. So I'm going to show you a little demo, which is going to come up in the last assignment, on logic, and let's see what you think of it. This is a system based on logic; I'm going to tell it a bunch of things and ask it some questions, and I want you all to follow along and see if you can play the role of the agent. Okay, let's teach it a few things. "Alice is a student." Okay, so it learned something. Now let's quiz it. "Is Alice a student?" Yes, so that worked. "Is Bob a student?" It should answer "I don't know"; it doesn't know who Bob is. Okay, now let's say "Students are people." "Is Alice a person?" Yes. Okay, so it's doing some reasoning, right? It's using logic; it's not just repeating back what I told it. Now: "Alice is from Phoenix." "Phoenix is a hot city." I know, because I've lived there. "Cities are places." "If it is snowing, then it is cold." Okay, got it. "So is it snowing?" I don't know. How about this: "If a person is from a hot place and it is cold, then she is not happy." Okay, true, right?
I guess those of you who have spent all your life in California will maybe appreciate this. Okay, so is it snowing now? How many say yes, it's snowing? How many say no? Don't know? I just want to see. How about if I say "Alice is happy"? Okay, so is it snowing now? No; the answer should be no. Okay, so you were able to do this. This is an example of an interaction which, if you think about it, is very different from what you would see in a typical ML system, where you have to show it millions of examples of one particular thing and then it can do that one task. This is much more open-ended: I won't say the experiences are super rich, but they're definitely diverse. I give one statement, I say it once, and all of a sudden all of its ramifications and consequences are built in; it understands at a somewhat deeper level. Of course, this is based on a logic system, so it is brittle, but it's a proof of concept to give you a taste of what I mean when I say logic. These systems need to be able to digest heterogeneous information and reason deeply with that information, and we'll see how logic systems can do that. So, that completes the tour of the topics of this class. Now I want to spend a little bit of time on course logistics. All of the details are online, so I'm not going to be complete in my coverage, but I want to give you a general sense of what's going on. So, prerequisites: programming, discrete math, and probability. You need to be able to code, and you need to be able to do some math and some basic proofs. There are classes that are required, or at least recommended, or if you have some equivalent experience, that's fine too. And what should you hope to get out of this course? On one hand, the course is meant to give you a set of tools: using the modeling-inference-learning paradigm, it gives you a set of tools and a way of thinking about problems that hopefully will be really useful when you go out into the world and try to solve real-world problems. And also, as a side product, I want all of you to be more proficient at math and programming, because those are the core elements that enable you to do interesting things in AI. A lot of the AI you read about is very flashy, but the foundations are still just math and programming in some sense. Okay, so the coursework is homeworks, an exam, and a project; that's what you have to do. There are eight homeworks; each homework is a mix of written and programming problems, centered on a particular application and covering one particular type of model, essentially. Like I mentioned before, there's a competition for extra credit, and there are also some extra credit problems in the homeworks. When you submit code, we have an autograder that runs on all the test cases, but you only get feedback on a subset. It's like machine learning: you have a train set and a test set, so don't train on your test set. Okay, and the exam tests your ability to use the knowledge that you learn to solve new problems.
Right, and I think it's worth taking a look at a past exam, because it surprises people every year: the exam is a little different from the types of problems you see on the homework, and the questions involve more problem solving. The exam isn't going to be multiple choice, like "when was Perceptrons published?" It's going to be: here's a real problem, how do you model it, and how do you come up with a solution? It's all written, closed book, except that you get one page of notes, and it's a great opportunity to actually review all the material and really learn the content of the class. The project, I think, is a really good opportunity to take all the things we've been talking about in class, find something you really care about, and try to apply them. You'll work in groups of three, and I really recommend finding a group early. I emphasize that it's your responsibility to find a good group: don't come to us one week before the project deadline and say, oh, my group members ditched me. Really try to nail this down; use Piazza or your other social networks to find a good group. Throughout the quarter there are going to be milestones for the project, to prevent you from procrastinating until the very end: a proposal where you brainstorm some ideas, a progress report, a poster session, which is actually a whole week before the final report is due, and then the final report. The project is very open-ended, which can be really liberating but also might be a little bit daunting. We will give you a lot of structure in terms of how you define your task, how you implement baselines and oracles, which I'll explain later, how you evaluate, and how you analyze what you've done. Each project group will be assigned a CA mentor to help you through the process, and you're always welcome to come to my office hours, or Dorsa's, or any of the CAs', to get additional help, either brainstorming or figuring out what the next step is. Some policies: all assignments will be submitted on Gradescope. There are seven total late days; you can use at most two per assignment, and after that there's no credit. We're going to use Piazza for all communication, so don't email us directly; leave a post on Piazza. I encourage you to make it public if it's not sensitive; if it's personal, then obviously make it private. And try to help each other: we'll actually award some extra credit for students who help answer other students' questions. All the details are on the course website. Okay, one last thing, and it's really important: the honor code. If you've been at Stanford you've probably heard this, and if you haven't, I want to make it really clear. I encourage you all to collaborate and discuss together, but when it comes to actually doing the homeworks, you have to write up your homework and your code independently. You shouldn't be looking at someone else's write-up, you shouldn't be looking at their code, and you definitely shouldn't be copying code off of GitHub. That hopefully should be obvious.
And maybe less obvious: please do not post your homework assignments on GitHub. I know you're probably proud of the fact that your Pac-Man agent is doing really well, but please don't post it on GitHub, because that would be an honor code violation. When debugging together, it's fine as long as you're looking at input-output behavior. So you can say to your partner, hey, I put in this input for my test case and I'm getting 3, what are you getting? That's fine, but remember, don't look at each other's code. And to enforce this, we're going to be running MOSS, which is a piece of software that looks for code duplication, to make sure the rules are being followed. And changing one variable name is not going to fool it; anyway, enough said, just don't do it. Okay? Any questions about this, or about any of the logistics? "Can the final project go on GitHub?" Private GitHub repos are fine. Question in the back: "Is it necessary to have a group, or can you do a solo project?" You can do a solo project, you can do a project with two people, or with three. I would encourage you to try to work in groups of three, because you'll be able to do more as a group, and it's not like if you do a solo project we'll only be expecting one third of the work. Okay, anything else? All right. Okay, so in the final section, I want to actually delve into some technical details. One thing we're going to focus on right now is the inference and learning components of this course, and I'm going to talk about how you can approach these through the lens of optimization. This might be a review for some of you, but hopefully it's a good way to get everyone on the same page. Okay, so what is optimization? There are two flavors of optimization that we care about. There's discrete optimization, where you're trying to find the best discrete object, for example the path p that minimizes the cost of that path. We're going to talk about one algorithmic tool, based on dynamic programming, which is a very powerful way of solving these complex optimization problems. The key property here is that the set of paths is huge: you can't just try all of them, compute their costs, and choose the best one; you have to do something cleverer. The second brand of optimization is continuous optimization. Formally, this is just finding the best vector of real numbers that minimizes some objective function. A typical place this shows up is in learning, where you define an objective function like the training error, and you're trying to find a weight vector w, where this notation just means it's a list of d numbers, that minimizes the training error. And we're going to show that gradient descent is an easy and surprisingly effective way of solving these continuous optimization problems. Okay, so to introduce these two ideas, I'm going to look at two problems and try to work through them.
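Written out, the two flavors of optimization just described look like this (Cost and TrainingError are just generic names for the objectives, as in the lecture's description):

```latex
\min_{p \,\in\, \text{Paths}} \ \text{Cost}(p)
\qquad \text{and} \qquad
\min_{w \,\in\, \mathbb{R}^d} \ \text{TrainingError}(w)
```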
This might also be a good way to think about how you might go about approaching homework problems, by trying to talk through things in a bit more detail. Okay, so the first problem is computing edit distance. This might not look like an AI problem, but a lot of AI problems have it as a building block, if you want to do some sort of matching between two words or two biological sequences. The input is two strings; I'm going to start writing over here on the board to work this out. You're given two strings, S and T, for example "a cat" and "the cats", and you want to find the minimum number of edits needed to transform S into T. By edits, I mean you can insert a character, you can delete a character (I can delete this "a"), and you can substitute one character for another (I can replace this "a" with a "t"). Here are some examples. What's the edit distance of "cat" and "cat"? Zero; you don't have to do anything. "cat" and "dog" is three. "cat" and "at" is one: you delete the "c" (or, going the other direction, insert the "c"). "cat" and "cats" is one. And "a cat" and "the cats" is four. The challenge here is that there are quite a lot of different ways to insert and delete: if your strings are very long, there are just way too many possibilities to try them all. Okay, so how do we go about coming up with a solution? Any ideas? Let's try to simplify the problem a bit, building on what was just suggested. The general principle, and let me just write this down, is to reduce the problem to a simpler problem, because then it's hopefully easier to solve, and you can keep doing that until you get something trivial. There are maybe two observations we can make. One is that, technically, we're saying we can insert into S, but inserting into S makes the problem larger in some sense, and that's not reducing the problem. But if we're really going to insert into S, we'd only want to insert something that matches T, so that it cancels something out; there's no reason to insert some arbitrary character. So we can think of inserting into S as equivalent to deleting from T. Does that make sense? The other observation is that we could start editing anywhere: we could edit here, then jump over there, and so on, but that just introduces a lot of different orderings which all result in the same answer. So why don't we start more systematically at one end and chisel off the problem, let's say from the end. Okay, so: start at the end. Let me draw the problem in a box here. Yes, question? The question is: why are we starting at the end, as opposed to somewhere else?
Well, the idea is that if you start at the end, then you have a more systematic and consistent way of reducing the problem, so you don't have to think about all the permutations of where you can delete and substitute. "Why is right to left more systematic than left to right?" We could also go left to right; starting from either end is fine, I just picked the end. "How do we know starting at one end gives us the optimal strategy?" Good question. If you wanted to prove this more rigorously it would take some work, but let me give you an intuitive answer. Suppose you didn't start at the end and you just made some sequence of edits: insert here, delete there, and so on. I could equivalently sort those edits by where they happen in the string and apply them proceeding from one end to the other, and I would arrive at the exact same answer. So, without loss of generality, I can start at the end. Any other questions? Another suggestion: maybe you can recognize some patterns, like "cat" appears in both, so those should be lined up. Sure, such patterns exist, but we want to solve the problem for cases where the pattern might not be obvious; we want it to work for all strings. Maybe there is no pattern, and we'd still want an efficient algorithm. Another suggestion from the audience: just do dynamic programming, go one step at a time, where at each step either we do a substitution, or the characters are the same, or we insert, and we remember the subproblems we've already solved so we don't have to compute them again. Yeah, great idea, let's do dynamic programming; that's what I'm trying to build up to. So, dynamic programming is a general technique that essentially allows you to express a more complicated problem in terms of simpler problems. Let's start with this problem. If we start at the end, and the two last characters match, then we can just immediately drop them both and the edit distance stays the same; we get a free ride there. But when they differ, we have several options. What could we do? Well, we could substitute: change the last letter of S to match the last letter of T, in this case change the "t" to an "s"; then the two ends match and we can drop them, which leaves us with "a ca" versus "the cat". What else can I do? I can insert: insert an "s" at the end of S, but that's the same as deleting the "s" from T, so this option leaves "a cat" versus "the cat"; let's call this one insertion, since it's technically an insertion into S. And finally, I can delete the last letter of S, the "t", which leaves "a ca" versus "the cats". Right now you're probably looking at this and thinking, well, obviously you should pick the one where "cat" lines up with "cat". But in general it's hard to tell; I just gave you some arbitrary strings.
Who knows what the right answer is in general. So, in general, how do you pick? Question: "For the second one, isn't it supposed to be 'the cat'?" You mean this one? Here I inserted an "s", but because there are now two "s"s at the end, they cancel out, so you can think of it as really deleting the "s" from T; that's why I'm reframing the problem this way. Okay, so which one should I choose? "What about the substitution the other way?" Substituting the other way, meaning changing the letter in T instead: you can think of that as equivalent. If you identify two letters that you want to make the same, you can replace one with the other or vice versa; officially we've been framing it as only editing S, which is why I'm not writing that one down. Okay, so which of these three options do I choose? "Could you look inside the strings?" You could try to look inside, but remember, these might be really long and complicated; we want a simple, mechanized procedure, so let's pretend you can't see inside. Okay, so let's keep going. I'm not going to draw everything out, but you can break each of these down into three further options, and those into three more, and so on, until at the end you hopefully reach a problem that's simple enough, say one of the strings is empty, and then you're done. But then how do I know which branch to take? Suppose I've solved the subproblems; suppose someone just told me, okay, I know this cost, I know this cost, and I know this cost. What should you do? Yeah, you should take the minimum, right? Remember, we want to minimize the edit distance. There are three things you can do; each of them has the cost of doing that action, which is one, since every edit has the same cost, plus the cost of continuing with whatever subproblem you're left with; and we just take the minimum. Another question, about how we know this gives the optimal number of edits: again, I was arguing that going right to left is without loss of generality, because if you went left to right, or in some other order, you could replay the edits in sorted order and get the same answer. Yeah, I think it works. Okay, so let's try to code this up and see if we can make this program work. I'm going to write an edit distance function; can everyone see this? I'm going to define a function that takes two strings, and then I'm going to define a recurrence. Recurrence is a word I haven't really used yet, but this is really the way you should think about dynamic programming and this idea of taking a complex problem and breaking it down; it's going to show up in search problems, MDPs, and games, so it's something you should really get comfortable with. Let's define the recurrence as follows. At any point in time I have a subproblem, and since I'm going right to left, I'm only considering the first m letters of S and the first n letters of T. So the recurrence is going to return the minimum edit distance between the first m letters of S and the first n letters of T.
I'm going to post this online, so you don't have to copy it down. Okay, so I'm going to define this function. If I had this recurrence, what should I return? I'm going to return the recurrence applied to the length of S and the length of T; that's the initial call. All right, so now I need to fill out this recurrence. Let's consider a bunch of cases. Here are some easy cases. Suppose that m is zero: I'm comparing an empty string with something that has n letters. What should the cost of that be? I heard it from down the line: it should be n. And symmetrically, if n is zero, then the result should be m. Then we come to the first case we considered, where the ends match: if s[m - 1] equals t[n - 1] (this is zero-based indexing, which is why there's a minus one), then the problem reduces to the subproblem on m - 1 and n - 1. And now comes the fun case we looked at, where the last letters don't match, so I have to do some sort of edit. Question from the audience about whether this is comparing the full strings or just the prefixes: we're doing a full S compare here; there's probably a way to make this more efficient, but I'm just trying to get the basic version working. So, substitution: what's the cost of a substitution? I pay one to do the substitution, but as a reward I get to reduce the problem to m - 1 and n - 1; I lop off a letter from S and a letter from T. What else can I do? I can delete: that also costs one, and when I delete, I delete from S, so it's m - 1 and n stays the same. And then the insertion is one plus the subproblem on m and n - 1, because, remember, insertion into S is deletion from T; that's why it's n - 1. And then the result is just the minimum of all of these; return the result. And then how do I call this function? With "a cat" and "the cats". Let me print out the answer and see if it works. Four. Therefore I conclude it works. I mean, if you were doing this for real you'd probably want to test it more, but in the interest of time I'll move on. Let me just recap. I'm computing the edit distance between two strings, and I define a recurrence that works on subproblems, where a subproblem is the first m letters of S and the first n letters of T. The reason I'm using integers instead of strings is to avoid string copying; it's an implementation detail, it doesn't really matter. Then there are base cases, where the problem is trivial to solve; the case where the last letters match; and the case where they don't match and you have to pay some cost, and since I don't know which action to take, I take the minimum over all of them. And then I call it with the full lengths. Okay, so this is great: now I have a working thing. Let's try another test case. If I multiply each string by 10, that replicates the string 10 times, so now they're much longer strings. Now I'm going to run it... and maybe I shouldn't wait for this. There is a base case, so I think it will terminate, but what's wrong with this code? Yes, it's very slow. Why is it slow? Right, every call recurses three times, so you get this exponential blowup. So how do you solve this problem? Yeah, you can memoize.
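The transcript doesn't include the code itself (it's posted with the course materials), so here is a rough reconstruction of the function as developed above, including the memoization cache that comes up next; the names used here (editDistance, recurse, cache) are guesses at what was typed in lecture.

```python
def editDistance(s, t):
    cache = {}  # (m, n) -> answer; defined OUTSIDE recurse, as discussed below

    def recurse(m, n):
        # Minimum edit distance between the first m letters of s and the first n letters of t.
        if (m, n) in cache:
            return cache[(m, n)]
        if m == 0:                    # s is exhausted: insert the remaining n letters
            result = n
        elif n == 0:                  # t is exhausted: delete the remaining m letters
            result = m
        elif s[m - 1] == t[n - 1]:    # last letters match: a free ride
            result = recurse(m - 1, n - 1)
        else:                         # last letters differ: pay 1 for the cheapest edit
            subCost = 1 + recurse(m - 1, n - 1)   # substitute
            delCost = 1 + recurse(m - 1, n)       # delete from s
            insCost = 1 + recurse(m, n - 1)       # insert into s = delete from t
            result = min(subCost, delCost, insCost)
        cache[(m, n)] = result
        return result

    return recurse(len(s), len(t))

print(editDistance("a cat", "the cats"))            # 4
print(editDistance("a cat" * 10, "the cats" * 10))  # 40 in the lecture's sanity check; fast thanks to the cache
```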
I think I heard the word memoize. Memoization plus recurrences is, I guess, one way to think about dynamic programming. So I’m going to show you a way to do it which is pretty uninvasive. And generally I recommend that people get a slow version working first and then try to make it faster; don’t try to be too slick all at once. Okay. So I’m going to make this cache. And I’m going to say: if (m, n) is in the cache, then I return whatever is in the cache. The cache is just a dictionary mapping the key, which identifies the subproblem I’m interested in solving, to the result, which is the answer that I computed. So if I already computed it, I don’t need to compute it again; I just return it. And at the end, if I did have to compute it, then I put the result into the cache. So three lines, four lines, I guess. Yeah? Yeah, that’s a great point: the cache should be defined outside the recurrence function. Glad you’re paying attention; otherwise it would do basically nothing. Any other mistakes? Yeah? Question: there are function decorators that implement memoization for you; in this class, is it okay to use those, or would you rather we write our own? You can use the decorator; you can be fancier if you want. But I think this version is pretty transparent and easy for learning purposes. Okay, so let’s run this. Now it runs instantaneously, as opposed to, well, I actually don’t know how long it would have taken otherwise. And as a sanity check, 40 is probably the right answer, because the original answer was four and we replicated the strings ten times. Okay, any other questions about this? So this is an example of basic dynamic programming, where you solve a complicated problem by formulating it as a recurrence in terms of smaller subproblems. And like I said before, this is going to show up over and over again in this class. Yeah? So the question is, why does this reduce redundancy? Maybe I can show it pictorially. Suppose you have a problem here, and it gets reduced to, I’m just making an arbitrary diagram here, these two subproblems. And this problem gets reduced to these two, and so on. If you didn’t have memoization, you would be paying for the number of paths; every path has to be computed from scratch. Whereas if you do memoization, you pay for the number of nodes, and a lot of that is shared. Like here, once you compute this node, no matter whether you’re coming from here or from here, you’re using the same value. Okay. So let’s move on. The second problem we’re going to talk about has to do with continuous optimization, and the motivating question here is how do you do regression, which is the bread and butter of machine learning. So here we go: regression. Okay. So imagine you get some points. I give you a point, (2, 4), and then I give you another point, say (4, 2). These are data points. You want to, let’s say, predict housing price from square footage, or predict a health score from blood pressure and some other things. This is pretty common in machine learning. And the question is: how do you fit a line? I’m going to consider the case where your line has to go through the origin, just for simplicity.
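Before moving on to regression, here is a sketch of the memoized version of the edit distance recurrence described above. Again the names are mine, and as the question above notes, a decorator such as functools.lru_cache would work just as well:

```python
def computeEditDistance(s, t):
    cache = {}  # (m, n) -> edit distance; lives outside recurse, as pointed out above

    def recurse(m, n):
        if (m, n) in cache:              # already solved this subproblem
            return cache[(m, n)]
        if m == 0:
            result = n
        elif n == 0:
            result = m
        elif s[m - 1] == t[n - 1]:       # last letters match
            result = recurse(m - 1, n - 1)
        else:                            # substitution, deletion, insertion
            result = min(1 + recurse(m - 1, n - 1),
                         1 + recurse(m - 1, n),
                         1 + recurse(m, n - 1))
        cache[(m, n)] = result           # remember the answer for next time
        return result

    return recurse(len(s), len(t))

print(computeEditDistance('a cat' * 10, 'the cats' * 10))  # returns instantly; prints 40
```

On the replicated strings this runs in a fraction of a second, and the answer matches the sanity check above (ten times the original answer of four).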
So you might want to find a fit. Two points is maybe a little bit degenerate, but that’s the simple example we’re going to work with. In general, you have lots of points and you want to fit the line that is closest to the points. Okay, so how do you do this? There’s a principle called least squares, which says: if you give me a line, which in this case is given by a slope w, I’m going to tell you how bad it is. And badness is measured by looking at all the training points and looking at these distances. So here I have this particular point, say x_i. If I hit it with w, then I get, not the y-intercept, but the predicted y value here; that’s my prediction. The real value was y_i, which is up here. And if I look at the difference, I want that difference to be zero. So in least squares, I square this difference and I say I want it to be as small as possible. Now, this is only for one point, so I’m going to look at all the points; let’s suppose I have n points. And that’s the function I’m going to call f of w, which basically says: for a given weight, which here is just a slope, give me a number that characterizes how bad of a fit this is, where zero means I fit everything perfectly and large numbers mean a poor fit. Okay. So that’s regression. So how do I solve a regression problem? How do I optimize this? Can you do this in your head? If I actually have these two points, what should w be? Okay, doesn’t matter, we’ll compute it. So how do we go about doing this? One principle, which is maybe another general takeaway, is to abstract away the details. This was also true with the dynamic program: sometimes if you’re too close to the board and you’re thinking, oh man, these points are here and I need to fit this line, how do I do that, you get a little bit stuck. Instead, why don’t we think about this f as just some function; I don’t really care what it is. Okay, so let’s plot this function. Note this is a different plot now: this axis is the weight w and this axis is f of w. Always label your axes. And let’s say this function looks like this. Which means that for this slope I pay this amount, for this slope I pay this amount, and so on. So what do I want to do? I want to minimize f of w, which means I want to find the w which has the least value of f of w. Question? Okay, so you take the derivative. What does the derivative give you? It tells you which way to move, right? In general, you might not be able to get to the minimum directly. In this particular case you can, because you can solve it in closed form, but I’m going to try to be more general. So if you start here, the derivative tells you the function is decreasing if you move to the right, so you should move to the right. Whereas if you end up over here, the derivative says the function is decreasing if you move to the left, so you should move to the left. So what I’m going to introduce is an algorithm called gradient descent. It’s a very simple algorithm. It basically says: start somewhere, compute the derivative, and just follow your nose. The derivative is negative, so go this way. Now you’re at a new point; you compute the derivative again, you descend, and you compute it again.
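As an aside before the code: here is the least squares objective described above written out as a formula, together with its derivative. The closed-form minimizer for the two points (2, 4) and (4, 2) is my own addition for checking purposes; the lecture finds it by gradient descent instead.

f(w) = \sum_{i=1}^{n} (w x_i - y_i)^2, \qquad f'(w) = \sum_{i=1}^{n} 2\,(w x_i - y_i)\, x_i

Setting f'(w) = 0 gives w = \frac{\sum_i x_i y_i}{\sum_i x_i^2} = \frac{2 \cdot 4 + 4 \cdot 2}{2^2 + 4^2} = \frac{16}{20} = 0.8, which is where gradient descent should end up below.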
And then maybe you compute the derivative and it says keep going this way, maybe you overshoot and come back, and then hopefully you end up at the minimum. So let’s see what this looks like in code. Gradient descent is one of the simplest algorithms, but it really underlies essentially all the algorithms that people use in machine learning. So let’s define the points; we have our two points here. And I’m going to define some functions. So, f of w. What is this function? I’m going to sum over all the points; basically at this point it’s converting math into Python. So for every (x, y), what the model predicts is w times x, and w times x minus y, squared, is the error it gets on that point. And if I sum over all these errors, I get my objective function. An array? Yeah, you can put an array here if you want, but it’s actually fine as it is. Okay, so now I need to compute the derivative. How do you compute the derivative? If your calculus is a little bit rusty, you might want to brush up on it. So what’s the derivative? Remember, we’re taking the derivative with respect to w; there are a lot of symbols here, so always remember what you’re taking the derivative with respect to. The derivative of a sum is the sum of the derivatives, so now I need to take the derivative of each term. And what’s the derivative of something squared? You bring the two down, and then you multiply by the derivative of the inside. And what’s the derivative of the inside? It should be x, right, because y is a constant, and the derivative of w times x with respect to w is x. Okay, so that’s it. So now let’s do gradient descent. Let’s initialize with w equals zero. And then I’m going to just iterate 100 times. Normally you would set some sort of stopping condition, but let’s keep it simple for now. At every iteration, I have a w; I can compute the value of the function, and I can also compute the gradient, the derivative. Gradient just means derivative in higher dimensions, which we’ll want later. And then what do I do? I take w and I subtract eta times the gradient. Remember, the gradient tells me where the function is increasing, so I want to move in the opposite direction. And eta is just going to be a step size to keep things under control; we’ll talk more about it next time. Okay, so now I want to print out what’s going on here: for each iteration, I’ll print out w and the function value. All right, so let’s run gradient descent. And you can see that over the iterations, we start out with w equals zero, then it moves to around 0.3, then to 0.799999, and it looks like it’s converging to 0.8. Meanwhile, the function value is going down, ending up around 7.2, and 0.8 happens to be the optimal answer, so the correct answer here is 0.8. Okay, so that’s it. Next time we’re going to start on the machine learning lecture.
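Here is a minimal sketch of the gradient descent demo described above. The function names F and dF and the step size eta = 0.01 are assumptions on my part, but the structure follows the walkthrough:

```python
points = [(2, 4), (4, 2)]

def F(w):
    # least squares objective: sum of squared errors over all points
    return sum((w * x - y) ** 2 for x, y in points)

def dF(w):
    # derivative with respect to w: bring the 2 down, multiply by x
    return sum(2 * (w * x - y) * x for x, y in points)

# gradient descent
w = 0
eta = 0.01  # step size (assumed value; the lecture discusses choosing it later)
for t in range(100):
    value = F(w)
    gradient = dF(w)
    print('iteration {}: w = {}, F(w) = {}'.format(t, w, value))
    w = w - eta * gradient   # move opposite to the gradient
```

Run as-is, this converges to w close to 0.8 with F(w) close to 7.2, matching the values in the walkthrough.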
