https://www.youtube.com/watch?v=aPY-KC6zeeI

F18 Lecture 1: Introduction to Deep Learning

As always, the videos are on Dropbox, because that's the only place where you can get these videos to play from. Once you upload them to Google Drive, they never play. That's why I use Dropbox. Those of you who are sitting at the back — if you must sit, please come to the front. You get to see something besides people's backs. Yes, I think that should do. Okay. Now, like in all my classes, we're going to start with a little, interesting little activity. Each of you is going to get a sheet of paper. A sheet of paper, please pass this around. It's blank. And here's what you're going to do. I'm going to show this in advance, so we save some time and I can keep talking while you're doing this. You're going to first fold your piece of paper. This is how — you know, it's all about deep neural networks, it's very important. You first fold your paper into two, perfectly, like so. And then each half is going to be folded exactly the same way into two. And now it's folded into four, but inwards. And I think you already know what I'm trying to tell you to do, right? So that's like so. And then on one side, you're going to take one of these pens and write your name out big. That is me. This sheet of paper will come with you to every class. And why? Because then this opens out beautifully like this, and you can stick it in front of you, and I will know who you are. And this is very important, because I need to be able to address you by name as opposed to "hey, you in the blue shirt, you with the yellow checks and the gray one." Right? So this is standard; I do this in all of my classes. So please write your name down. Use one of these pens. We have five excellent pens over here; they write nice and thick. Don't use one of your flimsy little ballpoint pens, because they won't write very big. So if you just pass this — someone opens this and passes the pens round. And when you're done, please get the pens back. Okay. Up here. Check the video. Yes. The pens are going round. Here's one. And you guys on the ground, you probably should have your names on too. If you run out of paper, well, you're going to remain anonymous. Thank you, very kind of you. And yes, can you pass the paper round? The next class is not going to be quite so cramped. On Friday we are in Posner, but from next week we are moving to GHC 4307, which holds 70 people. So we should be more comfortable. Thank you. And just in case you don't know who I am, you should be able to read this. So, are we all set? Check this. Yeah. All right. Lovely. So when we get to this one, you're going to have to play the video from it. All right, let's go to the first slide. All right. Morning, everybody, and welcome. Hi. Thank you. Your name can be read both upside down and right side up — except it's kind of wrong. So welcome to the — wait, what am I welcoming you to? Welcome to something. Ah, please bear with us while we get this sorted out. This is pretty standard for the first class; we get ourselves set up. Can I try the clicker? No. Do I have to turn the clicker on? Yeah, it is on. Oh, maybe you can just move the slides ahead. So, OK, morning, everybody, and welcome again to the introduction to deep neural networks. The slide just says neural networks, because this business of "deep" neural networks is a bit tricky — you know, it makes it sound more important than it is, like my two microphones over here. But it is an introductory lecture. So, right. Next slide.
Now, the reason you guys are all here is because, you know, these deep neural networks are the latest — what do we say — big thing in AI. Neural networks — I'm not going to keep repeating the phrase "deep neural networks" — have in general become one of the major thrust areas in various pattern recognition, prediction and analysis problems lately. And in many problems, they've actually established the state of the art. So for example, if you go to the next slide: speech recognition, right? Until a few years ago, the idea that a computer could recognize speech as well as a human being can was considered kind of implausible. We've been working on this problem for over 50 years, and in spite of various claims about the performance of systems, we never really got anywhere close to human performance. And this was from 2016, when we finally had an announcement from Microsoft that their system had beaten human performance on one standard task. Afterwards, of course — I'll stay here so the camera sees me, right — afterwards, of course, systems have improved further. We've beaten human performance on a variety of tasks. Things have gotten better and better. Same thing with translation. If you were using Google Translate prior to November 2015, you'd have probably found that it wasn't very good. So if you started with English, translated it to, say, Spanish, then took that Spanish, stuck it back into Google Translate and translated it back to English, what went in and what came out would often have no relation to one another. I mean, this was a running joke: you would use Google Translate to translate various things and send around the funny stuff that came out. And then in November 2015, something magical happened. People just went to their computers, began using it, and the stuff worked. What had happened was that, overnight, Google switched over to their neural network based machine translation systems. And as it turns out, neural network based machine translation systems — the best systems out on the web today — are indistinguishable from humans performing the translation. They are really, really good. Same thing with image segmentation. Now this picture I picked up from the web, so it must be true, right? Anything you find on the web is true. This is supposedly an example of a neural network based system segmenting and classifying objects in a very complicated scene. And what you find — does this have a laser pointer? Yeah — so what you find is, this is a very complex scene, probably in Germany, given the little cathedral out here. And the system has actually managed to segment out — this is the cool bit, right — pretty much all of the interesting objects in the image. So here's this guy's bag, and here's the guy, here's the guy, here's a push cart. Here's a human; this is a building of some kind — it's found it; this is something else — it's found it. And not only has it segmented out every interesting object in the image, it's even assigned names to them. I don't know if you quite realize the complexity of this task and how magical this is. And it's so magical that I can't even believe that this is true. Maybe this is just some propaganda picture that someone put up on YouTube or on Google for people like me to get fooled by and pull down.
But the fact that I think it's plausible tells you something about what deep neural networks these days are capable of — performance like this. Can you play the video? This one is from Sighthound.com, and you can see what they're doing. This is live, supposedly, and they're tracking a highway. They're segmenting out the cars in the video and identifying the cars, and they aren't missing very many, right? Imagine how phenomenal this is. And this is being done by a deep neural network based system. Same thing with games. For a very long time — I mean, people kind of assumed that pattern recognition systems can do all kinds of fancy stuff, but — if you're standing outside, please come on in, you can sit at the front, it's better — it was assumed that intelligence is this magical hidden thing inside human beings and that machines can't really be intelligent. Until the late 80s, early 90s, people thought that chess was the limit: chess really requires human intelligence, and machine systems didn't really beat humans at chess. You know where the first computer that beat a human grandmaster was built? [Inaudible answer.] Are you sure? You're sitting in the building where it was built, right? This was Thomas Anantharaman's PhD thesis, and he didn't even start it as a thesis on, you know, playing chess. If I recall correctly, he was trying to design procedures for dynamic programming, and he just said, hey, let me try this on a game — I want something really difficult — and tried it on chess. Next thing you know, he beat a human grandmaster, then went over to IBM and then beat Garry Kasparov, and there was a scandal about it. The very first game that Kasparov played with the system — this was Deep Blue — there was a bug in the system, so it played a wrong move at the end. And then subsequently they went and fixed the bug and thumped Kasparov for the next five games. And Kasparov came back and said, you know, they did this on purpose: they made me let my guard down by having the system lose to me in the first game, and that's why it beat me. Well, that wasn't true. Afterwards, of course, he never beat a real computer at the game again, and I think these days he probably can't beat his iPhone, right? But that wasn't using deep neural networks. People were trying to play chess with deep neural network systems, and one of the big names trying to get this done was our own dean, Andrew Moore, back in the day. But then the next challenge came up, which was Go. Now, in chess, there are 10 raised to 120 possible game states, which means the state space is so huge you'd think a computer couldn't really learn all about it. And then they showed that it can. So you went up to a game whose state space was 10 raised to 40 times larger — the number of game states in Go is about 10 raised to 160. So clearly, you know, a computer's never going to beat humans, right? Well, as of a few years ago, not only are computers beating humans — here, wait, can I go to the previous slide? Next one, go back to the Go slide. Why are you going backwards? Go back. Go back. Yeah, yeah. So this was AlphaZero. And it learns by self-play — it basically begins learning from scratch.
It doesn't even know the rules of the game; it learns the rules of the game. It learns to beat the current best system in chess, which is Stockfish, after about 300,000 steps. It learns to beat the current best system in Shogi after about 200,000 steps — I don't know what Shogi is, I don't know what Go is. And it learns to beat the current best system in Go, which was also neural network based, except this latest system learns to do it from scratch, in about 500,000 steps. Trust me, for a game like this, 500,000 steps is really not a whole lot, right? It probably takes a few hours. And in a few hours the system learns to be so good it's going to beat every single human being on the planet at the game. So, all of these. Here's another really fancy result with neural networks — and the latest results on this kind of stuff are much better. These are pictures, and this is "a man in a black shirt playing a guitar." This caption wasn't written by a human being. That was produced by a computer. That was produced by a deep neural network. "A construction worker in an orange safety vest working on the road," which is exactly what he's doing — again by a computer, right? Now this is a really, really challenging thing. This is assigning semantics to a picture. It's an intensely human task, and here is an artificial intelligence system performing this task, and it's generated by a deep neural network. And so many other problems: art, astronomy, healthcare, predicting stock markets — in pretty much every single field which can take advantage of AI, the state of the art is being established by deep neural nets. So, can you play this video? This one I put up just for the heck of it, right? Well, you know, this has got to be Indian, right? The rest of the world is building rockets that go to Mars; we teach robots to rub their stomachs and pat their heads. Very important. But they actually have championships for this — you know, learning to rub your stomach and pat your head, doing complicated stuff at the same time — and the world champions are all in India. Jobless people. But you can actually build a robot that does this, and it would probably be powered by a deep neural network. Let's move on. So this is the most important part of it, right? A few years ago, having familiarity with neural networks and how to work with them on your resume was a bonus: if you had it, you got a better job. These days, if you don't have this on your resume and you call yourself a computer science grad, you're going to be like this guy, right? You know, "I work for food." You're going to be jobless. So it's no longer a plus point on your resume; not having it is a negative. And so in this course, what we are really going to do is learn all about deep neural networks. For each of these tasks, there was a deep neural network model which performed all of these very fancy looking feats. Hopefully, by the end of this course, you will know how they were built. Not only that — given the right resources, hopefully you will be able to build them yourself. This website — if I were doing this on my laptop I'd pull up the website — maintains a list of all of the latest architectures in deep learning. And again, one of our objectives for this course is that by the end of the course, you would be familiar with all of these, or at least be able to comprehend what these architectures mean and design them yourself.
What won't happen is that you won't actually become a world-leading expert in the area in just one course. This is a fast-developing field; there's a ton of stuff happening every day. If you're just following deep neural networks on arXiv, you're not going to be able to keep up with the literature that appears every single day. On any given day, you're going to get dozens of new papers appearing, of which the majority are probably inconsequential, but there are going to be papers which really matter, and they're going to keep popping up every week. Meaning, if you want to be thought of as one of the leading experts in the field, you're going to have to keep up with the literature. But what you will become is sufficiently expert that somebody who's looking for a computer scientist to help their company develop AI tools using deep neural networks is going to hire you. Because if you're done with this course, and you go through all the homeworks and the project, you really will know your stuff — not just in theory, but you'll also be able to do things with your hands. I'm the instructor, in case you don't know. That must be evident — I'm up here talking, right? My name is Bhiksha. That's my phone number: extension 8-9826. How many of you know that 268 spells CMU on your phone? Come on, CMU kids, shame on you, right? We actually have our own exchange — not the area code, the local exchange. The digits spell CMU. So my phone number is 412-CMU-9826. We have a collection, a little zoo of TAs — 15 of them. Two of them are sitting up front — actually only one of them; Rajat used to be a TA, he is a volunteer today — and there's one at the back. No, don't stand up: Ryan can't stand up because he's streaming, and stuff will collapse, something bad will happen, right? And there are a collection of others, a large number of others, who are supposed to be here but are not; hopefully most of them will be here in the next class. It's good that they're not here, because otherwise you wouldn't all have found a place to sit. The course is being broadcast to several campuses. CMU has a campus in Kigali, in Rwanda, and so the course is being broadcast there. It's being broadcast to the Silicon Valley campus — those guys are the most unfortunate people on the planet, it's 6 a.m. there right now. And then it's also being broadcast to Doha. So we have TAs on each of these campuses. If you need help with anything, people on each campus must approach their local TAs first before beginning to request help of the TAs who are here in Pittsburgh. The majority of the TAs are in Pittsburgh. The TAs and such are going to be listed on the course pages; they're not yet up. The course webpage itself is deeplearning.cs.cmu.edu. It's very easy to remember: cs.cmu.edu, and deeplearning is the subject. Now, there is a lot of information on the logistics of the course, and instead of spending a great deal of time just rambling about the logistics, what I did was record a separate lecture about the logistics: about the homeworks, about the quizzes, how things are going to be scored — pretty much anything I would talk about here when introducing the course to you, including the actual syllabus that I'm going to be covering during the course. So please see that video. But this is not just a request.
In order to motivate you to actually watch the video, we're going to have a quiz with several questions about logistics, and you won't be able to answer those questions if you don't watch the video. But I will go over some of the logistics right here. We have an in-class and an online section. This is the in-class section. It's going to be streamed by Media Tech from the next class on; this is the last class in which the TAs will be streaming it. The videos are also going to be up on YouTube — we have our own channel. People who cannot be in the class are expected to follow the streaming video. If you have course conflicts, or if, like the folks on the Silicon Valley campus, it's too early and it's not reasonable to expect you to sit there brushing your teeth and watching the stream, then you can see the video on the YouTube channel. But you are required to watch the videos. Why? That is the point of this class. If you think you're just going to coast through the class, do the homeworks and get a grade — sure. But you paid for this course, and if you're not watching the videos, then basically you've given us $2,000 for nothing. And if you have any bit of conscience about where your money goes, you must be watching the videos. But then, we also like to motivate you. So we have quizzes and recitations. First, we have 13 recitations in addition to the 26 lectures, which will cover implementation details and basic exercises. You had the very first recitation, on how to use Amazon Web Services, earlier on Monday. So we have 12 more. The list of recitations is on the website. Again, these too will be streamed. I strongly recommend that you also watch the recitations. And why must you watch the lectures? We have quizzes. We have fourteen quizzes in total during the course, and each quiz is going to comprise 10 multiple choice questions, except the first quiz, where we have 15 questions, because we also wanted to grill you on the logistics of the course. The quiz each week will relate to the topics covered that week. You will be given 24 hours to answer the quiz. The quiz must be answered online. And of the 14, we will be choosing the best 12 for every student, which means if you fall sick, if something happens, you can afford to lose up to two. Which means that if you find yourself in a situation where you cannot submit a particular quiz, don't approach me. If you fall sick, say, more than two times in the semester, after that you lose the marks — blame your bacteria, don't come to me. So again, in general, I suggest you actually go through all 14 to give yourself the best chance at getting the best marks. The slides that I'm showing in class are not complete. So if you're following the numbers on the slides, you will occasionally see the slide number jump from 24 to 36. That means there were 12 hidden slides in between that I'm not showing in class — primarily because if I went through everything that was on the slides, I would not finish the class on time. Also, I don't think of those materials as being so critical as to require presentation in class, but you are required to go over them, which means you download the slides from the website and go over them. And again, to motivate you, we are going to have questions in the quizzes, and the quizzes are going to kind of focus on the stuff I didn't present in class. Half the questions will be on stuff I didn't present in class.
The other half will be stuff I did present in class. And very often, the quizzes will have questions relating to stuff I do talk about in class but that is not on the slides. So this is to make sure that you also have to watch the lectures. And this week, we are planning a really cunning quiz question, which is that we are going to ask you the difference between the number of slides I actually showed in class and the number of slides in the deck — which means you're going to have to go through both of those, even if you're not actually here in class, following the video. You're going to have to go through the video to find out how many slides I actually showed. And you can't skim through the video, because then you're going to count things wrong. Right? I'm very mean. So this is just to make sure that you follow stuff. Again, think of your parents — your parents are paying for this. The course is not easy. It's going to be a lot of hands-on stuff. You're going to be programming. The homeworks are killers. Very tough. Just in case you didn't get it: it's a lot of work. If you still didn't get it: it's a lot of work. And if you still didn't get it: it's a lot of work. Right? Not meant for the chicken-hearted. Now, what does happen is that as the course progresses, people begin to chicken out. And it's a waste of your time, it's a waste of my time, and it's generally kind of pointless, right? Who wants to be chicken? So again, I must thank my students from the last edition of this course. We started off at 200 students, and about 175 stayed to the end — 178, if you count the three of them who switched to pass/fail. Which means we had only about 10% of the students dropping out, and some of them had valid reasons. This is a characteristic of CMU students: if I have an easy course with easy grades and easy quizzes, I usually find half the students dropping out. And then when I up the ante — the tougher I make it — they begin beating on the door and beating each other up to make it into the course, because, hey, you're macho, right? You're okay, right? So, with regard to the work we are giving you, I expect you to maintain CMU standards in the course: stay with it and complete the work. And if you actually go through the logistics video, I explain how we're going to grade you. Our focus is not on deadlines — except that we do need deadlines, because without deadlines things wouldn't end and you wouldn't get your grades at the end of the semester. So we've tried to design it so that we give you the maximum amount of time to do the work, sometimes even past what we nominally call the deadline. The idea is to have mastery-based evaluation. We would like to evaluate you on how well you've learned the subjects of the course, not on whether you actually managed to submit stuff in time for the deadlines and such like. Deadlines are primarily motivators to make sure the course ends on time, and that we are in a position to give you feedback about your progress during the course on time. If I let you submit all of your homeworks and all of your assignments on the last day of the course, you wouldn't know where you stand in the course till the very end. So that's the only reason we actually have deadlines. The primary objective here is to try to ensure that all of you gain some mastery of the subject.
And anybody who gets an A in the course should technically be ready to go and do a deep learning job out in industry. In fact, I would say that anybody who gets an A or a B should be good for industry. And again, pretty much all the students — masters and PhD students — who took this course in the last edition went out to internships, a great many of them, at least three-fourths, to internships in deep learning jobs, where having gone through this course had a very direct impact on their being able to do the internship. So, this is the last class in this room, which means if you spent a lot of time and effort learning how to get here — please don't; use those brain cells for something else, because the next class is going to be in Posner 151. From next week, classes will be in GHC 4307, which can hold up to 70 people. And Media Tech is actually going to be in charge of streaming as well from next week. Media Tech and Panopto are probably the best setup we can possibly have for streaming, so we're going to be good. I'm not going to pause for questions, because we'll run out of time. If you have questions, post them on Piazza. Hopefully everybody is on Piazza, right? If you're not on Piazza, send us a note immediately; you should get on Piazza. I'm keeping on top of the enrollment, and we've tried to make sure that everybody who is enrolled is on Piazza and Autolab. We're also trying to make sure that everybody who's enrolled has access to their AWS coupons. All students will get up to three AWS coupons — use them wisely, because one of the big problems with using AWS is that some of you log in and forget to turn off your instance. And then you come to me at the end of the semester saying, "I have a $10,000 bill. What do I do?" And I am not joking, right? Two, three, four, ten thousand dollars — these are the kinds of numbers I've seen from students who were too careless, too lazy, you know, got distracted. You don't want to find yourself in that situation. This course is not so important that you have to sell your house for the homework. Okay? Well, it is, but you'd rather not, right? So before I begin — and this is completely unrelated to everything we are going to cover — Frank Rosenblatt is one of the big names we will encounter in a few minutes. We are speaking of AI, and AI started with ideas of cognition and perception, and Rosenblatt was one of the first people to work on what I would call modern computational models of these phenomena. And here's a very long description of what perception is: "Perception then emerges as that relatively primitive, partly autonomous, institutionalized, ratiomorphic subsystem of cognition which achieves prompt and richly detailed orientation habitually concerning the vitally relevant, mostly distal..." — it's one sentence, right? And so The New Yorker simplified it. They said — and that's a simplification — "Perception is standing on the sidewalk, watching the girls go by." This was in a time when you were allowed to say this and you wouldn't go to jail for it, unlike now, right? No, no, let's move on, right? Questions? Well, you obviously got the lesson: you'll be posting them on Piazza. Let's continue. So now we begin with the actual material that we are going to cover. These are the tasks that we just spoke of, right? We said neural networks have achieved the state of the art in speech recognition.
A speech signal goes in, a transcription comes out. An image goes in, a text caption comes out. A game state goes in, the next move comes out. But what happened between the input and the output was a box — a black box. What exactly is this black box? This is what we're really going to be focusing on. And what we see is that all of these tasks are fundamentally human tasks. Playing games — I mean, even if you ever do see a chimpanzee playing chess, it's probably because a human has spent a lifetime training it, and it's not really going to work, right? Or all of the other tasks: recognizing speech — we are the only species who have speech in the manner that I mean. There are many other species who produce some kind of vocal communication, but we have very detailed and efficient language, and recognizing speech is a very human task. Assigning captions to images, describing stuff in language, is a very human task. So all of these actions, those black boxes that we just saw, are powered by the human brain. So if we really want to understand how these things can be done computationally, maybe we want to start by trying to understand the human brain — or, even earlier, try to understand what it means to think, what cognition is. Has anybody ever seen this statue? It's very famous, right? Who hasn't seen a picture of this before? You guys are not fond of raising hands, right? So nobody has seen a picture of this before. Or have you? Guys, engage. If you just sit here and sleep and don't respond, we'll go through 26 lectures where this just becomes a place to come and rest, right? So unless you want me to call you out and say, you know, "Jing, have you seen this picture before? No? See?" — you don't want that to happen to you, right? So respond, okay? Now, does anybody know who this one represents? This is Auguste Rodin's The Thinker. If you go to Paris, you can go to the place where you can actually see this and several other of his statues. And this is supposed to be — next slide — Dante Alighieri. And who knows who Dante is? Oh, thank you. One more of you, right? He's the guy who wrote — say it loud, please — The Divine Comedy, right? Anyway, that was just an aside. The point was the act of thinking, the act of cognition. What are all the things humans can do? We can think, right? And think about thinking. This is a phenomenal thing. We think pointlessly. You're having a shower — you're not just standing there having a shower, your brain is running, you're thinking about something. Maybe you're thinking about your latest assignment, and at least that has a point; very often you're thinking about stuff that has no relevance to your immediate life, and that's where creativity comes in, right? Which means we can create, we can recognize patterns, we can solve problems, we can learn. All of these are fundamentally and deeply human actions, at least to the best of our knowledge, and these are the result of cognition; these are cognitive processes. So what exactly is this business of cognition? This is something that humans have been wondering about for hundreds — no, thousands — of years. But then, here's the problem. Marvin Minsky's quote: if the brain were simple enough to be understood, we would be too simple to understand it. It's beautiful, right? That hasn't stopped us; we keep trying. And so our attempts go back to about 400 BC — at least 400 BC.
What I mean by this is that there are recorded attempts at describing how the human brain works going back to Plato in 400 BC, and Plato came up with this theory called associationism. Anybody recognize this picture up here? Thank you, right — the guys in the middle, just to make it easier for you. And there you go. Thank you, you are right. Now, this business of associationism actually had quite a lot of following, and people worked on it for hundreds of years — thousands, over 2,400 years. David Hume was from the 17th century, I believe — a British philosopher — maybe the 18th. And who hasn't heard of Ivan Pavlov? You haven't heard of Ivan Pavlov? Yeah. You recognize this famous experiment, I'm sure, right? He trained the dog by providing it food and ringing a bell every time he gave it food. And thereafter, every time he rang a bell, the dog would begin to salivate. So what has the dog formed? An association between the bell and food. So Ivan Pavlov was actually working on the theory of associationism. What exactly is associationism? Here's an example, right? Lightning is reliably followed by thunder. So if you see a bolt of lightning, you're going to expect thunder. On the other hand, if you hear thunder, even if you haven't seen the lightning, you assume that lightning has struck someone somewhere close by, right? So you've formed an association between lightning and thunder. And these guys came up with many rules of association, which you will see on the slides, and you're going to get questioned about how one might form associations. The whole idea that they had was that everything about how we learn to think, how we learn to operate, how our brain works, has to do with these associations and inferences formed from these associations. So it's associations over associations over associations. And it turns out it's not really a bad idea. It actually kind of explains a lot of things that we see, because, heck, if you look at machine learning, what are we doing? We're just learning very complex association models, right? Here's the input, here's the output — I'm going to learn something in between that's going to associate the two. So the notion of association is not really a bad idea. It's a really beautiful idea. But just saying that I associate A with B gives me no insight into the problem. What I really need to know is how I form the associations, and that is not explained by the mere idea of association. Trying to develop ideas on how associations may be formed or stored took a long time. We began coming up with hypotheses around the mid-1800s. By this point, people had already realized — for hundreds, maybe thousands of years — that all of these things happen in the brain, not in your knee or your heart or somewhere else; these things happen in your brain. And by the mid-1800s, at which point we had very good microscopes, people actually realized that the brain is just a mass of interconnected neurons. And the way it is composed — actually, we didn't know the number back then, but we know now that it's billions of neurons, tens of billions of neurons — each neuron connects to many other neurons, and each neuron is connected to by many other neurons. Now, this by itself doesn't tell you anything. It's just a substratum.
It tells you what the platform that actually performs cognition looks like. You still need a model for how the cognition is actually performed. The first real modern theory for this came up in 1873. This guy over here, Alexander Bain — he was a philosopher, a mathematician, a logician, a linguist, a professor. Back in those days, science hadn't progressed very far, so most people were basically experts in every field. You know, I wouldn't have been surprised to learn that he was also a doctor, a brick worker, an astronaut, right? That's how people were. So he came up with this really cool idea, where he said that all of the information stored in the brain is actually stored in the connections. And he even came up with these really cool ideas: first, that the connections decide how the thing operates. I can have a single network with a fixed set of connections, and based on how the connections are formed, the network can produce different outputs for different inputs. So here, for example, if A and B fire, X is going to fire. If A and C fire, Z is going to fire. If B and C fire, Y is going to fire. So it's the same network, but different combinations of inputs are going to produce different outputs. And even better, the level of the input can change the pattern of output. So here, for example, if the input is weak, only this guy will fire, because it receives three copies of it. But if it's strong, even this guy will fire. So there's this idea that you can have a fixed structure, defined by the connections, which is going to provide different kinds of outputs for different inputs. It seems obvious to us now — you know, that's how everything works. Back in the 1800s, this was anathema. People laughed at it: there's no way. But this is actually the first modern artificial neural network model — you know, the first proposal for artificial neural networks. So everything that we are doing today is not a modern invention. Artificial neural networks as we know them were proposed in some form way back in 1873. That's going back, what, 145 years? Pretty amazing. Bain also actually suggested how this network could learn. So he actually covered the whole base: how it stores information, and how it learns to store information. His whole idea was based on what we currently call Hebbian learning. I won't read this; let's move on. Next slide. People ignored Bain for several decades, for a very simple reason. Here's a very nice quote by Bertrand Russell: "The fundamental cause of the trouble is that in the modern world the stupid are cocksure while the intelligent are full of doubt." The classic Dunning-Kruger effect, right? And this was true of Bain as well. So he went through the math, and he postulated that it would take 1 million neurons and 5 billion connections to store 200,000 acquisitions in the brain. But wait — 200,000 acquisitions is not enough to explain everything that we do. The number of different inferences that we make is much, much larger than that, and we also have lots of partial acquisitions stored in the head. So he worked out the arithmetic — and mind you, all this time people were scoffing at him; nobody believed him — and eventually, by 1903, he decided he must be wrong. He apologized to the world and died. Now, of course, we know that he was wrong — not up here in the theory, but wrong in doubting it. Because today we know that the human brain is really amazingly large.
It has over 80 billion neurons and over a trillion connections — ample capacity to perform pretty much everything that we perform these days. Obviously, we do, right? And the information indeed is in the connections. So his connectionist theory, which says that the information lies in the connections, lives on. Neurons connect to other neurons, and the processing capacity of the brain is a function of the connections. So this is the connectionist theory, and modern connectionist machines actually emulate the structure of the brain. So what is a connectionist machine? It's a network of processing elements of this kind, and all the world knowledge stored in the machine is stored in the connections. Now, this computer that Ryan's got, or each of your standard computers, or your smartphones, or whatever else — what kind of architecture do they have? Anybody? How would you define that architecture? By the way, those of you with open laptops, please shut them; this is a classroom where laptops must not be kept open, right? You're a TA, you're allowed — he has to follow the streaming. So how do you define the architecture of the modern computer? There's a name for it, right? Pardon me? It's a von Neumann architecture. What is a von Neumann architecture? There's a processor and a memory — in the high-end ones there's a processor, there's a memory, and there's an I/O device, right? This is what the von Neumann, or the Harvard, architecture looks like. You have the processor, and the memory is separate. The memory stores the programs and the data. This is what enables a single machine of that kind to perform millions of different actions: you just change the program, it does something else; you change the data, it does something else. That makes it really, really versatile — which is why we even use it to emulate connectionist machines these days. A connectionist machine, on the other hand, is very different. In a connectionist machine, the program is the architecture: the connections specify the program. If you want to change the program, you have to change the machine — you have to rewire your entire machine. Which is why you don't actually go off and build hardware for your neural network every time you build one; you emulate this guy on this guy, right? Simply because, as a machine by itself, a connectionist machine is based on a fundamentally different principle from your standard machines. So, moving on. Here's a quick recap of everything we've seen so far. Neural network based AI has taken over most AI tasks. And of course, these things began originally as computational models of the brain, or more generally, models of cognition. The earliest model of cognition was associationism. The more recent one is connectionism: neurons are connected, and the workings of the brain are encoded in the connections. Current neural network models are all connectionist machines. I'll pause for a couple of seconds — any questions? No? So here we are again. This is the connectionist machine. You have a bunch of units; they are connected to one another in different ways; and all information about how it operates is actually stored in the connections. But then the units are also important, right? What are these individual elements? Now, by the way, one of the earliest people to have proposed something similar to what we currently call connectionist machines —
— individual networks, with a complete mechanism for how to make them compute different kinds of functions and how to learn those functions — was Alan Turing. So if you go through the slides, you will see a couple of slides about it, and you're going to have a quiz question about it. Alan Turing did everything. But anyway, moving on. So what are these units in the brain? In the brain — if you go back... next one. You're moving in the wrong direction. Yeah. So next one, please. Yes? Okay. Oh, really? Okay. So in that case — oh, fantastic, my clicker's working. Lovely. So I'm free of having to gesture at you. So here are the units in the brain. The individual units in the brain are neurons, and here's what a neuron looks like. It's got this main head, called the soma, which contains the nucleus, and all of these dendrites through which other neurons communicate with the neuron. When the total signal coming in from all of these dendrites exceeds a threshold, the neuron fires. That signal travels down this long leg and is communicated to other neurons. So this long one is called the axon. It's covered by something called the myelin sheath, which is formed by glial cells, and it's mostly fat. So this is something I like to inform everybody: how intelligent you are is not decided so much by the number of neurons in your head as by the amount of fat in your head. So being called a fathead is like a great compliment, right? Don't forget that. In fact, people have analyzed Einstein's brain — everybody's interested in how Einstein's brain is different from mine — and it turns out he had more glial cells than normal people. He had much more fat in his head than you and I have. So you really want to be a fathead. Also, adult neurons don't undergo cell division. Now, this is another crazy bit of information. You don't get smarter as you grow older because you get more neurons; you get smarter as you grow older because your neurons die. That should scare you. Anyway. So that's the biological neuron. But if you want to perform computations with it, you need a computational model for it. And the first computational model was formed by these two guys. One of them was Warren McCulloch and the other is Walter Pitts. Warren McCulloch was a professor at the University of Chicago; Walter Pitts was a hobo who ended up at his door. And who is who? Anybody want to guess? So this guy is McCulloch, and this is Walter Pitts. Walter Pitts was 15 years old or something when he ran away from home, and he never went back home. He used to exchange mail with Bertrand Russell. And one day, at the ripe old age of 19, he ended up at Warren McCulloch's door. McCulloch took him in, and they worked on this model — the first mathematical model of the neuron. It wasn't called a perceptron back then. It's described in this lovely paper by McCulloch and Pitts, and Pitts was only 20 years old when he wrote the paper. Now, almost 80 years later, I still can't understand the paper. A few people can, because he invented his own math for the descriptions, and it's kind of dense. But here's the basic model of how they actually computationally characterized each individual neuron. You have a unit which is going to fire if it gets two or more inputs. It can get inputs from many different connections, and there can be two kinds of connections — can you go to the next slide — there are the excitatory synapses. These are the synapses which actually excite the neuron.
And if you have sufficient excitation, it's going to fire. You also have the inhibitory synapses: if some signal comes down this guy, this guy is not going to fire, regardless of what comes in from here. This is the McCulloch and Pitts model. And they showed that this model — oh yeah, I have the clicker, I forget, right — that this model can actually perform all kinds of Boolean operations. So here, for example, every time 1 fires, a little later 2 fires, because this unit is expecting two inputs, and if 1 fires, it gets two inputs. So this is just a delay. Here, if either 1 or 2 fires, this is going to get two inputs, so this is an OR. Here, both of these must fire, and only then will this fire, so this is an AND. Here, if 1 fires, 3 will fire, but only if 2 is not active: it's 1 AND NOT 2. So you can form all kinds of Boolean operations, which means you can actually perform some fairly complicated Boolean arithmetic — you can perform pretty much any kind of Boolean arithmetic with networks of these basic units. This was one of the landmark proposals in the history of neural networks. But when McCulloch and Pitts first made this proposal, they actually kind of went overboard, and they claimed that their nets should be able to compute a pretty large class of functions. They said they could be equivalent to Turing machines if they had a tape, and they even said they were Turing complete. But they just made these claims and never really proved anything, and these claims are all wrong, because such a net is a finite state machine, and you can't be Turing complete if you're just finite state, right? They also didn't provide a mechanism whereby the network could learn how to perform its operations. The first real proposal for how a network could learn came from Donald Hebb, in this book called The Organization of Behavior, where he basically said that if I have two neurons which are connected, every time the two fire together, the connection is going to get stronger. So you have, for example, this neuron connected to this guy through this dendrite, and this is coming out from the axon of one neuron. Every time this guy fires and causes this one to fire successfully, this little head, he said, is going to get bigger. So "neurons that fire together, wire together" — that was the famous statement that he made. And he actually came up with a mathematical model for this. Let's say W is the strength of the connection between these two guys. Then any time these two fire together, x and y are both going to be one, and so this weight is going to increase a little, because eta is always positive. Now, what is the problem with this kind of learning mechanism? Anyone? Yeah — there's no reduction. If you wait long enough, every connection is going to be saturated. So this doesn't really explain the real world; it's fundamentally unstable, and learning is unbounded. People came up with all kinds of corrections, like generalized Hebbian learning, also called Sanger's rule, where Sanger came up with the idea that you can modify Hebbian learning to handle multiple outputs at the same time. There were other corrections. Hebbian learning is used all over the place, but then you have to make this fundamental modification that you're also allowed to decrease weights, not only increase them — which leads us to this guy, Frank Rosenblatt.
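To make the update rule just described concrete, here is a minimal NumPy sketch of plain Hebbian learning and why it is unbounded. This is an illustration added for clarity, not code from the course; the learning rate, initial weight, and random firing pattern are made up.

```python
import numpy as np

# Hebb's rule as described above: the weight grows whenever the two
# neurons fire together, i.e. delta_w = eta * x * y with eta > 0.
eta = 0.1          # learning rate, always positive
w = 0.5            # strength of the connection between the two neurons

for step in range(100):
    x = np.random.randint(0, 2)   # presynaptic neuron fires (1) or not (0)
    y = np.random.randint(0, 2)   # postsynaptic neuron fires (1) or not (0)
    w += eta * x * y              # "neurons that fire together, wire together"

print(w)  # the weight only ever grows: with no way to decrease it,
          # every connection eventually saturates, as noted in the lecture
```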
Frank Rosenblatt is the fellow who basically killed research on neural networks for about 10 years by building the first model of the modern perceptron. Now, you are likely to ask me why building a model of the perceptron would kill research in the area, and here's why. His model was this one, which we're all very familiar with now. The idea is that you have a bunch of inputs into a unit, and the way the unit operates, it looks at a weighted combination of all of the inputs; if the weighted combination exceeds a threshold, it fires, otherwise it does not. So you have a number of inputs which combine linearly, and the unit fires if the combination exceeds a threshold, otherwise it doesn't. And as he showed, this very simple structure is extremely versatile. It can perform all kinds of operations. In fact, if you wanted to replace this little unit by a Boolean function — a network of standard binary Boolean elements, or even m-ary Boolean elements — that network would have to be exponentially large in the number of inputs, to do something that can be done by just this one unit. We'll see why in the next class. So he was hyper-excited. But even more excited than he were all the newspapers of the time, which said things like "the embryo of an electronic computer that the Navy expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence" — New York Times, 1958, right? "Frankenstein monster designed by the Navy... that thinks" — Tulsa, Oklahoma Times, 1958. Now, obviously, they were going overboard, because this was the machine — nothing more than this — this was the machine that was going to walk, talk and reproduce itself. And Rosenblatt also provided a learning algorithm for this, which we're going to see again in a couple of classes: the connection between an input and an output has some weight, and the learning process is that if the current output of the unit does not match the desired output of the unit, then the difference between the two, multiplied by the current input, is added to the current weight. This learning rule is very popular — we use it all over the place, everywhere in machine learning — and one of the first people to define it was Frank Rosenblatt. And he proved that using this learning algorithm, there was a convergence guarantee that allowed his little perceptron to learn all kinds of things. The problem, of course, is that it's not a machine that's ever going to walk, talk and learn to reproduce itself. Here are the things it can do. A perceptron can be a Boolean gate. In these figures, remember, you're looking at a weighted combination of inputs and comparing it to a threshold. So in these figures, I have inputs; each of these is a perceptron; the weights are written above the arrows; the threshold is inside the circle. So look, this figure here has a threshold of two: the only way this fires is if both x and y are active. This one has a threshold of one: it will fire if either x or y is active, because if either of them is active, you get a total input of one, which matches the threshold. This one is a negation: it has a threshold of zero. If x is one, what comes in is minus one, and it will not fire. If x is zero, what comes in is zero, which does match the threshold, and it's going to fire.
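Here is a small sketch of the gates just described (weights on the arrows, threshold inside the circle) together with Rosenblatt's learning rule as stated above. This is an illustration, not course code; the function names, the learning rate, and the training example are made up, and the threshold is learned here as a bias term.

```python
import numpy as np

def perceptron(x, w, T):
    """Rosenblatt's unit: fire (1) if the weighted sum of inputs meets the threshold T."""
    return int(np.dot(w, x) >= T)

# The gates from the figures: AND (threshold 2), OR (threshold 1), NOT (weight -1, threshold 0).
AND = lambda x, y: perceptron([x, y], w=[1, 1], T=2)
OR  = lambda x, y: perceptron([x, y], w=[1, 1], T=1)
NOT = lambda x:    perceptron([x],    w=[-1],   T=0)

def train_perceptron(samples, eta=1.0, epochs=10):
    """Rosenblatt's rule: add (desired - actual) times the input to the weights."""
    w = np.zeros(2)
    T = 0.0
    for _ in range(epochs):
        for x, d in samples:
            y = int(np.dot(w, x) >= T)
            w += eta * (d - y) * np.asarray(x)
            T -= eta * (d - y)        # the threshold is learned as a (negative) bias
    return w, T

# Example: learn OR from its truth table; this converges to w = [1, 1], T = 1.
w, T = train_perceptron([((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)])
print(w, T)
```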
So you can perform all kinds of Boolean operations. What you can't perform is an XOR. This is what Minsky and Papert showed in 1968. Now, you're all familiar with what an XOR function is, right? How many of you don't know what an XOR is? Thank God. So this was in 1968. And when people realized that this tiny little problem couldn't be solved by a perceptron, DARPA or whoever was funding this research — the Navy, I think — withdrew their funding and refused to fund it for a very long time, and research on the problem basically died for a while, because people figured that this is not really going to solve the world's problems. But then, somewhere in their work, Minsky and Papert actually made a second claim, whose importance people didn't quite appreciate: individual elements are weak computational elements, but if you network them together, they can get fairly powerful, right? So let's look at the XOR again. If I network three of these perceptrons together, I can create an XOR. Here I have these inputs; this guy is a simple perceptron which computes x OR y; this one computes NOT x OR NOT y; then I have a final perceptron which combines these two — ANDs the two — and I get an XOR. Now, the thing is, these guys in the middle are what I'm calling the hidden layer. Why are they hidden? Their outputs are not really going to be seen. All you're seeing is the final output, and if you don't really care what the hidden values are, you're only interested in the final output — so this is the hidden layer. And using a network of this kind, you can now form a Boolean function: you can form an XOR. But then, once you can form an XOR, people realized that if you begin connecting these things, you can compose any arbitrary Boolean function. Here's a really ugly function. It's a Boolean function of four variables, and I don't even know what it is — I just made it up on the fly when I was making these slides. What is clear is that I can draw a little network of perceptrons which computes it. Now, again, I've been calling this a network of perceptrons. This is your standard neural network as you know it. This is what we will call a multi-layer perceptron: you have taken many perceptrons and connected them up in layers, with a final output. We'll talk more about this in the next class. So, the story so far. Neural networks began as computational models of the brain — as connectionist machines. They're also Boolean threshold units: McCulloch and Pitts showed that they are Boolean threshold units, but they didn't give us a learning rule. Hebb came up with a learning rule that was unstable. Rosenblatt came up with a variant of the McCulloch-Pitts neuron for which he actually gave a convergent learning rule, but he overstated what it could do. And we discovered later that multi-layer perceptrons can model arbitrarily complex Boolean functions. Sorry? So, feel free to stop me at any time if you have questions. All of those things describe Boolean machines. Your brain is not Boolean. Your brain is working on real-valued inputs. These are not sequences of bits; they're continuous-valued inputs, right? We do make Boolean predictions or inferences — is this a clicker or is this not a clicker? That's a binary, Boolean answer. But the input that comes in is this entire image: it's me holding this thing and asking you, is this a clicker, right?
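Before moving on to real-valued inputs, here is a minimal sketch of the three-perceptron XOR network described above: one hidden unit computing x OR y, one computing NOT x OR NOT y (i.e. a NAND), and an AND on top. This is an illustration added for clarity; the particular weights and thresholds are one choice among many that work.

```python
import numpy as np

def fires(x, w, T):
    # Threshold unit: 1 if the weighted input sum meets the threshold T.
    return int(np.dot(w, x) >= T)

def XOR(x, y):
    h1 = fires([x, y], w=[1, 1],   T=1)    # hidden unit: x OR y
    h2 = fires([x, y], w=[-1, -1], T=-1)   # hidden unit: NOT x OR NOT y (NAND)
    return fires([h1, h2], w=[1, 1], T=2)  # output unit: h1 AND h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, XOR(a, b))   # prints 0, 1, 1, 0 -- the XOR truth table
```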
Back to the clicker: the input is very complex and real-valued. Well, guess what? The perceptron actually works on real-valued inputs. If I just restate the same thing to say that the x's are real-valued, the whole thing still works: I have a weighted combination of inputs which is compared to a threshold; if it exceeds the threshold, the unit fires, otherwise not. And once I begin looking at it like this — when we speak of perceptrons in the modern literature, in the modern world, we are not going to simply use a hard threshold. Sometimes we will use a soft threshold, where instead of going directly from 0 to 1, you sort of slide from 0 to 1. This is a sigmoid, which is like a smoother version of a threshold. Or you can have more generic functions that operate on the weighted combination of inputs and the threshold. The only reason I'm mentioning these is that I don't want you to think that I'm going to focus entirely on threshold-based units; we're just using threshold units to provide some intuition about what is really going on, right? But anyway, let's go back to our threshold unit and look at the kind of function that it actually computes. You're looking at a weighted combination of inputs. If it exceeds the threshold, you get an output of 1; otherwise, the output is 0. Now, there is a set of x's where the threshold is exactly met, and that is the set where Σᵢ wᵢxᵢ = T. What's that? That is the equation of a hyperplane. In two dimensions, that would be the equation of a line — so it would be something like this line. It says that this condition is exactly met on this line. The output is going to be 1 on one side and 0 on the other side. For two-dimensional inputs, if I were to plot the function, it is going to look like this: it's going to be 0 right up to the line, and then when you cross the line, the output is going to be 1. So it's a step function along that line, right? In a generic n-dimensional space, on one side of the space it's going to be 0, and as soon as you cross the hyperplane given by that criterion, the output becomes 1. And now you can see why this perceptron can be a Boolean function. In a Boolean world, inputs are going to be either 0 or 1. If I had only two inputs, the input combinations are going to be (0,0), (0,1), (1,0) or (1,1). And now I can draw a perceptron of that kind over here, and that perceptron is going to output a 1 for these three combinations and a 0 for the fourth. What is this perceptron? What gate is it? Anyone? That's an OR, right? If any of the inputs is 1, the output is a 1. What about this guy? That's an AND. What about this guy? That's a NOT Y, right? It's inverting Y and ignoring X. So you can see how just having this simple linear threshold function can perform Boolean arithmetic. And it also tells you that you can design an infinity of different perceptrons, all of which compute an OR or an AND: for any given Boolean operation, there are many possible perceptron rules. You don't need just the weights of 1 in the specific examples I showed you; there are other weights that would also work. But then, once you realize what's going on, you can build arbitrarily complex functions. So look at this guy. I want a function on a two-dimensional input whose output is 1 if the input is inside the pentagon and 0 outside.
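Here is a small sketch of the same unit on real-valued inputs: the hard threshold defines a hyperplane (a line in two dimensions), and the sigmoid is the smoothed version of that jump. This is an illustration added for clarity; the weights, threshold, and test point are made up.

```python
import numpy as np

w = np.array([2.0, -1.0])   # made-up weights for a 2-D real-valued input
T = 0.5                     # threshold

def hard_unit(x):
    # Output is 1 on one side of the hyperplane sum_i w_i x_i = T, 0 on the other.
    return int(np.dot(w, x) >= T)

def soft_unit(x):
    # Sigmoid: slides smoothly from 0 to 1 instead of jumping at the boundary.
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) - T)))

x = np.array([0.3, -0.2])
print(hard_unit(x), soft_unit(x))   # points far past the boundary push the sigmoid toward 1
```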
I'm allowed to use perceptrons to compose a function that has this output. How would I do it? Anybody? Speak up. Thank you. So I can have one perceptron which does this, another which does that, a third which does that, a fourth which does that, a fifth which does that, and a sixth which ANDs the outputs, right? And voila, I have a network which produces exactly this kind of output. But this should tell you more. Now I can do something like this. And how would I do it? It's on the slides, so you don't have to actually exercise your imagination. I have one subnetwork which computes one polygon, the other computes the other polygon, and I OR the two. And now I have a somewhat more complex decision boundary. How would I build these guys? Anyone? How can I compose a network which has these totally ridiculous looking decision boundaries? Somebody speak up. Venkat? You could just use an architecture that already does this, from one of the deep learning libraries? No, I want a more specific answer. Yes? I can decompose these figures into a union of convex shapes. I can have one subnetwork for every convex subcomponent, and then OR the lot. And then I have this, right? I can build this guy. So we can build arbitrarily complex shapes. So when I'm performing classification, what am I really doing? If I have to look at something like this and say, is this a two or not, that's a 784-dimensional input. In 784-dimensional space, I can assume that all the twos live within some region, and everything that's not a two lies outside the region. And all I really have to do is to build a function that learns to model this region, right? So continuing our story: we've seen that MLPs, multi-layer perceptrons, are connectionist computational models. Individual perceptrons are actually Boolean machines; they represent Boolean functions over linear boundaries. Networks of them can represent arbitrary decision boundaries, and they can be used to classify data. But you can do more. You can model real-valued functions. So let's say you have continuous-valued output, something like this, where again I'm working on two-dimensional input in this case, and I want an output which is not just 0 or 1 but continuous-valued. Can I do this? We will see more of how I can do this for any kind of function in the next class, but I'll give you a heads-up on how this would actually happen. So let's look at functions of a single variable. I want an arbitrarily complex function of a single variable. Before I do that, I'm going to start by building a simple component, which is this little network that takes a single input. The input is fed to two perceptrons. The first perceptron fires if the input exceeds threshold T1. The second one fires if it exceeds T2. And then their outputs are combined with weights 1 and minus 1 respectively. So what happens? As the input scans left to right, when it first exceeds T1, the first one is going to fire and the second one will not, so the output goes to 1. And then when I exceed T2, the second one also fires and cancels out the first guy, and the output goes back to 0. So between T1 and T2 I have an output of 1; elsewhere I have an output of 0. And now I can model any function: the function can be approximated as a sequence of step functions, and I can have one pair of perceptrons for each of these steps.
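Here is a minimal sketch of the convex-region idea described above, hedged as my own illustration rather than the lecture's: a convex region is the AND of several half-plane perceptrons. I use an axis-aligned unit square instead of the pentagon purely to keep the numbers simple; the weights are illustrative.

```python
# A convex region as the AND of half-plane perceptrons (square instead of pentagon).
import numpy as np

def half_plane(x, w, b):
    """One perceptron: fires iff w . x >= b, i.e. on one side of a line."""
    return float(np.dot(w, x) >= b)

def in_unit_square(x):
    """AND of four half-planes: 0 <= x1 <= 1 and 0 <= x2 <= 1."""
    fires = [
        half_plane(x, [ 1,  0],  0),   # x1 >= 0
        half_plane(x, [-1,  0], -1),   # x1 <= 1
        half_plane(x, [ 0,  1],  0),   # x2 >= 0
        half_plane(x, [ 0, -1], -1),   # x2 <= 1
    ]
    return float(sum(fires) >= 4)      # the final AND perceptron

print(in_unit_square([0.5, 0.5]), in_unit_square([2.0, 0.5]))  # 1.0 0.0
```

More complicated, non-convex regions would then be the OR of several such subnetworks, one per convex piece, exactly as in the answer above.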
This figure is wrong, but actually not really. So I can have one pair of perceptrons for each of these steps, and each pair is going to be scaled by the height of the function within that region. So as I go left to right, I'm always going to get an output which approximates the function that I really want. I can make the approximation arbitrarily precise by making these steps narrower and having more and more neurons. So what we really see is that in addition to being connectionist computational models that can perform classification, MLPs also model continuous-valued functions. Now, through these slides I'm going to keep this pattern of summarizing everything that we've done so far using "the story so far." These are the slides where I'm going to pause, and here's where you ask questions. Questions? Yeah. Neural networks? Everything I'm talking about in this course is a neural network, yes. So this is one instance of a neural network. In fact, we're going to build everything on top of the multi-layer perceptron. The basic definition of a neural network in the first place was what we call a multi-layer perceptron. Yes? Yes? Yes, you certainly can. And so the point is, we're looking at the simplest possible units. The more complex you make the units, the harder it is to get any kind of interpretation of how these things behave. Our entire focus is on building up these architectures from trivially simple units. And why are these things so immensely popular and powerful? If you focus on an individual unit, the individual units are amazingly simple. All they are doing is looking at a weighted combination of inputs and putting it through a non-linear function, which can simply be a threshold. If all I do is look at a weighted combination of inputs and apply a threshold to it, I can model any function in the universe. That's what we've seen. So the simplicity of it is really powerful. Anyway, here are all the other things MLPs can do. They can model memories. Everything that we've seen so far is feed-forward: input comes in, gets processed, that gets processed further, and eventually it becomes an output. But you can actually have loopy networks, where the output of a computation can go back, eventually, after some further processing, to the same unit. And Lorente de No in the 1930s proposed this as a model for how memories are stored in the central nervous system. Now we will, many moons from now, at least two moons, three, four moons from now, actually look at these guys: how I can use the same simple architecture and compose really fancy models of memory. And one of the things you'll note about how your memory operates is that you don't just recall everything exactly as it was stored in your head. You get reminded of stuff from partial observations. Nothing is more evocative, for instance, than a smell. You're going somewhere, you smell something, and you remember something from your childhood, because you made an association between that smell and whatever it is you remember, right? How does that happen? So it turns out that that's memory, and those things can be explained by loopy networks. Neural networks, in addition to the kinds of functions that we saw, can also compute probability distributions over all kinds of domains, including complex-valued domains. They can compute distributions of data.
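As a hedged sketch of the construction just described (my own code, following the idea of scaled +1/-1 pairs rather than anything from the slides), each pair of threshold units produces a pulse over one interval, scaled by the function's height there; the target function and the number of steps below are arbitrary.

```python
# Approximate a 1-D function as a sum of pulses made from pairs of threshold units.
import numpy as np

def step(x, T):
    """A single threshold unit applied to a scalar or array input."""
    return (x >= T).astype(float)

def approximate(f, x, n_steps=50, lo=0.0, hi=1.0):
    """Piecewise-constant approximation of f on [lo, hi] using n_steps pulses."""
    edges = np.linspace(lo, hi, n_steps + 1)
    out = np.zeros_like(x, dtype=float)
    for t1, t2 in zip(edges[:-1], edges[1:]):
        height = f(0.5 * (t1 + t2))                    # height of f in this interval
        out += height * (step(x, t1) - step(x, t2))    # the +1 / -1 pair, scaled
    return out

x = np.linspace(0.0, 1.0, 1000)
target = lambda z: np.sin(2 * np.pi * z)
print(np.max(np.abs(approximate(target, x) - target(x))))  # error shrinks as n_steps grows
```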
They can also compute a posterior probability, meaning I have some belief about the world, then I see some data, I update my belief, and something else comes out. All of that can be modeled by neural networks, by MLPs. So yes. Anyway, what else? And here's something else they can do, I think. But anyway, that's just me being facetious. So when I'm speaking of neural networks in AI, the network is just a function: given an input, it computes the function layer by layer to predict the output (there is a tiny sketch of this layer-wise evaluation after this paragraph). So what are all of these boxes? These boxes are just neural networks. In all of these tasks that we just saw, the little structures that I just showed you are the structures that actually go into these boxes. And it's phenomenal how something so simple, consisting of some really trivial processing elements connected in a network, can perform amazing tasks like recognizing speech as well as human beings, or captioning images, or playing chess, or learning to be the world champion at Go. So in closing: interesting AI tasks are functions that can be modeled by neural networks. That's the final slide. Now, the next lecture is going to be on Friday. This is the only week where we have a class on Friday, and that's because I came in late from India, so we swapped the recitation and the class: the recitation happened on Monday, and the second class is going to be on Friday. We are going to cover more on how neural networks are universal approximators, why we believe they can model pretty much any function in the universe. And we've been speaking of deep neural networks. What is this business of depth? Why do we care, and how should we think about it? We will also look at that. So the next class, on Friday, is in Posner Hall 151, where the recitations are supposed to be held. So we will see you at the next session. All right. So nobody has questions? No questions, because everybody is in a hurry to leave. Yes.
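As a footnote to the point above about the network being just a function evaluated layer by layer, here is a tiny illustrative sketch; the layer sizes, random weights, and sigmoid activation are placeholder assumptions of mine, not anything from the lecture.

```python
# A network as a function of its input, evaluated one layer at a time.
import numpy as np

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), np.zeros(4)),   # weights, biases: 3 -> 4
          (rng.standard_normal((1, 4)), np.zeros(1))]   # weights, biases: 4 -> 1

def forward(x, layers):
    """Evaluate the network layer-wise: affine map, then a nonlinearity."""
    for W, b in layers:
        x = 1.0 / (1.0 + np.exp(-(W @ x + b)))          # sigmoid activation
    return x

print(forward(np.array([0.2, -1.0, 0.5]), layers))      # a single scalar output
```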
