Provably Beneficial AI – Stuart Russell | CogX 2019

Hello everyone, good morning. Today I'm going to fulfil a promise I made yesterday to talk about a different way of thinking about AI, one that leads to what we call provably beneficial AI systems. Let me take you back in time to the first edition of the textbook on AI. Back then we had a section in the last chapter called "What if we do succeed?" That question was prompted, as this little excerpt from the chapter shows, by a vignette in a book by David Lodge; some of you will have read his books about academics, Small World and Changing Places. By coincidence, David Lodge bought our house in Birmingham, and his books are about academics going from Birmingham to Berkeley. I had just got a faculty position at Berkeley, so I was going from Birmingham to Berkeley. So I read one of his books, and in one scene a graduate student goes to a conference (this is literary theory) and asks the assembled grandees on the stage: what if everyone agreed with you? Clearly they had never thought about this question before. What would be the consequence? What would follow if they were actually right?

You can ask people in AI the same question: what would happen if we actually succeeded in our goals as a field? If you asked a cancer researcher what would happen if they succeeded, they would say, well, then we could cure cancer; very straightforward. But people in AI really had never thought about it. If you asked Alan Turing, well, he did think about it, and he said we would lose control. A very straightforward answer. So this question has re-emerged in the last few years, with people like Elon Musk, Stephen Hawking, and Bill Gates talking about the possibility that we might lose control to machines that are more intelligent than ourselves.

Just for fun, I imagined an analogous situation: a superintelligent alien civilization sends us an email saying that in about 30 to 50 years they are going to arrive on Earth. Now look at humanity's response to the prospect that superintelligent AI will arrive on Earth in 30 to 50 years, which is what most AI researchers actually believe. This is not a far-out view held by people who know nothing about AI; these are not people in tinfoil hats. The majority of professional AI researchers believe that within 30 to 50 years we will have superintelligent AI. And what are we doing about it? Basically nothing. It is as if we replied to the aliens, "Sorry, we're out of the office"; we might even send a little smiley face to go with it. If the aliens said they were arriving in 30 to 50 years, we would be having a complete global panic attack and doing everything we could to prepare for their arrival, to figure out what they were like, and so on. But with the arrival of superintelligent AI we are just assuming it is no different from the arrival of the next iPhone, not much of a big deal. And it is a big deal, and we have to figure out what happens if we succeed.

When you think about it, succeeding means succeeding in the standard model of AI, which works like this: a human being specifies the objective (here I'm showing the objective as a sum of discounted rewards, but it could be a goal, a utility function, a cost function), and the machine simply carries out whatever objective the human supplies.
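As a reminder of what that standard model looks like on paper (a generic sketch, not a reproduction of the slide), the human writes down a reward function R and the machine searches for the policy that maximizes its expected discounted sum:

```latex
% Standard model (sketch): the human supplies a fixed reward function R,
% and the machine optimizes it without question.
\[
  \pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t)\right]
\]
% gamma is a discount factor; s_t and a_t are states and actions.
% Everything that follows is about what goes wrong when R is not
% what we really want.
```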
This is the standard model for AI. And the reason why succeeding in this model would be a bad idea is that we cannot specify the objective correctly. This is not a new point: it's there in the legend of King Midas, it's there in the sorcerer's apprentice. If you ask for the wrong thing and you get the wrong thing, then you are going to be extremely unhappy. In King Midas's case, he got exactly what he asked for. Everything he touched turned to gold, including his food, his water, and his family, and he died in misery and starvation. So how do we deal with the fact that we are pursuing a standard model whose logical conclusion, with superintelligent AI, could be arbitrarily catastrophic for the human race? That should make you pause. If it doesn't make you pause, there must be something wrong, because most of you are working in a field which has this as its logical conclusion.

Let's look at something that's happening even now: optimizing the wrong objective, in this case optimizing click-through. We know that many of the online algorithms that do content selection in social media, on Twitter, on Facebook and so on, and in recommendation systems, are all trying to optimize an objective of getting you to click on things. I think the idea was the following. Think about your political interests, just to pick a random example, and put them on a scale from left to right, with neo-fascist on the right and, say, Berkeley on the left. Suppose you're somewhere in the middle. Actually, imagine this is your mum or your dad. They're nice people; they don't have strong political opinions. Sometimes they'll read an article from the Guardian, sometimes an article from the Telegraph, but they're typically not subscribing to the Daily Stormer, which is a popular newspaper in the United States these days. They're very middle of the road. And the fact that they have broad, somewhat weak preferences means that it's not that easy to send them stuff they're guaranteed to click on.

So the algorithm basically tries out various articles. The red ones are articles the person doesn't click on. It tries a very left-wing article: no click. A very right-wing article: no click. One in the middle: that worked, they clicked on it. Try a few more: that one didn't work, that one worked, that one didn't, that one did. Good. So now we're gradually getting a sense of what kind of person this is. I think this is what the Facebook engineers had in mind: the system would learn what you like and what you don't like, and then it could send you things that you like. Now, if you were an extreme right-winger, then all the articles on the left would fail and the ones on the extreme right would succeed, and again the algorithm works along the spectrum, trying to find out where you are, and gradually discovers that you're an extreme right-winger. Notice that at the extreme right you have a narrower variance, which means there's a higher probability that you will click on articles close to your centroid. And that is the only assumption we need: at the extremes, you have lower variance in your interests. So this is the picture I think the Facebook engineers had in their heads.
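To make that picture concrete, here is a minimal toy sketch of an algorithm learning where a user sits without changing them. The one-dimensional spectrum, the click model, and all the numbers are invented for illustration; this is not anything resembling the real system.

```python
import random

def click_prob(article, user_centre, user_spread):
    """Chance of a click: high when the article is near the user's position
    on a 0 (far left) to 1 (far right) scale, falling off with distance."""
    return max(0.0, 1.0 - abs(article - user_centre) / user_spread)

def estimate_user(user_centre=0.5, user_spread=0.3, n_trials=200):
    """Try articles across the whole spectrum and keep a running estimate
    of where the clicks concentrate."""
    clicked = []
    for _ in range(n_trials):
        article = random.random()                       # explore left to right
        if random.random() < click_prob(article, user_centre, user_spread):
            clicked.append(article)                     # each click is evidence
    return sum(clicked) / len(clicked) if clicked else None

print(estimate_user())   # settles near 0.5 for a middle-of-the-road user
```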
But what actually happened was this. You take a person who's in the middle, and the algorithm learns to send a whole bunch of stories a little bit to one side or the other. As you read those stories, your actual preferences, your political views and opinions, start to change, and gradually you move over until you become a neo-fascist. So now your mum or your dad is a neo-fascist. And this happens to millions of people. Their kids go home for Thanksgiving, as they do in the United States in November, and find that their mum and dad have turned into neo-fascists, which must be rather disconcerting. So what happened? This is the result of using a simple reinforcement learning algorithm that is trying to optimize click-through. That's it. Fifty lines of code have broken up the European Union, destroyed NATO, and possibly eliminated Western democracy; it's an algorithm from chapter 21 of the first edition of the textbook. And imagine if that were a really intelligent, or even superintelligent, algorithm that knew much more about human psychology and the human mind. How much more effective would it be at achieving this kind of objective? So this is just a little warning to us: when we turn things over to systems that optimize objectives, and it's the wrong objective, we can have catastrophic effects.

So how did we get into this mess? We are now pursuing a scientific discipline whose endpoint is our own destruction. Well, if we go back to the beginning of the field, we wanted to figure out how to make machines intelligent. To do that, we basically asked what it means for a human to be intelligent: a human is intelligent if their actions can be expected to achieve their objectives. That is the definition of rationality, and you could argue it goes back to Aristotle. In AI, initially it was the pursuit of objectives stated as goals; then it became rewards and utilities as probability theory was added; with machine learning, it's loss functions. And then we said, OK, let's take that definition and apply it to machines: machines are intelligent if their actions achieve their objectives. That seems perfectly reasonable. We just took the definition of intelligence for humans and transferred it directly to machines. And this is not just AI. Other disciplines that are pillars of twentieth-century technology, namely control theory, economics, statistics, and operations research, all operate on the same model.

And this model is wrong. It is a mistake. It only works when you can detect that you put in the wrong objective and reset the machine, when you can test things in simple cases where the scope of action of your system is limited, so that the downside of a mistake is restricted. But as I just illustrated in the case of social media, our AI systems now have global impact, and so when you put in the wrong objective, you get global negative consequences. This model has worked only because our systems have been too stupid and too limited to have a serious downside, and we are no longer in that period. So we need a better model for how to build intelligent systems.
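Before turning to that better model, it is worth seeing how little it takes to reproduce the drift described above. The toy below adds one assumption to the earlier sketch, namely that consuming an article nudges the user's own position toward it; a policy that simply chases clicks then walks a centrist user toward the extreme. The dynamics and numbers are invented for illustration only.

```python
import random

def simulate_drift(user=0.5, steps=5000, push=0.05, nudge=0.01):
    """Toy model: the policy shows content slightly to one side of the user's
    current position (an easy, predictable click), and each article the user
    actually reads pulls their position a little toward that content."""
    for _ in range(steps):
        article = min(1.0, user + push)          # content a bit to the user's right
        p_click = 1.0 - abs(article - user)      # nearby content gets clicked
        if random.random() < p_click:
            user += nudge * (article - user)     # reading it shifts the user's views
    return user

print(simulate_drift())   # a user who started at 0.5 ends up very close to 1.0
```

The point is not these particular dynamics; it is that nothing in the objective, clicks, says anything about leaving the user's preferences alone.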
So what I'm proposing is that what we should aim to build is not intelligent machines in the same mould as human intelligence, but machines whose only objective is to achieve our objectives: not their objectives, not objectives that we put into the machine, but objectives that remain within us. If we can actually do that, if the machines are indeed pursuing our objectives, the ones we truly hold, our true preferences about the future, then those machines are necessarily beneficial to us. Now of course the difficulty is that if the objectives are in us and not in the machines, how are the machines supposed to figure out what to do? That is what I'll explain in the next few slides.

So here's how it works. First, we design robots to pursue one objective: the realization of human preferences. And when I say preferences, I don't mean preferences over different kinds of pizza; I mean preferences over entire futures, everything you could possibly care about for as long as you could possibly care about it. Second, and this is the key point, the robots do not know what those preferences are. We will see that it is precisely this uncertainty about preferences that guarantees the robots remain deferential to human beings indefinitely. That is a point I think is somewhat unexpected: uncertainty about objectives implies safety and deferential behaviour. Third, there has to be a connection between the humans and the machines, some way for the machines to eventually gain evidence, gain information, about human preferences so that they can be useful to us. The way they do that is by observing human behaviour. Everything that people do, including sitting here and listening to me, provides evidence about your preferences. Everything the human race has ever written is about people doing things and other people being upset about it, and all of this provides evidence about human preferences. So there is a vast amount of evidence that machines can access about what humans do, what makes humans happy or unhappy, and what we want the future to be like.

When we take these principles and turn them into a mathematical framework, it's called an assistance game. It's a game in the sense of game theory, because there are necessarily at least two agents involved, the machine and the human being, so it's a game-theoretic formulation. And it's an assistance game because the machine is designed, constituted, to be of assistance to the human being in achieving the human's objective. Now, think back to the way AI was defined classically, in the previous editions of the textbook for example, and describe that as a graphical model. In a graphical model we have random variables, we have probabilistic dependencies between the random variables, and we may have observations of some of those variables. This diagram shows that in the classical view the human objective is observed by the machine, the machine's behaviour depends on what the human objective is, and we assume that the human objective is perfectly known.
Now, in this graphical model, if you know anything about graphical models, you can simply remove the human behaviour, because the objective is now a sufficient statistic for the machine to decide what to do. Which means the human can be jumping up and down saying, stop, stop, you're going to destroy the world, and the machine says, well, I already know the objective, so I already know that what I'm doing is perfect, and whatever you're saying is just hot air; I'm not going to pay any attention to it. That is the classical view, and that is what I'm arguing is a mistake. When the objective is not observed, then, again if you know anything about graphical models, the human behaviour and the machine behaviour remain coupled to each other. This coupling is what's going to keep us safe, so let's try to understand a bit more about how it works.

First, a concrete example: image classification. This is one of the standard goals of AI going back to the 1950s: take an image, decide what objects are in it and what categories those objects belong to. How is that formulated as a machine learning problem? We define a loss function and we minimize it on the training set: we adjust all the weights of the deep learning network, or whatever other kind of vision algorithm you have, to optimize the loss on the training set, and we hope that gives good predictions on the test set. Now usually, and certainly in all the computer vision competitions, the loss function is expressed as a uniform matrix, meaning that the cost of misclassifying a dog as a cat is the same as misclassifying an apple as an orange, or an apple as a cat, or a dog as an apple. And the problem is that it is also assumed to be the same cost for misclassifying a human as a gorilla.

So what happened? This is something Google Photos actually did. Someone uploaded their photos to Google Photos, and it classified them and their girlfriend as gorillas. That person tweeted a screenshot to the world, and it cost Google hundreds of millions of dollars in emergency public-relations damage control, loss of goodwill, problems with their own employees, and so on. It was a big catastrophe. So what should they have done instead? I suppose they should have asked: what is our loss function, really? I'm willing to bet they just went ahead with a uniform loss matrix, because in machine learning we usually don't worry too much about what the loss function really is. But the true loss function is very large: there are many thousands of categories of objects, so the loss matrix has many millions of entries. If they had actually asked what their loss function is, they would have realized that it is vast and that we don't know what it is. And if you don't know what your loss function is, then you need to behave in a much more robust way. You should say, for example, perhaps it's too dangerous to classify this image, because there's a high tail probability that this classification has a very expensive entry in the loss matrix, and therefore I shouldn't classify it. And in fact that is what happens now: any time you show Google Photos an image of a gorilla, it says, "I'm not quite sure what this is." I think that's probably a manually inserted piece of code.
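Here is a minimal sketch of what the more robust behaviour could look like: pick the most probable label, but abstain whenever there is non-trivial probability that the true class makes that label catastrophically costly. The labels, loss values, and threshold below are invented for illustration; this is not Google's actual system.

```python
import numpy as np

def classify_with_rejection(class_probs, loss_matrix, big_loss=100.0, tail=0.01):
    """Pick the most probable class, but abstain whenever more than `tail`
    probability sits on true classes for which that label would incur a
    catastrophic entry in the loss matrix."""
    best = int(np.argmax(class_probs))
    risky_mass = class_probs[loss_matrix[:, best] >= big_loss].sum()
    if risky_mass > tail:
        return None                              # "I'm not quite sure what this is"
    return best

# Toy three-class example with made-up numbers.
labels = ["person", "dog", "gorilla"]
loss = np.array([[0.0,    1.0, 1000.0],   # labelling a person as a gorilla: catastrophic
                 [1.0,    0.0,    1.0],
                 [1000.0, 1.0,    0.0]])  # loss[i, j] = cost of predicting j when truth is i
probs = np.array([0.08, 0.02, 0.90])      # the classifier is fairly sure it's a gorilla

pred = classify_with_rejection(probs, loss)
print(labels[pred] if pred is not None else "not sure what this is")  # abstains
```

With these numbers the classifier abstains even though it is 90% confident, because 8% of its belief sits on a confusion it cannot afford to make.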
So that's just one example. When you start to take seriously the fact that you don't always know what the objective is, you end up having to design completely new algorithms, completely new workflows for machine learning, and those workflows give you systems that are much more robust, that don't misbehave, and that ask permission before they do anything too dangerous.

Let's take another example. Suppose you ask the robot to fetch the coffee. What does that mean? Does it mean that fetching the coffee is now my life's mission, my sole purpose in life, and anything I do in pursuit of fetching the coffee is perfectly acceptable, including killing everybody else in Starbucks so I can get to the front of the line first? No, that's not what it means. What it really means is that, all other things being equal, I'd rather have coffee than not have coffee. But it doesn't tell you anything about the rest of your preferences. It doesn't tell you whether you're allowed to kill people in Starbucks. It doesn't tell you whether you're allowed to pay 36 quid for a cup of coffee at some fancy hotel. It doesn't tell you whether you can spend three weeks trekking across the desert to find the nearest cup of coffee, if the person happens to say "I really could do with a cup of coffee" in the middle of the desert. So it's a very weak piece of information for the robot to work with. How does the robot do anything at all when it has so much uncertainty about the rest of your preferences? Well, interestingly, as long as you don't change the rest of the world, you can still be useful to the human. If you can fetch the coffee without messing up anything else whose value to the human you are uncertain about, then it's still fine to fetch the coffee. So even in the presence of enormous uncertainty about human preferences, you can be useful by acting in a minimally invasive way, and you naturally get robot behaviour that is very cautious rather than slaughtering people in Starbucks.

So let's look at the mathematical formulation. In the basic assistance game you have a human being, and they have some preferences, which we call theta. We assume the human acts approximately according to theta; they don't have to be perfectly rational, but there has to be some connection between their preferences and their behaviour. The robot's objective is to maximize exactly the same theta, but the robot doesn't know what theta is, so it has some prior probability distribution P(theta) over what you might be interested in. This defines a mathematical game, and in any particular situation you can run an algorithm that solves the game, that computes the Nash equilibrium, to figure out the policy for the robot and the policy for the human. When you solve these games, what you find is that the human is incentivized to teach the robot: the human's solution doesn't just pursue their own preferences, it includes teaching the robot about those preferences, because that way the robot will be more useful and less likely to do something inappropriate.
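Before we get to the robot's side of the game, here is the formulation written out slightly more explicitly. This is a paraphrase of what is described above, not a reproduction of the slides:

```latex
% Basic assistance game (sketch). H is the human, R is the robot,
% and theta encodes the human's preferences over entire futures.
\[
\begin{aligned}
  &\theta \;\text{is known to } H \text{ but not to } R;\quad R \text{ holds a prior } p(\theta),\\[2pt]
  &\pi_H \;\approx\; \arg\max_{\pi}\; \mathbb{E}\big[\,U_\theta \mid \pi, \pi_R\,\big]
    \quad\text{(the human acts roughly according to } \theta\text{)},\\[2pt]
  &\pi_R \;=\; \arg\max_{\pi}\; \mathbb{E}_{\theta\sim p(\theta)}\big[\,U_\theta \mid \pi_H, \pi\,\big]
    \quad\text{(the robot maximizes the same } U_\theta \text{ under its prior)}.
\end{aligned}
\]
% Solving for the pair (pi_H, pi_R) jointly, as an equilibrium of the game,
% is what produces teaching behaviour from the human and asking or deferring
% behaviour from the robot.
```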
And on the robot's side, the robot is incentivized to ask questions, to learn as much as possible about human preferences, so that it can be useful quickly and avoid doing things that are wrong. The nice thing is that these behaviours are not things we program; they simply fall out automatically as solutions of the game.

Let me give you a very simple example. It's called the paperclip game, a little nod to Nick Bostrom's book Superintelligence. In this game there are only two things the human might care about: paperclips and staples. The state of the world is how many paperclips and how many staples there are, P and S. The human has an exchange rate between paperclips and staples, and that exchange rate is theta. It could be that a paperclip is worth a penny and a staple is worth 99 cents, or the other way around, or they could be worth 50 cents each; think of the exchange rate as somewhere on a scale between zero and one. And of course the robot has no idea which the human prefers, so it has a uniform prior over the exchange rate between paperclips and staples.

Here's how the game goes. The human goes first and has a choice between making two paperclips and no staples, one of each, or two staples and no paperclips; those are the only three choices for the human. Then the robot gets to go. Now, if the human were making this choice on their own, and the human, let's say, values a paperclip at 49 cents and a staple at 51 cents, then the value of making two paperclips would be 98 cents and the value of making two staples would be a dollar and two cents. So the human by themselves would just make staples; they have a slight preference for staples, so that's what they'd make. But now the robot gets to choose how many of each it's going to make, and it has a choice between making 90 paperclips, 50 of each, or 90 staples. When we solve this game, we get a very interesting solution. It turns out that the optimal behaviour for this human is not to make the thing they slightly prefer but to make one of each, and the optimal behaviour for the robot, when it sees one of each, is to make 50 of each. This is optimal for both the robot and the human, and in fact it is the optimal strategy whenever the human's value for a paperclip is anywhere between 44.6 cents and 55.4 cents. So what's coming out of this game is actually a little code, a code for the value of theta, the human's preference. We're breaking the range of theta into three segments, and by choosing two-and-zero, one-and-one, or zero-and-two, the human is telling the robot which segment their preference falls in. A code for the value of theta emerges automatically as a solution of the game. So that's the first example.
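A small sketch can verify the numbers in this example. The robot's responses below are taken from the equilibrium as described above (the robot reads the human's choice as a coarse signal about theta); the code just checks that signalling "middle" is best for a 49-cent human and recovers the 44.6 and 55.4 cent thresholds.

```python
def value(paperclips, staples, theta):
    """Human's value for a bundle: theta per paperclip, (1 - theta) per staple."""
    return theta * paperclips + (1 - theta) * staples

# Robot's equilibrium response to each human signal, as described in the talk.
ROBOT_RESPONSE = {(2, 0): (90, 0),   # "I like paperclips" -> 90 paperclips
                  (1, 1): (50, 50),  # "I'm in the middle" -> 50 of each
                  (0, 2): (0, 90)}   # "I like staples"    -> 90 staples

def human_payoff(signal, theta):
    hp, hs = signal
    rp, rs = ROBOT_RESPONSE[signal]
    return value(hp + rp, hs + rs, theta)

theta = 0.49   # paperclip worth 49 cents, staple worth 51 cents
for signal in [(2, 0), (1, 1), (0, 2)]:
    print(signal, round(human_payoff(signal, theta), 2))
# (2, 0) 45.08   (1, 1) 51.0   (0, 2) 46.92  -> signalling "middle" wins

# Indifference points: 92 * theta = 51 and 92 * (1 - theta) = 51,
# i.e. theta of about 0.446 and 0.554, the 44.6 to 55.4 cent range in the talk.
print(round(1 - 51 / 92, 3), round(51 / 92, 3))
```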
The second example is the off-switch problem. Remember that the robot that's going to fetch the coffee will stop at nothing; the classical robot, whose only objective is fetching the coffee, is going to kill everybody in Starbucks, pay 36 pounds for the cup of coffee, and trek three weeks across a desert to get that coffee. And on top of all that misbehaviour, the robot is also going to resist being switched off, because of course, if anybody switches the robot off, the robot will fail in its mission to fetch the coffee. That's one of the reasons for killing all the people in Starbucks: they might switch the robot off, so we need to eliminate that possibility. This problem of self-preservation as a consequence of having any objective at all is one of the core problems with the classical view of AI. A robot that says to itself, "I have an objective, fetch the coffee; I can't fetch the coffee if I'm dead; therefore I need to take any step, any preemptive step, to prevent myself from being switched off" is behaving in a problematic way. We want robots not to behave that way, but they are guaranteed to behave that way in the classical view, where they have a fixed objective. It turns out that once you add uncertainty about the objective, they don't behave that way, and we can show this with another very simple game.

Here the robot gets to go first, and it has, let's say, these choices. One choice is to press some big red button. The robot is not sure what the big red button does: perhaps it destroys the world, perhaps it does something nice for the human. So the robot has uncertainty about the value to the human of pressing the big red button, which I've shown on the left, and in this case the probability distribution over that value has a slightly positive expectation. The other choice for the robot is to switch itself off, to say, well, I'm so worried about doing something bad that I'm just going to commit suicide. Now, if those were the only two choices, the robot would reason that the expected value of pressing the big red button is a little higher than the expected value of switching itself off, so it would go ahead and press the button. But we're going to give the robot a third option, which is to allow the human being to switch the robot off. You might say, what's the point of that? If the robot thought it was a good idea for the human to switch it off, it could just have switched itself off. So how does giving the human the option of switching the robot off gain anything for the robot? The answer is: what happens if the human doesn't switch the robot off? Then the robot has learned something. It has learned that pressing the big red button is actually a good idea, because if it weren't, the human would have switched the robot off. So now the probability distribution over the value of the big red button has lost its negative part, and the robot is confident that it can press the button. If you look at the two outcomes of allowing the human to switch the robot off, there is an outcome of value zero and an outcome that is strictly positive, and the expected value over those two outcomes is guaranteed to be at least as high as the expected value of pressing the big red button without giving the human the option. So the theorem says that the robot has a positive incentive to allow the human to switch it off as long as there is uncertainty about the value of the action. As long as the robot is uncertain about human preferences, it has a positive incentive to allow the human to switch it off.
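The argument compresses into one inequality. Let u be the unknown value to the human of the robot pressing the button, with the robot's belief over u as shown on the slide, and suppose, as an idealization, that the human switches the robot off exactly when u < 0:

```latex
% Expected values of the robot's three options under its belief P(u):
%   act now:             E[u]
%   switch itself off:   0
%   defer to the human:  E[ max(u, 0) ]   (the human stops it whenever u < 0)
\[
  \mathbb{E}\big[\max(u,\,0)\big] \;\ge\; \max\big(\mathbb{E}[u],\; 0\big),
\]
% with strict inequality whenever the robot puts positive probability on both
% u > 0 and u < 0, that is, whenever it is genuinely uncertain about the
% human's preferences. The uncertainty is exactly what makes deferring pay.
```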
So this is a core theorem connecting the safety of AI systems to uncertainty about human preferences.

Okay. As you can imagine, things are a bit more complicated than in that example, and here are some of the things we're working on. First of all, many parts of AI now get more complicated. We used to have a single agent solving a problem, or solving a Markov decision process; now the human is necessarily involved in all of these decision problems, so they all become game-theoretic, with a partially observable variable, the human preference function. So things get more complicated, and we need algorithms for solving all these different classes of assistance games. We also have to go through every single chapter of the textbook and redo those chapters to allow for uncertainty in the objective rather than assuming a fixed objective.

We also have to deal with the fact that human beings are imperfect: we do not behave perfectly with respect to our own preferences, and there are all kinds of reasons why that's impossible. For example, when Lee Sedol played a losing move in his match against AlphaGo, we wouldn't want the robot to look at that move and conclude that Lee Sedol must be trying to lose the match. Of course he's not trying to lose; he's trying his best to win, but his brain doesn't allow him to find the correct moves. So in order to understand human preferences from human behaviour, we have to invert human psychology: the human cognitive architecture, in all its gory complexity, has to be inverted by the machine to get a good, reliable understanding of the underlying human preferences.

Another set of problems, which you have probably already thought about, is that there is more than one human being in the world. Robots are going to have to act not on behalf of a single user but on behalf of, possibly, the entire human race. How do you do that? Naively, you might say we should just add up the preferences. Now, if you've seen Avengers: Infinity War, this is basically Thanos's theory. Thanos has a theory that if we got rid of half the people, the remaining people would be more than twice as happy, so he has a moral obligation to get rid of half the people in the universe, and that's why he's collecting those little stones. We might disagree with that; we might think we need a more refined utilitarian theory that wouldn't lead to that kind of conclusion. In fact, these questions are still hotly debated by philosophers. There is the population-size problem, and there are problems about future humans: how do we weigh the preferences of the potentially quintillions of human beings who may one day occupy the galaxy against the preferences of the relatively few people living today? These are important questions, and if we don't answer them, the AI system will answer them for us and possibly do the wrong thing. We wouldn't want that to happen. So we have to solve core philosophical problems, because we are now faced with actually seeing the consequences of our philosophical solutions put into practice.

Let me briefly describe one of the interesting things that happens when you have one robot working on behalf of multiple humans. There is a standard theorem going back to the 1970s from Harsanyi, John Harsanyi, an economist and Nobel laureate; it's called the social aggregation theorem.
The theorem assumes that all of the humans have a common prior belief about how the world is going to evolve in the future: they can have different preferences, but they must believe the same things. Under that assumption, it says that the only Pareto-optimal policies you can enact on their behalf are ones that maximize a fixed linear combination of their preferences. And if you are an egalitarian, that linear combination would give equal weight to everyone's preferences. Now, it turns out that if you relax the assumption of common belief, which is not very reasonable anyway since we all have different beliefs about how the world will behave, then the only Pareto-optimal policies are ones where the robot gives your preferences a weight that depends on how well the future turns out to agree with your predictions. That is a rather counterintuitive result: it says that if you happen to be born with beliefs that turn out to be true, you will have a much higher weight in how the world is organized by the robot; in fact, it will basically organize the world to suit you and nobody else. But this is an inescapable theorem, and in fact everybody will agree to it, because everyone believes their own beliefs are the right beliefs to have. They all think the world will turn out the way they think it's going to turn out, so they will all agree to a policy that looks like this. In effect, everyone is betting that the world will be the way they think it's going to be. I think this is a new result, and we have yet to discuss it with the sociologists and political scientists to hear what they think of this scheme.

Let me skip over some of this. The robot also has to deal with people who are, for example, sadistic. If they derive their jollies from the suffering of other people, should the robot pay any attention to those kinds of preferences? Harsanyi, in his 1977 paper on social aggregation, said there is an exception to the theorem: I am under no obligation to help person X hurt person Y; no matter how much I want to help both X and Y, I don't have an obligation to help sadists get what they want. Unfortunately, it turns out that it's not just sadists who want other people to have lower well-being. A vast part of human preference structure is based not on absolute well-being but on relative well-being: I'm happy because I have a bigger house than the person next door, a shinier car, more prizes, better-behaved children. These are all relative preferences, and of course you can improve the satisfaction of relative preferences by decreasing the well-being of the people you're comparing yourself to. So pride and envy work in exactly the same way, mathematically, as sadism. Do we want robots to ignore pride and envy? If that's three quarters of our preference structure, then ignoring it will have a very serious impact on how the world works, and we had better think carefully before we do it.
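To pin down the two aggregation regimes just described, here is one compressed way to write them; this is a paraphrase of Harsanyi's theorem and of the relaxed-belief generalization alluded to above, not the exact statements of either result:

```latex
% Common prior (Harsanyi): every Pareto-optimal policy maximizes a fixed
% linear combination of the individuals' utilities,
\[
  U \;=\; \sum_{i} w_i\, U_i, \qquad w_i \ge 0 \text{ fixed in advance}.
\]
% Without a common prior, Pareto-optimal policies behave as if each person's
% weight is updated by how well events bear out their own beliefs,
\[
  w_i(h) \;\propto\; w_i^{0}\, P_i(h),
\]
% where P_i(h) is the probability person i's prior assigned to the history h
% that actually occurred: the "betting on your own worldview" effect.
```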
Of course, one always has to mention the new book: we've summarized these ideas in a book called Human Compatible. It will be out in October, and it's already on Amazon if you want to pre-order it.

So, to summarize, I think we can have our superintelligent-AI cake and eat it too, in the sense that if we develop AI systems along these provably beneficial lines, then we don't have to worry about them accidentally taking over the world, or having catastrophic consequences for the human race, because we misspecified the objective. But I want everyone to remember that point: we do not know how to specify objectives correctly. Any time you build an AI system, think about it. Do I really know the objective? Do I really know that the machine's optimal solution to the objective I'm stating is in fact going to be desirable to me? Almost always the answer is no, so don't build your systems that way.

I actually find it worrying that we have an ethics stage at this conference. This is the Alan Turing stage, and there's a big ethics stage over there. What does that really say? Imagine we went to a civil engineering conference, and there was one stage where we talked about bridges, and another stage where we talked about bridges that don't fall down. That's sort of what it's like. I suppose here we're not talking about ethics, we're talking about unethical AI systems, and over there they're talking about ethical AI systems. Well, that's broken. Ethics shouldn't be a separate conversation; it should simply be part of what it means to build a good AI system that it doesn't do things that make people unhappy. That ought to be obvious.

So there's lots more work to do: lots of theory, lots of inverse psychology, and practical systems, actually starting to build intelligent personal assistants using these principles, to explore how well it works and what kinds of new algorithms we need. And, as I mentioned, there are long-term philosophical problems that we have to start thinking about how to solve.

There are two other problems with superintelligent AI that I haven't talked about. One is misuse. This guy, Dr. Evil, probably doesn't want to use provably beneficial AI; he probably wants the old kind. So how do we police a future in which AI technology is available to anybody to use for their own purposes? The other problem is not misuse but overuse: how do we protect our society, our civilization, from the temptation to turn over everything we know and everything we know how to do to machines, simply because it's so much less effort? That would result in an irreversible enfeeblement of human civilization, and we have to think about it. This is not a technical problem; it's a big problem for our culture. Many people would argue it's already happening, but it's going to get much, much worse when we no longer need to put our civilization into the minds of the next generation, because we can put it into the minds of machines instead. Thank you.
