#94 – ALAN CHAN – AI Alignment and Governance #NEURIPS

Alan Chan is a PhD student at Mila, the Montreal Institute for Learning Algorithms, supervised by Nicolas Le Roux. Before joining Mila, Alan was a master's student at the Alberta Machine Intelligence Institute and the University of Alberta, where he worked with Martha White. Alan's expertise and research interests encompass value alignment and AI governance. He's currently exploring the measurement of harms from language models and the incentives that agents have to impact the world. His projects have examined the regulation of explainability in algorithmic systems, scoring rules for performative binary prediction, the effects of global exclusion in AI development, and the role of a graduate student in approaching the ethical impacts of AI research. In addition, Alan has conducted research into inverse policy evaluation for value-based sequential decision making and the concept of normal accidents in AI systems. Alan's research is motivated by the need to align AI systems with human values and by his passion for scientific and governance work in this field. Alan's energy and enthusiasm for the field are infectious. I caught up with Alan at NeurIPS the other week. He was manning a desk, talking all about AI alignment. The desk was open all day for folks to come up and have a conversation, and this was the short conversation I had with Alan. Enjoy.

Alan, take the mic. What's your name? Alan? Pleasure to meet you, my friend. Hello. Hello. How are you doing? Yeah, you can give me the mic if you… I'm fine. I'm fine. I don't want to be two-faced, so I'll be honest with you: I am intuitively skeptical of alignment. Can you keep that camera up? I'm so sorry. Yeah. So what do I need to… I know the basics about, you know, instrumental convergence, I've read Bostrom, and I've recently read a book I think called The Rationalist's Guide to the Galaxy. I've got some of the basics. Yeah. So, you know, sketch it out. I'm interested in your areas of skepticism, though.

Okay. So, do you want me to put the camera up closer, because… The problem is it's got such a horrible lens on it. You need to be quite far back because I want to get all of this in. Right, man. Just a few people walking past. I know. You just need to like… Can you move on? Yeah, just get your elbows out. I know. It was such an expensive lens. You know, like when you choose a lens, you have to get the right one, and that was the one, you know. I don't get paid for my YouTube. Anyway, any sponsors? Well, you know, one or two… We've got one or two sponsors every now and then, you know, yeah. We're struggling. Okay. Okay. Yeah, yeah.

So, just to, I guess, calibrate myself: what are your, you know, skepticisms? Okay. So, I'm a fan of François Chollet, and François Chollet is well known. Do you remember he wrote an article called something like The Impossibility of the Intelligence Explosion? So, most of my skepticism is along those lines: I think that there are all sorts of bottlenecks in large systems. I mean, if you look at Google, which he cites as an example of a superintelligence, it has hit scaling bottlenecks, and the reason for that is that it's externalized and distributed. It takes a long time to get a pull request done. It takes a long time to get code checked in. There are bottlenecks everywhere, even with, you know, the kind of hardware that we use for computing.
There are bottlenecks in creating the silicon, in distribution, and so on. So my intuition is that there's a scaling bottleneck. Yeah.

I think this seems kind of reasonable. I guess my take is that, certainly, there do seem to be bottlenecks with systems that we're building right now or have built, like Google, for instance. But it seems a bit more plausible to me that AIs might be able to escape these bottlenecks. So, issues with communication, for instance, at Google, right? It seems like an AI would have a much easier time communicating with different copies of itself around the world, let's say, and achieving a certain volume. I guess we can get into, like, more nitty-gritty bottlenecks. But, yeah, did you have, like, sort of more core, like, sort of skepticisms?

Well, another thing is, so, again, reading that book, and again, I'm sure it's an extremely low-resolution view into what you folks think, but a lot of it comes back to the definition of intelligence that some of you folks choose to use. And it's based on a single principle, a unifying principle, which is this idea of a rationalist agent making a trajectory of, you know, kind of, Bayesian-optimal decisions. And I think it's very elegant. I think it's beautiful. I mean, you know, AIXI as a mathematical theory is wonderful, but I think it's intractable. And I subscribe to different views of intelligence. So, you know, using this AIXI conception, you could argue that something like AlphaZero is superintelligent, because it performs so well at a particular thing. And I think task-specific skill is not the same thing as intelligence. I think intelligence is the information conversion ratio: the ability to be flexible and to very quickly, given a small amount of information, experience, and priors, do something completely different to what it was trained on. And there are other views of intelligence as well, based on behavior and function and capability and so on. But I think most of it traces back to, I mean, certainly instrumental convergence traces back to, this idea of, you know, having this absolute will to do one thing, at the cost of, you know, potentially killing all the humans on the planet in pursuit of doing this one particular thing. Yeah.

So, I guess there are two things here. The first thing you mentioned was this idea that, you know, the classical superintelligence argument sort of considers agents to be consequentialists pursuing some objective. The second thing you mentioned was about intelligence being more than just, like, narrow task performance. Yeah. So, I guess on the first thing, I think I'm pretty sympathetic to what you said. So, I guess, like, I do agree that the story for machine learning systems that we're building nowadays is a little bit more complicated. We're not building, like, pure utility maximizers. But I guess I have a lot of uncertainty about, like, to what extent, like, how far do we need to go towards building pure utility maximizers to get to some level of danger? The systems we are building right now are, I guess, like, self-supervised learning systems like, you know, GPT-3, right? It's not totally clear that this is, you know, an agent that is pursuing some reward. When you train it with reinforcement learning from human feedback, though, you know, it seems a little bit closer to that, because you're literally training it to maximize some reward in some unspecified way.
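To make that last point a bit more concrete, here is a minimal, illustrative sketch of what "literally training a model to maximize some reward" can look like, as a REINFORCE-style policy-gradient update. The `policy.sample` and `reward_model.score` interfaces are hypothetical stand-ins for a language-model policy and a learned human-preference reward model; this is not any particular library's API or anyone's actual RLHF training code.

```python
def reinforce_step(policy, reward_model, prompts, optimizer):
    """One REINFORCE-style update: sample completions, score them with a
    reward model, and nudge the policy toward higher-reward completions.
    Assumes PyTorch tensors and a torch optimizer."""
    # Hypothetical interface: sampled completions plus the summed
    # log-probability of each completion under the current policy.
    completions, logprobs = policy.sample(prompts)
    # Hypothetical interface: a learned preference model assigns each
    # (prompt, completion) pair a scalar reward.
    rewards = reward_model.score(prompts, completions)
    # Mean baseline reduces gradient variance without changing the expectation.
    advantage = rewards - rewards.mean()
    # Minimizing this loss raises the log-prob of above-average-reward samples.
    loss = -(advantage.detach() * logprobs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards.mean().item()
```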
With reinforcement learning systems, like full-on reinforcement learning systems, I think you get much closer to the classical threat model in Bostrom. I think that's sort of, you know, the type of worry coming from people who worry about, you know, out-of-control artificial intelligence. So, yeah, I think it is an open question, you know, to what extent we actually get these types of misgeneralization failures. Like, you know, I think the ML community has done a lot of work in the past few years in trying to formalize power-seeking, in trying to find examples in which, you know, the goals we specify at training time don't generalize, you know, to test time. And, like, I think, you know, this work has been pretty good at, like, showing that this thing can actually occur in, like, simple environments, and, like, there might be more theoretical reason for us to believe that these things would occur in more complex environments. But of course, we need evaluations to figure out, you know, exactly how this is going to occur, and, like, you know, whether it is going to be as easy to detect when we scale models up as, you know, it is with models right now.

Yeah, well, you said some really interesting things. I mean, I would touch first on the reinforcement learning from human feedback. I worry that something like that is going to make us fooled by randomness, because we anthropomorphize models like GPT-3. And I notice when I use GPT-3 that I am fooled by randomness; it's absolutely incredible, right? It's really, really incredible. And I only notice failures when I'm using it non-interactively, as part of a software stack or in a workflow engine or something like that. And this is because we cherry-pick quite a lot. But I agree that it has meaningfully, you know, it's made it almost deterministic, when you greedily sample from GPT-3, that it's doing something that I want it to do, which is very interesting. But I could cynically go back to Searle's argument, you know, which is that, basically, absent biology, computers don't have any intentionality, right, and it's impossible to replicate an intelligence in silico. That's an extreme argument, but you could make that argument as well. I also wanted to touch on the fact that there's a certain language that I hear from your part of the world, and a lot of it is words like utility, words related to, you know, economic theory and utilitarianism and consequentialism. And I'm also quite interested to hear your take on what a rational agent would do in respect of moral reasoning, you know, over long trajectories. So there are a few things there that I'd like to understand.
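On the greedy-sampling point above: a minimal sketch, assuming the Hugging Face `transformers` API and using GPT-2 as a small stand-in for GPT-3, of why greedy decoding feels "almost deterministic". With `do_sample=False` the model takes the argmax token at every step, so repeated runs return the same text, whereas temperature sampling is stochastic.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # small stand-in for GPT-3
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Instrumental convergence means that", return_tensors="pt")

# Greedy decoding: argmax at each step, so the output is identical on every run.
greedy = model.generate(**inputs, max_new_tokens=30, do_sample=False)

# Temperature sampling: stochastic, so the output can differ from run to run.
sampled = model.generate(**inputs, max_new_tokens=30, do_sample=True, temperature=1.0)

print(tok.decode(greedy[0], skip_special_tokens=True))
print(tok.decode(sampled[0], skip_special_tokens=True))
```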
Yeah, so in terms of the terminology, it's pretty interesting. I guess the standard framework in machine learning now is to frame things, maybe not in terms of utility, they call it a loss function, but it's kind of the same thing. And I think really this comes from the field of optimization, which, like, really got headway, you know, in the twentieth century, when we were thinking about, okay, you know, how do we design economies and, like, how exactly do we design systems to achieve some goal effectively, right? And there, I think it's very natural to say, oh, like, let's think consequentially about how to design these systems. And maybe this is a problem, right? Because I think a lot of the risk from AI comes from having, like, consequentialist systems that pursue some goal without caring about anything else, any other side constraints, like, okay, maybe these humans actually have values that are, like, worth preserving, right? Like, yeah, what was the second thing you mentioned?

Well, could I touch on it? First of all, I think consequentialism is a risk for human reasoning as well as for artificial intelligence reasoning. Now, I'm not going to state that I have an opinion on this, but I know one of the criticisms of some of the long-termist community is this notion, or, I mean, certainly folks like Bostrom, I think, believe that the universe is made of information, which leads very quickly to the simulation hypothesis, which leads very quickly to considering the utility of future simulated humans on other planets and so on. And I can't remember the exact number, but I think he did some kind of calculation which led to a very, very big number, let's say 10 to the power of 59 or something. And because that number is so much bigger than the number of humans on the planet now, it logically leads to the reasoning that we should care more about those lives than these lives. But anyway, about the utility function as well: one of the issues I have with, let's say, reward-is-enough and reinforcement learning in this rationalist conception of intelligence is that the reward function would necessarily need to be so complicated, because we live in a complex system. So, you know, the problem with any phenomenon of a macroscopic complex system is that every time you create an abstraction or some kind of way of understanding that system, you exclude almost all of the truth of that system, which means any utility function would really struggle to capture the dynamics of that system.

I agree. This is why I'm quite pessimistic, actually, about alignment. I think if we're building consequentialist-type agents, it seems pretty hard that we're going to be able to get them not to pursue some kind of power-seeking. So this is why I think, you know, the most feasible path to getting to a sort of safe world is coordination, where we come together and we agree, okay, now we have, like, some sort of evidence from researchers that our AI systems do X, Y, Z, and, like, Y and Z are particularly dangerous, and, like, conditions A, B, C lead to Y and Z. So, with this evidence, we agree, okay, let's not build systems in this way. Right now it's sort of like the Wild West. We don't really know what's going on. A lot of people have this intuition, including me, that, like, okay, we're building kind of wild systems, let's maybe, like, slow things down and try to understand a little bit. Whereas, like, you know, I think the broader sort of ML research community right now is, let's, like, forge on ahead, which I think, like, you know, sort of makes sense, actually, because I think there's a lot of potential upside from AI, and a lot of people see that and want to, like, help solve problems in the world that can be addressed by AI, right? It's just that, I guess, you know, my intuition is, okay, well, you know, it seems like there's a lot more uncertainty for me right now about the potential downside. It could be, like, really huge. So I want to work to try to figure out what that could be.

Yeah, and another thing as well, and I think Eliezer Yudkowsky has made this argument, is that we don't really know how close we are to artificial general intelligence.
It might be that we are 98% of the way there and we're very, very close to something, you know, disastrous happening. So I can kind of get on board a little bit with that argument. But the problem I do have is with things like info hazards and with what I perceive to be almost techno-Luddism or paternalism: it might really hinder our progress as a society if we kind of, like, stop doing this kind of fundamental research. Yeah, yeah.

So I'm, like, not that on board with the techno-Luddism and with, like, some of the more extreme stuff that's been proposed by, like, the long-termist community. I guess, yeah, you said something earlier about long-termism. I guess, like, if you were to ask whether I am a long-termist, I'm not, like, completely sure. Like, I think I am on board with the idea that lives in the future matter equally, but I think, because of uncertainty, there is maybe some implicit discount factor, like, with our actions, right? But even, like, sort of supposing long-termism, I guess I'm not a hundred percent a consequentialist. I think, like, you know, there are norms and values and, like, rights that, like, we should probably respect. I am, like, not a very good calculator of, like, you know, as you said, right, like, the reward function is just, like, so, so complicated. Yeah. I'm going to need some, like, really, some heuristics to help guide me in, like, what I'm doing.

Indeed, indeed. Well, all I want to say is I've really enjoyed this conversation. I appreciate it so much, and thank you very much. Great to meet you. Thank you so much.
