AI Just Solved a 53-Year-Old Problem! | AlphaTensor, Explained
I’m gonna show you what I think is the most exciting breakthrough we’ve made this year. I understand that generating images, and text, and code is exciting, but this… AlphaTensor… This can change everything. And although AlphaTensor is about matrix multiplication, and that’s what everyone is talking about, I think this is something else. Something with limitless potential. But let me show you first how we got here.

Let’s say we want to solve this simple equation: x squared minus y squared. This problem requires two multiplications, x times x and y times y, and one subtraction, this term minus this term. But I want you to imagine for just one second that multiplying numbers is very expensive. So the fewer times we do it, the faster we can solve the problem. Fortunately, high school algebra has a trick for us. This equation is the same as x plus y times x minus y. But now, instead of two multiplications, we have a single one. And that’s pretty clever. We wrote a slightly different equation that gives us the same result, but that we can now execute faster.

Now, I want you to think about deep learning here for a second. The computations in our deep learning systems are all based on linear algebra. At the core of it, behind the scenes, deep learning boils down to many, many matrix multiplications. And every time we multiply two matrices, we have to run a bunch of multiplication operations. And that’s slow. So I have to wonder: can we take this same idea, replacing a multiplication with cheaper operations like additions, and apply it to matrix multiplication?

In 1969, this guy, Volker Strassen, a German mathematician, showed the world that the way we’d been multiplying matrices was not optimal. Up to that point, the best way to multiply two matrices was the way we teach in school. Let me show you a quick example of how we multiply two matrices. This is matrix A and matrix B, and we want to find matrix C, the result of multiplying A times B.
Notice how we compute every value of matrix C. Let’s count how many multiplication operations we need in this example: one, two, three, four, five, six, seven, eight. Eight multiplication operations whenever we multiply two two-by-two matrices. But what about three-by-three matrices? Notice how we come up with the value we have here. We multiply this row by this column, and you can see that we need one, two, three multiplications just to compute one value out of nine. So we need a total of 27 multiplication operations to multiply two three-by-three matrices. In general, when we use the naive algorithm, the one we learned in school, the number of multiplication operations equals the size of the matrix, its number of rows, to the power of three: 8 operations for two-by-two matrices, 27 for three-by-three, 64 for four-by-four, and so on.

Then Strassen came along and said: stop the madness, here is a faster way of doing this. He created an algorithm that starts with the initial matrices and generates these intermediate terms that he can use later. Then Strassen came up with these equations that combine those terms into the final result. This is an example of multiplying two two-by-two matrices. Notice how he only needs seven multiplication operations instead of eight: one, two, three, four, five, six, and seven. Strassen’s algorithm gets even better for larger matrices, because it can be applied recursively to blocks of the matrix and the savings compound. But here comes the plot twist.

It’s been 53 years since Strassen came up with his algorithm, which, by the way, inspired many researchers to study matrix multiplication. And 53 years later, we still don’t know the optimal way to multiply matrices. Not even for small matrices, like three by three, and this blows my mind. Three-by-three matrices are not even that large, and we don’t know if the way we multiply them is the optimal way. That’s where artificial intelligence comes in.

The thing that makes DeepMind unique is that DeepMind is absolutely focused on creating digital superintelligence.
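The counts above are easy to check for yourself. Below is a minimal pure-Python sketch that counts the scalar multiplications of the schoolbook method and applies Strassen’s seven products to a two-by-two case, using the formulas as they are usually written in textbooks. The function names `naive_matmul` and `strassen_2x2` are made up for illustration; this is not code from the video or from DeepMind.

```python
def naive_matmul(A, B):
    """Schoolbook product of an n x m matrix A and an m x p matrix B.

    Returns (C, number_of_scalar_multiplications)."""
    n, m, p = len(A), len(A[0]), len(B[0])
    assert len(B) == m, "inner dimensions must match"
    mults = 0
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]  # one multiplication per (i, j, k)
                mults += 1
    return C, mults


def strassen_2x2(A, B):
    """Strassen's 1969 formulas for a 2x2 product: 7 multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # The seven products: the only multiplications in the whole algorithm.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine them using additions and subtractions only.
    C = [[m1 + m4 - m5 + m7, m3 + m5],
         [m2 + m4, m1 - m2 + m3 + m6]]
    return C, 7


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(naive_matmul(A, B))   # ([[19, 22], [43, 50]], 8)
print(strassen_2x2(A, B))   # ([[19, 22], [43, 50]], 7)
print(naive_matmul([[1] * 5] * 4, [[1] * 5] * 5)[1])  # 100: 4x5 times 5x5
```

Strassen’s version saves one multiplication at the cost of extra additions and subtractions (18, versus 4 for the schoolbook two-by-two product), which is the same trade the x squared minus y squared trick makes.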
In 2017, DeepMind introduced AlphaZero. Now, here is what’s fascinating about it: AlphaZero taught itself how to play chess, shogi, which is the Japanese version of chess, and Go. And it doesn’t only play these games. It wins them. The same DeepMind system learned to win at all three. AlphaGo scores another win. But why stop with AlphaZero playing games? What else can we use it for? DeepMind took matrix multiplication and turned it into a single-player game that they called TensorGame. And the system, AlphaTensor, taught itself how to find new, previously unknown algorithms.

There’s something important to understand here. Go is an extremely challenging game for artificial intelligence because of the number of possible moves we have at every step of the game. Well, multiplying matrices blows Go out of the water. At each step, there are approximately 30 orders of magnitude more possibilities to consider. So this is a much more difficult game to play. And still, AlphaTensor’s results are amazing.

This here is AlphaTensor’s paper. I want to show you this table that compares AlphaTensor’s results with the state-of-the-art methods we had so far. In every single case, AlphaTensor found an algorithm that either matches or improves on the best methods that we as humans created. For example, multiplying a four-by-five matrix by a five-by-five matrix requires 100 multiplication operations using the naive, traditional method. Humans found an algorithm to do the same multiplication with only 80 multiplication operations. AlphaTensor cut that down to 76.

And that’s cool, but there’s one more thing. So far, I’ve talked about reducing the number of multiplication operations as the ultimate goal. Here is where it gets interesting. DeepMind adjusted AlphaTensor’s reward. So they went from trying to find the fewest multiplication operations to trying to get the final result in less time. You know what that means?
AlphaTensor can now find faster ways to multiply two matrices on a specific piece of hardware. So the algorithm it finds for one GPU might look completely different from the one it finds for another GPU. And that is nuts. AlphaTensor is an artificial intelligence system that, with a little bit of our help, taught itself how to find new algorithms to multiply matrices.

I want you to think about the implications here. First, matrix multiplication is a foundational building block in machine learning. It’s literally everywhere. Any improvement, no matter how small it looks, can have a huge impact. That alone makes AlphaTensor a big deal. But I’ll be honest with you: that’s not what excites me the most about this. Think about the possibilities of having one system discover new algorithms. That’s a game changer. But my question is: what comes next?
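Back to TensorGame for a moment, since the video doesn’t show what a move actually is. In the AlphaTensor paper, DeepMind represents matrix multiplication as a three-dimensional tensor. Each move subtracts one rank-one tensor (an outer product u ⊗ v ⊗ w), and the game is won when the tensor reaches zero, so the number of moves equals the number of multiplications in the algorithm that was found. The sketch below is not AlphaTensor’s code and contains no learning or search: it only replays a given list of moves for the two-by-two case, to show that the schoolbook method wins in 8 moves and Strassen’s algorithm in 7. The helper names (`matmul_tensor`, `naive_moves`, `play`, `is_zero`) are hypothetical, and the factor vectors are transcribed from the standard Strassen formulas.

```python
from itertools import product


def matmul_tensor(n):
    """Matrix multiplication tensor for n x n matrices.

    T[a][b][c] == 1 when entry a of A times entry b of B contributes to
    entry c of C = A @ B, with entries numbered row-major (x11, x12, ...).
    """
    size = n * n
    T = [[[0] * size for _ in range(size)] for _ in range(size)]
    for i, j, k in product(range(n), repeat=3):
        T[i * n + k][k * n + j][i * n + j] = 1
    return T


def naive_moves(n):
    """The schoolbook algorithm as moves: one unit rank-one term per product."""
    size = n * n

    def e(idx):
        return tuple(1 if t == idx else 0 for t in range(size))

    return [(e(i * n + k), e(k * n + j), e(i * n + j))
            for i, j, k in product(range(n), repeat=3)]


# Strassen's seven products as (u, v, w): u is the combination of A's
# entries, v the combination of B's entries, and w says which entries of C
# receive the product and with what sign. Entry order is (x11, x12, x21, x22).
STRASSEN = [
    ((1, 0, 0, 1), (1, 0, 0, 1), (1, 0, 0, 1)),    # m1
    ((0, 0, 1, 1), (1, 0, 0, 0), (0, 0, 1, -1)),   # m2
    ((1, 0, 0, 0), (0, 1, 0, -1), (0, 1, 0, 1)),   # m3
    ((0, 0, 0, 1), (-1, 0, 1, 0), (1, 0, 1, 0)),   # m4
    ((1, 1, 0, 0), (0, 0, 0, 1), (-1, 1, 0, 0)),   # m5
    ((-1, 0, 1, 0), (1, 1, 0, 0), (0, 0, 0, 1)),   # m6
    ((0, 1, 0, -1), (0, 0, 1, 1), (1, 0, 0, 0)),   # m7
]


def play(T, moves):
    """Subtract u (x) v (x) w from a copy of T for each move and return
    what is left. An all-zero result means the moves form a correct
    multiplication algorithm that uses len(moves) multiplications."""
    size = len(T)
    R = [[row[:] for row in plane] for plane in T]
    for u, v, w in moves:
        for a, b, c in product(range(size), repeat=3):
            R[a][b][c] -= u[a] * v[b] * w[c]
    return R


def is_zero(R):
    return all(x == 0 for plane in R for row in plane for x in row)


T2 = matmul_tensor(2)
print(is_zero(play(T2, naive_moves(2))), len(naive_moves(2)))  # True 8
print(is_zero(play(T2, STRASSEN)), len(STRASSEN))              # True 7
```

Seen this way, finding a faster algorithm means finding a shorter list of moves that still empties the tensor, which is the search AlphaTensor learned to play.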