Two Minute Papers: Ubisoft’s New AI: Breathing Life Into Games!
Dear Fellow Scholars, this is “Two Minute Papers” with Dr. Károly Zsolnai-Fehér. Earlier, we talked about AI-based techniques that can learn to clone your voice, and then we can perform text-to-speech. So, it would learn this… “I think they have to change that. Further details are expected later.” And then, the AI would generate this. This is a voice line generated by an AI. The first law of papers says that research is a process. Yes, this means that we can write something and the AI says it in our voice. So that is text-to-speech.

But get this. What about speech to gesture? What is that? Well, scientists at Ubisoft had a crazy idea. They said, let’s create a dataset of characters where the AI can see gestures for agreement, disagreement, being relaxed, neutral, scared, and more, learn from it, and apply it to new, virtual characters intelligently. Sounds good, right?

But not so fast. Ubisoft is not the first to try this idea. Here’s a technique from just two years ago. Look… “I check my phone all the time. I hope that somebody has reached out.” There is something here, but we are humans, at least most of us anyway, and our eyes are highly attuned to the gestures of other humans, which means that if even the smallest part of the animation is off, we will immediately notice. And I think every single one of you Fellow Scholars indeed noticed that something is off here. And based on the quality of animation that is required today to keep the illusion up, I am not sure if this is coming to fruition anytime soon. Just look at these examples. We are so far away.

But in any case, let’s have a look at the new technique together. The concept is the same: first, in goes a speech sample, and the system generates a long gesture sequence in a style of our choosing. Like this. “Society has left you behind. Nobody listens to you anymore. Nobody thinks about you anymore, and yet it is you who populate our shopping malls. It is you.” That looks incredible.
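The pipeline just described, speech in, style-conditioned gestures out, can be sketched as a toy example. Everything below is a hypothetical illustration, not Ubisoft’s actual system: the style labels, the `speech_to_gestures` function, and the hand-crafted mapping are all stand-ins for what, in the real technique, is a neural network trained on motion-capture data paired with speech.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical style labels mirroring the dataset categories mentioned
# in the video (agreement, disagreement, relaxed, neutral, scared).
STYLES = {"agreement": 1.0, "disagreement": -1.0, "relaxed": 0.3,
          "neutral": 0.0, "scared": 1.5}

@dataclass
class GestureFrame:
    hand_height: float   # 0.0 = hands at sides, 1.0 = shoulder height
    head_motion: float   # relative head-movement energy

def speech_to_gestures(audio_energy: List[float], style: str) -> List[GestureFrame]:
    """Map per-window speech loudness to gesture keyframes, conditioned on style.

    A stand-in for the learned mapping: louder speech produces bigger
    gestures, and the chosen style scales how strongly the character reacts.
    """
    s = STYLES[style]
    frames = []
    for e in audio_energy:
        frames.append(GestureFrame(
            hand_height=min(1.0, e * (0.5 + abs(s))),
            head_motion=e * (1.0 + 0.5 * s)))
    return frames

# One gesture keyframe per analysis window of the speech sample.
gestures = speech_to_gestures([0.2, 0.8, 0.5], style="scared")
```

The design choice this toy mirrors is the one the video highlights: because the style is just a conditioning input rather than a separate model, the same system can render one monologue as relaxed, neutral, or scared without retraining.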
Let’s compare it to the previous method. “I check it. I check my phone all the time. I hope that somebody has reached out.” Yes, my goodness. A night-and-day difference. Such incredible progress in just two years. Yes, this is from only two years ago.

And that would already be pretty cool, but it gets crazier than that. Much crazier. For instance, we can plug in a new speaker, one the AI hasn’t heard yet, and this happens. “The biggest fake smile to match his, Astamau’s day was. Is the answer?” Perfect. This is really good, but that’s still nothing. It generalized not only to new speakers but, hold on to your papers, to new languages too. Listen.

And the list of features just keeps on going. For instance, we can also exert some artistic control here. If we feel that the hands are too high up, we can lower them. Or we can even ask for more or less head movement during the monologue. And I have to say, it does a lot of things really well, from the lowest-energy gestures up to the highest.

And it still doesn’t stop there. It is not only better than the previous technique, and not only more controllable, but it is also about seven times faster. And even better, it is done with a neural network that is significantly simpler than the previous one.

So, how many hours of these gestures did the neural network get access to, to learn this? How big was the training set? Actually, it requires very little information: all it was given was about two hours of these gestures. And it is not copy-pasting; it can continue these gestures through a long, long monologue seamlessly. And as a cherry on top, even some creative control is allowed.

So, these amazing virtual worlds are going to be even more full of lifelike characters. And with this work, this process is going to be even easier and more accessible for all of us. What a time to be alive! Thanks for watching and for your generous support, and I’ll see you next time.