PyTorch Mobile on Android using C++ (PyTorch Conference 2022 Breakout Session)

Yeah, my name is Joe Bowser, I'm with Adobe, and we're going to talk today about our explorations in using PyTorch Mobile on Android with C++. I'm with the Sensei on-device team, our Sensei ML inference and efficiency team at Adobe. We go across teams and work on features.

Just to give an outline of what we're going to be talking about today: first it's going to be PyTorch at Adobe and what we're doing with PyTorch, then why Android, why this talk is about C++, some findings, and why inference matters, namely on-device inference. That may be review from the talk earlier, but this is more from an end-user context.

So first of all, PyTorch at Adobe. We ship a lot of ML models in production today in various products around Adobe, in various features of Photoshop like Neural Filters and other ML-powered features that we ship. A lot of these features are developed at Adobe Research, or with researchers who are assigned to various product teams. And when we did a survey, we found that the majority of those researchers are using PyTorch. We don't dictate what our researchers use at Adobe; they have relative freedom to use whatever gets them publishing and lets them do their jobs. So we have these models being developed in PyTorch, and we have to deploy them on Windows, macOS, iOS, and Android, as well as on the web.

So when you look at on-device inference in 2022, there's desktop, and people generally don't think about desktop. But a non-technical user is not going to deploy an entire Python environment with PyTorch and deal with drivers on their desktop. They just want to install Photoshop and have it work, regardless of whether they're running an NVIDIA GPU, an AMD GPU, or no GPU at all. So we leverage the existing tools that are there, which is why we made the decision to use things like Core ML on macOS and, at the time, WinML and ONNX Runtime on Windows. Mobile is the same situation: you install the app and you want it to run right away. Again, that's Core ML and TFLite, and it's where we looked at PyTorch Mobile. On the web we're looking at various technologies; it's still experimental and we're still evaluating, but there's TensorFlow.js and ONNX Runtime. That's generally how things are for application developers looking to deploy ML on devices today.

So why is this talk about Android? When you're deploying on Android, you have numerous choices, unlike iOS, where you're going to be using something that sits on the Core ML backend, whether that's PyTorch Mobile or Core ML directly, and exporting out of PyTorch to Core ML works. On Android you can deploy with what's out there today, like TFLite, but you could also deploy with PyTorch Mobile or ONNX Runtime. Each of those choices has upsides and major downsides, and if you just go with what Google recommends, you may find yourself facing some serious downsides, namely around model conversion and also around performance, as we found.

So the conversion disconnect is this, as we talked about earlier: you're starting in the PyTorch ecosystem, and you have to convert to ONNX, which is this format that's supposed to allow your model to be portable everywhere.
Now, when you actually take that model and try to port it to places like the TensorFlow ecosystem, things fall apart quickly. That's partly because ONNX dictates a certain memory layout for computer vision tasks, and TensorFlow, especially TFLite, demands the opposite. There's also the whole situation with different operator support between ONNX, PyTorch, and TFLite. So while you're able to get the model out to ONNX, getting it into TFLite is really hard, and that's the thing that really slows down Android development.

So yeah, what if we just don't? What if, instead of trying to do the conversion, we just deploy with TorchScript and run TorchScript on Android? That's the promise, and that's one of the deliverables of PyTorch Mobile: you actually have PyTorch, so you can do operations on tensors in C++ and actually run TorchScript. Staying in the same ecosystem means you can leverage everything, well, not everything, but most of what PyTorch has to offer in C++, which is actually quite a lot. That's really important to keep in mind. And it also helps that the C++ API is similar to the Python API, so if you're familiar with one, it's not a huge shock to use the other.

So then, because we're talking about Android, why C++? Why are we talking about C++ on Android today? Typically, when you're developing applications on Android, you're going to be using something that compiles to Java bytecode, like Java or Kotlin. Well, the reality is that Adobe loves C++. We'll compile C++ to literally anything. We have a lot of C++ talent, and we have a lot of code written in C++ that we don't want to rewrite in another language, because we just couldn't; it would take too long. We also want to make sure we can take that C++ and have it portable across Mac, Windows, iOS, and Android, and you can do that in C++, and you can also port it to the web through WebAssembly. That's something you can't do as easily with other languages, especially in the iOS space, trying to get things to work with Objective-C and Swift.

There's also the fact that when you run C++ on Android, you have to deal with the Java Native Interface, the JNI layer. If you're going across the JNI layer over and over again, not only does that affect speed, not a lot per call, but it adds up if you're doing a lot of operations, it also affects stability. You can have bugs, you can have objects that sort of drift if you're not careful. So it's easier to keep as much of your business logic, the actual things doing your image processing, running your models, all of that, in C++ as much as you possibly can. That's our logic.

And then pre- and post-processing. Not everything is ImageNet-based. There is some pre- and post-processing support in the various Java-based platforms, but it's mostly "okay, we have something trained on ImageNet, do the normalization"; that's been done numerous times. But I have two examples to show that don't use ImageNet and do other things: one converts between RGB and YCrCb instead, and the other is just multiplying and dividing a tensor. And then the other thing we have to concern ourselves with is copying stuff.
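To make the planar-versus-interleaved idea and the multiply/divide example a bit more concrete, here is a minimal, hedged sketch of what that kind of pre- and post-processing looks like with the LibTorch tensor API once you do have PyTorch available in C++. This is not our production code: the function names, buffer, and sizes are made-up placeholders, and the YCrCb conversion from the other example is left out.

```cpp
#include <torch/torch.h>
#include <cstdint>

// Wrap an interleaved ("chunky") 8-bit RGB buffer, laid out H x W x C in memory,
// convert it to the planar NCHW float layout most models expect, and scale it.
at::Tensor preprocess(uint8_t* rgb, int64_t height, int64_t width) {
    // from_blob does not copy; clone() if the buffer may be freed before inference.
    at::Tensor hwc = torch::from_blob(rgb, {height, width, 3}, torch::kUInt8);
    return hwc.permute({2, 0, 1})   // HWC (interleaved) -> CHW (planar)
              .to(torch::kFloat32)
              .div(255.0)           // the "dividing a tensor" part
              .unsqueeze(0)         // add batch dimension -> NCHW
              .contiguous();
}

// Post-processing is the reverse: scale back up and re-interleave.
at::Tensor postprocess(const at::Tensor& nchw) {
    return nchw.squeeze(0)
               .mul(255.0)          // the "multiplying a tensor" part
               .clamp(0, 255)
               .to(torch::kUInt8)
               .permute({1, 2, 0})  // CHW (planar) -> HWC (interleaved)
               .contiguous();
}
```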
So here's what we do for pre- and post-processing when we're actually shipping a feature. Because remember, when we don't have PyTorch, we don't have PyTorch, so we have to grab something else, and at the time that unfortunately meant OpenCV. I'm not going to knock OpenCV, because it's great and the APIs are similar, but it does complicate things, because then we have to take that and adapt it to our own internal stuff, which is different. Or someone chooses to use Halide in C++, so then we have to use Halide. I would have put the Halide logo up there, but I couldn't find it, so sorry, guys.

And then there's the knowledge gap. One thing we have to deal with at Adobe is that we have our researchers, and they're really good at writing Python, writing papers, and doing a whole bunch of PyTorch dev. But when you ask them, "can you write C++ code?", their eyes just glaze over, and you go, all right, cool. And then there's also the reality on the software developer side. I myself, my background isn't ML at all; it's mobile development. I come from open source and mobile development, for about a decade before I started with ML at Adobe. And I didn't know the difference between planar and chunky, or interleaved and non-interleaved. And that's a major concept that's actually really important, especially when you talk about model conversion, actually deploying these things, and performance. And it isn't universal nomenclature: I'll ask in an interview, for example, "hey, can you tell me the difference between planar and chunky?", and they'll go, what? I learned that from the gaming industry; other people talk about it in different contexts. That's why it's important to have everyone on the same page and ideally using the same tools.

So, yeah, ideally you want things to be similar. This slide is some Python code using OpenCV to do some pre-processing, and this is the same thing in PyTorch; you can do something very similar.

Now we get to the next thing: why does this matter? What did I find when I actually compared these frameworks at Adobe? Remember, I'm trying to figure out whether we should adopt PyTorch Mobile and ship it in a product, what my recommendations are, and whether it's a good idea or not. So why does it matter? It matters because PyTorch is PyTorch. It's easier to work with, and I think it handles the GPU pretty well. You don't have to do the sort of weird, clunky delegate thing you have to do with other frameworks, where you have to set up a session and load something that may or may not exist. It's more like how it works with CUDA, where you can just ask PyTorch, "hey, does this work?", and it'll tell you whether it does or not. And that's really great, because you also know where that tensor is located in memory. That's critically important, because copying between CPU and GPU memory is one of our bottlenecks. If we know where a tensor is in memory, we can grab it off the GPU, or keep it on the GPU, when we're doing inference with Vulkan. And that's huge.
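Here is a minimal sketch of that last point in LibTorch C++, assuming a TorchScript module has already been loaded (and, for the GPU path, prepared for the Vulkan backend); the function name is a made-up placeholder, and the `at::is_vulkan_available()` query is my assumption of how the Vulkan prototype exposes the CUDA-style "does this work?" check described above.

```cpp
#include <torch/script.h>

// Run inference on the Vulkan backend when it is available, otherwise on CPU.
at::Tensor run_inference(torch::jit::Module& module, const at::Tensor& input) {
    // Assumed availability check, analogous to asking whether CUDA is present.
    if (at::is_vulkan_available()) {
        // Move the input to GPU memory once; the output comes back as a Vulkan
        // tensor too, so the CPU<->GPU copy only happens when we actually need it.
        at::Tensor out_vulkan = module.forward({input.vulkan()}).toTensor();
        return out_vulkan.cpu();
    }
    // Plain CPU fallback: same call, no delegate/session setup.
    return module.forward({input}).toTensor();
}
```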
So, again, this is an example of setting up a tensor in PyTorch Mobile, putting it in channels-last, and then doing inference by calling forward (there's a rough C++ sketch of this pattern at the end of this part). If you've seen any basic model code in PyTorch, it looks very similar, right? And that's the thing that's really important here: even though it's a different language, the same core concepts are consistent. That's one of the big things that I think makes this really promising tech.

So here I'm going to talk about use cases. This is where I actually wrote some demos and tried to get things running head to head. I've taken the big three on Android and put them head to head. There's a tutorial on how to convert to ONNX on the PyTorch site, and it has this really simple five- or six-layer (I believe it's five-layer) super-resolution model that just scales the cat image up. The beauty is that I can convert this one to all three frameworks, so I can actually do a proper apples-to-apples comparison, which is otherwise really hard to do because of the conversion step and getting a model from PyTorch to everywhere. That's another reason why PyTorch Mobile is great. But here we can do the apples-to-apples comparison. And this model is, I think, literally just convolution plus pixel shuffle, or depth-to-space. As we're seeing, lower bars are better, and they're color-coded by branding colors: PyTorch Mobile easily beats TFLite on Android. Unfortunately, I couldn't do Vulkan, because this model only has one channel. And it's just barely beaten out by ONNX Runtime. This is the only time we'll see ONNX Runtime in these charts, because I couldn't get it to run with the other models.

And this is MobileNetV2, probably the most common architecture deployed on mobile today. I gave it four threads; I probably could have given it eight, but four threads is generally standard for CPU. And on NNAPI, they basically run the same. There is some setup cost with PyTorch, but once you've done the setup, they run the same; there's maybe less than half a microsecond of difference. This is over 25 runs; I've done 100 runs, same sort of results. And with GPU, I couldn't get them to run together on GPU, unfortunately, because the frameworks compete for it. But this is something you can download and build, and once you get the models in, you can get it on your phone and see, straight up, in real time, that it actually holds up.

The final one is more a demonstration of something similar to what we would use at Adobe. Of course, because our models are proprietary, I can't actually deploy them in my demos; that's not allowed. So I found a random GAN off the internet: AnimeGAN. AnimeGAN is, you know, a GAN model. It takes roughly six seconds on my Pixel 6, and maybe three or four seconds on ONNX Runtime, so ONNX Runtime was a little faster, but not really. And I couldn't get it to work at all in the other framework: it took 30 seconds, overheated the phone, and produced a gray image. And it can't run on NNAPI at all due to memory constraints. So when you're talking about more complicated models, GPU becomes more important. But something else you have to remember is that you need your GPU to do things like draw your UI.
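For reference, here is roughly what that earlier slide's pattern, loading a TorchScript module, setting up an input in channels-last, and calling forward, looks like in C++. It's a hedged sketch rather than the slide's exact code: `model.pt` and the input shape are placeholders, and a PyTorch Mobile build may use the lite-interpreter loader (`_load_for_mobile`) instead of `torch::jit::load`.

```cpp
#include <torch/script.h>
#include <iostream>
#include <vector>

int main() {
    // Placeholder path; on Android the model file is typically unpacked from the
    // APK's assets into the app's files directory before loading.
    torch::jit::Module module = torch::jit::load("model.pt");
    module.eval();

    // Set up an input tensor in channels-last memory format, then call forward(),
    // just like the equivalent Python code.
    at::Tensor input = torch::rand({1, 3, 224, 224})
                           .contiguous(at::MemoryFormat::ChannelsLast);

    std::vector<torch::jit::IValue> inputs;
    inputs.emplace_back(input);

    at::Tensor output = module.forward(inputs).toTensor();
    std::cout << "output sizes: " << output.sizes() << "\n";
    return 0;
}
```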
So if I was actually building a real app, I'd probably do CPU in the background, or I'd be leveraging NNAPI more, because I don't want jank in my UI. And again, this was just something I did to demonstrate what a typical Adobe model would be like, something you might find in, say, Photoshop. This isn't a Photoshop model, it's a random model I found on the internet, but that was the point of it.

So, in conclusion, PyTorch Mobile's performance numbers are comparable to its competitors, if not beating them straight up. And the main advantage is that it's not just about performance, and it's not just about conversion; it's about actually having PyTorch there. If you need to do some math on a tensor, you can. If you need to do something like chaining the output of one model into another, you can. And it works in a coherent, consistent way between the Python API and the C++ API. That's the thing I find the most valuable as far as getting velocity and being able to explain exactly what you're doing, at least as someone who has to port models and actually move them to device for productization. That's something that's really hard today, because sometimes your memory layouts are weird, and sometimes you can't do the pre- and post-processing step, or it's really expensive.

So that's pretty much it for what I have. Here are the links to the repositories. This is my personal repo that has these demos, licensed under Apache 2.0; all open source demos that are done through Adobe are generally licensed under the Apache Software License, that's how we open source things at Adobe. To get the model, there's a Jupyter notebook that lets you convert it for CPU and GPU. You need to have a local build of PyTorch installed, and it needs to have the Vulkan backend enabled, and it can't be a Docker build; it actually has to be a legit, full backend build. I have to actually file and fix that, because I might actually have a fix for it, and I can, and that's the great thing about being in the PyTorch Foundation. And with that, I'm going to pass it off to Recune now.
