TorchRL: The Reinforcement Learning and Control library for PyTorch

Hello everyone, my name is Vincent and I’m a developer on TorchRL, the reinforcement learning and control library for PyTorch. When we started this effort of developing TorchRL, we had a look at the existing ecosystem of RL libraries that are using PyTorch. And what we realized is that some of them were heavily focused on things like production, or on some subfield of RL like model-based RL and things like that. We also noticed that some libraries were absolutely great and used by a very wide community, but they had been developed by people who had since left the field, and the libraries were left unmaintained. So we saw that as an opportunity for us to come up with something new: a consistent effort in developing an RL library built on PyTorch, kept up to date with the latest features of the library, and also a library where users would basically be able to escalate the issues they were encountering in RL to PyTorch core and solve those issues in the core library.

The breadth that we wanted to cover was basically everything from newcomers to skilled RL researchers, but also everything from running baselines to developing new algorithms with the library, which is not an easy task. The first thing that I was told when I started working on this is that people wanted a library and not a framework. There are many ways to interpret that sentence, but one thing that immediately came to mind was that people wanted a library that was very modular, with standalone components that you could basically pick up without using the whole stack of primitives the library was providing, but also components that you could easily swap in the algorithms you were developing. Also a library where the syntax would be totally unsurprising to RL practitioners, because sometimes what you see is that people have renamed something into something else to fit the purpose of their library. We also wanted extended functionality, so to privilege breadth over depth, and a library entirely written in Python, or almost entirely written in Python, such that it was easy to hack for practitioners. What we didn’t want is a library that was just a collection of algorithms, or a library that was just an extension of Gym, like many libraries I think are, in the sense that they provide you something on top of Gym to basically train RL algorithms; we also wanted to satisfy users that were not using Gym, that were using other simulators or no simulator at all. We tried to have a core dependency on PyTorch and PyTorch only, and we support many other libraries such as Gym, DeepMind Control, Habitat, Jumanji, Brax, and so on. We really tried to integrate all of these into a unified API such that users can easily swap from one simulator to the other.

We tried to focus on two main pillars. The first one is efficiency, and that goes along the line of everything we’ve talked about so far with PyTorch core, but also very specific modules that are dedicated to RL, such as an efficient replay buffer that can work in distributed settings, vectorized environments and transforms, advantage computation, and things like that. We also wanted, as I’ve said before, to focus on modularity: to have generic module classes with very few levels of abstraction, environments, modules, models, and basically all those little components that you can play around with and put together into a full algorithm.
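To give a sense of what that unified environment API looks like in practice, here is a minimal sketch. It assumes the Gym backend and the "Pendulum-v1" task are installed locally; the StepCounter transform is just an example, and the rollout comes back as a TensorDict, the data structure described below.

```python
from torchrl.envs import GymEnv, TransformedEnv, StepCounter

# Wrap a Gym environment; DMControlEnv, BraxEnv, HabitatEnv, etc.
# expose the same interface, so simulators can be swapped easily.
env = TransformedEnv(GymEnv("Pendulum-v1"), StepCounter())

td = env.reset()                      # TensorDict with the initial observation
td = env.rand_step(td)                # random action; results land under "next"
rollout = env.rollout(max_steps=10)   # batched TensorDict holding a short trajectory
```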
We wanted to cover everything from model-based RL to model-free RL, on-policy, off-policy, offline, online; basically to cover as much of the RL space as we could from the beginning. Modularity, in our opinion, comes in two flavors. The first is that you would like to be able to swap components with each other, and the second is that you would like to be able to adopt one single component without adopting the whole library.

The first thing to note is that reinforcement learning is not about the medium: in vision you have images, in text you have text, etc. RL is about the algorithm. It’s basically about how you’re going to have a policy that plays with the environment to collect data, what you are going to do with this data when you pass it through your loss function, and so on. So it’s really about the interaction of all those pieces together rather than the medium that you’re trying to work on. And it’s very difficult to come up with a unified API, because, for instance, if you think about a policy: a policy in RL is something that usually reads an observation and outputs an action. But sometimes your policy is a little bit more complex; it’s going to read an observation and, for instance, a recurrent network state, and output an action and the next state. And sometimes your policy is going to return something else than an action, for instance an action and the probability of that action. So first you have to consider that your policy is something that can be really different from algorithm to algorithm. But also your policy has to be sort of multimodal, in the sense that you need to execute it in training or evaluation mode, exploitation versus exploration, and all these kinds of things. So basically you have a policy that can behave in very different ways and that should also be able to digest very different kinds of information.

This is a rather loaded slide, but basically the core idea of the solution we found to those problems is something that we call TensorDict. TensorDict is basically a data carrier for TorchRL. All of the components in TorchRL read and write TensorDicts, and that’s basically the only thing that you need to buy into when you want to start using the library. And it makes our life much easier, because now you don’t have, for instance, an environment that reads a specific set of inputs, like an action and a state or whatever, and outputs another specific set of outputs. You’re sure that your environment is always going to read a TensorDict and output a new TensorDict. Same thing with the policy: the policy is just going to read a TensorDict and output a TensorDict. The replay buffer is going to read and write TensorDicts. So what is a TensorDict? It’s just a dictionary with extra tensor-like features that allow it to execute shape operations such as reshape, view, permute, and things like that. You can stack TensorDicts very easily, you can unbind them, split them, do many of the things that you would do with a tensor, but you can execute them on a TensorDict that contains only tensors.

At first, TensorDict was part of TorchRL, and our early users were pointing to the fact that it was something they would rather see as an independent library, because they were hoping to use it in many other projects, like self-supervised learning and things like that. So we open-sourced TensorDict as a separate library under PyTorch Labs, and you can check it out. It’s out there.
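To make the TensorDict idea concrete, here is a minimal sketch using the public TensorDict API; the key names and shapes are arbitrary examples.

```python
import torch
from tensordict import TensorDict

# A TensorDict is a dictionary of tensors that share leading batch dimensions.
td = TensorDict(
    {"observation": torch.randn(4, 3, 84, 84), "action": torch.randn(4, 6)},
    batch_size=[4],
)

# Tensor-like shape operations apply to every entry at once.
td_view = td.reshape(2, 2)   # batch_size becomes [2, 2]
td_cpu = td.to("cpu")        # move all entries in one call

# TensorDicts can be stacked, split, and indexed like tensors.
stacked = torch.stack([td, td], dim=0)   # batch_size becomes [2, 4]
chunks = td.split(2, dim=0)              # two TensorDicts with batch_size [2]
first = td[0]                            # a single, unbatched element
```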
Now, the second component of modularity that I was talking about is the fact that you would like to be able to use just one module without using the whole stack of components of TorchRL. The way we go about that is that we try to foresee, before users ask for it, modules that are easily usable across, let’s say, levels of user experience. For instance, here I have a loss that is a DQN loss. You can use this DQN loss with a TensorDict if you want to use all the features that it provides. But also, a basic user that has very little experience in RL could just use it with regular tensors, and the module will take care of creating all the TensorDicts, et cetera, under the hood to execute the inner operations. The same thing goes with replay buffers, which are highly optimized for TensorDicts, but basic users that don’t want to benefit from all those features, and just want the basic usage of the replay buffer, can use it with tensors or even other Python objects.

I will quickly skim over the features that TorchRL already provides. We have an environment API that covers Gym, DeepMind Control, Habitat, Brax, and other libraries. On top of those environments, you can use transforms, and those transforms can be applied to batched data, which makes them much faster, and also executed on device. We have vectorized environments that can be executed on multiple processes and also in distributed settings. We provide modules with common architectures that are used in RL and things like that. For data collection, we have data collectors that work in a synchronous or asynchronous manner, on a single machine or on multiple nodes. We have a replay buffer API that can store data on physical storage, which allows you to basically store terabytes of data when you’re running your experiments. We have objectives like SAC, DDPG, PPO, and many others that are ready to use out of the box. We have a trainer API that also provides a checkpointing mechanism, such that you can basically be fault-tolerant or restart your training once you have stopped it. We integrate various loggers, such as Weights & Biases, TensorBoard, and basic CSV logging, things like that. We have various utilities. And finally, one thing I would like to point out is that we have a good set of examples and tutorials in the library, and we have quite decent documentation about how the library should be used. So feel free to check it out. I have the links here for TorchRL and TensorDict. So yeah. Thank you.
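As a rough sketch of that "basic usage with plain tensors" idea, combined with the disk-backed storage mentioned above, here is one way a replay buffer can be used in isolation; the sizes and the choice of storage are arbitrary examples, not the only options.

```python
import torch
from torchrl.data import ReplayBuffer, LazyMemmapStorage

# LazyMemmapStorage keeps the data in memory-mapped files on disk,
# which is how very large buffers can be kept without exhausting RAM.
buffer = ReplayBuffer(storage=LazyMemmapStorage(max_size=1_000_000))

# Basic usage works with plain tensors; TensorDicts unlock the richer features.
buffer.extend(torch.randn(64, 8))   # add a batch of 64 transitions
sample = buffer.sample(32)          # draw a random batch of 32
```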
