OpenAI’s Whisper is AMAZING!
Seriously, is that enough to pull this off? You just saw the latest hot open source model released by OpenAI transcribing this guy. It’s called Whisper and it’s really, really good. Well, in reality what you just saw didn’t happen in real time. That was just me and this video. But I did put together some code so you can see how good this model is and try it for yourself. For context, Whisper is a speech recognition model that you can use for transcription and translation. South Florida is one of the most beautiful places in the continental United States. I’m gonna run that text through the model, but we can make it even more interesting. [In Spanish] The advance of science in the last decade is incredible. We’ll see if this model can transcribe the audio and translate it into English. So let’s bring the computer and get started.
Alright, before we get started, I want you to assume that the average human has a life expectancy of 80 years and sleeps 8 hours every single day. That gives us a total of about 467,000 hours where we are awake. In contrast, OpenAI used 680,000 hours of data to train Whisper. That is around 45% more listening time than we get in our entire lifetime. So no wonder Whisper is really, really good. Now, the model doesn’t specialize in any particular task. But OpenAI claims that it makes around 50% fewer errors than specialized models across many different datasets. That’s just nuts. Final thing I’ll say before I shut up and take a look at the code: Whisper is not only English, which is huge. About a third of the dataset that OpenAI used to train Whisper is non-English. So you can use the model to transcribe from a bunch of different languages.
Alright, let’s look at the code, which is surprisingly short. By the way, you’ll find a link to this notebook in the description below. Alright, so first, I’m going to install Whisper directly from their GitHub repo, and Gradio.
And the reason I’m using Gradio is to create a very simple interface where we can record the audio directly from my computer, transcribe it, and translate it. That interface is based on a notebook that Hugging Face created and published online. I took it, simplified it a little bit, added a couple more things, and that’s what you get here.
For the model itself, OpenAI offers several options, and I copied the table from their GitHub repo. So here you have the list of different models that you can load on your computer, depending on how much memory you want to use, how fast you need the results to be, and whether or not you need multilingual support. Personally, I’m using the medium model, but I found that the base model, which is way smaller, works very, very well as well.
So to start things off, I loaded that model and then I created a couple of functions. And I want you to notice how simple these functions are. First, there is a transcribe function that’s going to receive a file, and that’s the audio file, the recording. And then we’re going to call the transcribe function of the Whisper model. Notice here, however, that I’m passing a list of options. And one of those options is the task that I want to do within my function. In this particular case, the task is transcribing the audio. The translate function is very simple as well, and it’s almost a match of the previous function, except here the task is going to be translate. Now, remember, right now the Whisper model only supports translating into English. So you can start with any language and translate into English.
Finally, this is the Gradio interface. Very, very simple interface. You can see it here. It’s got a couple of buttons, one to transcribe my audio, one to translate my audio, and then you get here an area where we are going to display the text, the result of that transcription or translation. And the code is very straightforward.
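For anyone following along without the notebook open, here is a minimal sketch of what the cells described so far might look like. It is not the notebook's actual code. The helper names `load_model`, `make_handlers`, and `build_demo` are made up for illustration, the extra decoding options are an assumption, and depending on your Gradio version you may need to ask for a microphone source explicitly on the audio component.

```python
def load_model(name="medium"):
    """Load a Whisper checkpoint by name ("base" is a smaller, faster choice)."""
    # Imported lazily so the helpers below can be used without Whisper installed.
    import whisper
    return whisper.load_model(name)


def make_handlers(model, **options):
    """Build the two callbacks (hypothetical helper, not from the notebook).

    `options` stands in for the extra decoding options mentioned in the video;
    which options the notebook actually passes is an assumption.
    """

    def transcribe(audio_path):
        # task="transcribe" keeps the text in the language that was spoken.
        result = model.transcribe(audio_path, task="transcribe", **options)
        return result["text"]

    def translate(audio_path):
        # task="translate" produces English text, whatever the source language.
        result = model.transcribe(audio_path, task="translate", **options)
        return result["text"]

    return transcribe, translate


def build_demo(transcribe, translate):
    """One audio input, two buttons, one text box (hypothetical layout)."""
    import gradio as gr

    with gr.Blocks() as demo:
        # type="filepath" hands the recording to the callbacks as a file path.
        audio = gr.Audio(type="filepath")
        with gr.Row():
            transcribe_button = gr.Button("Transcribe")
            translate_button = gr.Button("Translate")
        output = gr.Textbox(label="Result")
        # Connect each button's click event to its function.
        transcribe_button.click(fn=transcribe, inputs=audio, outputs=output)
        translate_button.click(fn=translate, inputs=audio, outputs=output)
    return demo
```

In a notebook, something like `build_demo(*make_handlers(load_model("medium"))).launch()` would start the interface. The two callbacks can also be called directly with the path to an uploaded audio file, which skips the interface entirely.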
You get the capturing of the audio here, the component that’s going to capture that audio. Here you have a couple of buttons, the transcribe and the translate button. And notice how I’m connecting the click event on these two lines. I’m connecting the click event to the two functions that I created before. Therefore, when you click on the transcribe button, we’re going to call the transcribe function. We’re going to pass the audio, and we’re going to receive the result and display it in the text box that we added to the interface. Very simple stuff. So this is everything we need for our example.
I added a final cell to my notebook where I’m calling the transcribe and translate functions directly. This is useful if you want, for example, to record your audio from your phone and then send it to your computer as a file. You can upload the file and then access the functions directly, passing the file name. So if you want to use that, you have it there.
Let’s give this a try. South Florida is one of the most beautiful places in the continental United States. All right. So that’s my audio. South Florida is one of the most beautiful places in the continental United States. Sounds good. Let’s click transcribe here, and that was perfect. That was fast. That was beautiful. All right. Let’s try something else. Let’s do it now in Spanish. [In Spanish] The advance of science in the last decade is incredible. [In Spanish] The advance of science in the last decade is incredible. Okay. Sounds good. Let’s transcribe it. That was perfect. That was very good. And now let’s translate. It’s going to take the same audio and translate it into English. The progress of science in the last decade is incredible. That was amazing.
Okay. So here you have it. This morning I saw people putting together samples where they are transcribing and translating YouTube videos, or music to get out the lyrics. It’s really, really cool. The community is coming together.
They’re starting to build super cool things with this. It is open source, so you can use it right away. You really need to give it a try. Remember, the link to this notebook is in the description below. So how about it? Go nuts. Build something cool. And I’ll see you in the next one.
No, no, no, wait, wait, wait, wait. You made it all the way here. So please like the video below. Subscribe to my channel. And now for real: I’ll see you in the next one.