Devoxx presentation – OpenAI Whisper… GPT-4? (short clip from 'What's in my AI?' presentation)
This is the next piece. Consider that today there are 800 million videos on YouTube. The average length of each of those videos is about 12 minutes. The average human speaks at 150 words per minute. If we take those 1,800 words per video and multiply that by 800 million videos, we get about 1.4 trillion words of text data.
We’ve not drilled into tokens yet, but essentially tokens allow us to represent words more efficiently. Let’s have a quick look at tokens just for a moment. Rather than store all the million-plus words that we’ve got in the English language, when they’ve got their data sets, they break the text into tokens. And tokens look a lot like this. Catalyst breaks up into cat-alyst. If I say my cat is catatonic, it breaks up into those tokens. The word cat and the start of catatonic, even though they’re colored differently, are actually the same token there: 3797.
When we apply that to the current models, we say that a token is about 0.75 words. So if we tokenize our 1.4 trillion words of text data on YouTube today, we might get something more like 2 trillion tokens, which is what we’re probably aiming for for the next big models.
But YouTube is constantly refreshed. They talk about having 3.7 million new videos uploaded every day. I think Devoxx has helped with that in the last 24 hours. That’s 6.5 billion new words a day. And if we tokenize that, we’re looking at 8.6 billion tokens per day.
GPT-3 was trained on 300 billion tokens; it should have been trained on about 3 trillion tokens, because they were not compute-optimal in the way they did things. And DeepMind’s Chinchilla essentially rewrote how we should be doing that, with the Chinchilla scaling laws. But if we’re aiming for 3 trillion tokens, and we can use the words that are being spoken and have been spoken on YouTube, that might be a cool way to do things.
So very, very recently, within the last couple of weeks, OpenAI came up with a transformer-based model called Whisper. And this thing is ridiculous.
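As an aside, the YouTube word and token estimates above can be checked with a short Python sketch. It uses only the figures quoted in the talk (800 million videos, 12 minutes each, 150 words per minute, 3.7 million new videos a day) and assumes the common rule of thumb that one token is about 0.75 words. The variable and function names are made up for illustration, and the real averages will of course vary.

```python
# Back-of-the-envelope estimate using the figures quoted in the talk.
# All constants are the speaker's rough numbers, not measured values.
VIDEOS = 800_000_000            # videos on YouTube today
AVG_MINUTES = 12                # average video length in minutes
WORDS_PER_MINUTE = 150          # average speaking rate
NEW_VIDEOS_PER_DAY = 3_700_000  # daily uploads
WORDS_PER_TOKEN = 0.75          # assumed rule of thumb: 1 token is about 0.75 words


def words_to_tokens(words: float) -> float:
    """Convert a word count to an approximate token count."""
    return words / WORDS_PER_TOKEN


words_per_video = AVG_MINUTES * WORDS_PER_MINUTE
total_words = VIDEOS * words_per_video
total_tokens = words_to_tokens(total_words)
daily_words = NEW_VIDEOS_PER_DAY * words_per_video
daily_tokens = words_to_tokens(daily_words)

print(f"Words per video:    {words_per_video:,}")
print(f"Total words:        {total_words / 1e12:.2f} trillion")
print(f"Total tokens:       {total_tokens / 1e12:.2f} trillion")
print(f"New words per day:  {daily_words / 1e9:.2f} billion")
print(f"New tokens per day: {daily_tokens / 1e9:.2f} billion")
```

With these inputs the totals come out at roughly 1.4 trillion words and just under 2 trillion tokens, in line with the talk. The daily figures land slightly above the 6.5 billion and 8.6 billion quoted, presumably because the speaker rounded the per-video word count differently.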
How many people have used Otter.ai or something similar, where you can speak and it will transcribe your audio file for you? We’ve come a long way since Dragon NaturallySpeaking. This is able to do different languages. It’s able to do punctuation, and it’s able to do really thick and strange accents. It’s built into the playground, so you can use this right now for free.
This is a look at OpenAI Whisper. If I start talking at this, it will be very, very accurate with its transcription of what I’m saying, including the punctuation there. If I put on a pretty nasty Australian Steve Irwin accent and say, crikey, it’ll even be pretty good with that. You can go and use this for free right now. It’s the microphone up the top here. We click that particular button, and anything we say there will be transcribed in real time. And you can actually put this straight into your prompt. You’ll notice that it also includes punctuation. It handles very thick accents. There are examples of it translating from a very, very thick Scottish accent. I believe if I give it Chinese, and my Chinese is not perfect, it will even convert that into English for us.
Imagine running this across the 800 million YouTube video baseline, which is now probably towards 1 billion YouTube videos. And this, I believe, is what OpenAI’s Whisper is setting out to do: setting out to be available to build these massive, massive data sets. If they’re aiming for 2 trillion tokens and they only got to 300 billion in that first data set, they’ve got a long way to go. I recommend that you play around with this particular speech-to-text model based on the transformer. It’s a lot of fun.
Did you see the memo about this? Yeah. Yeah. Yeah. I have the memo right here. Love artificial intelligence? Excited by the explosive progress of integrated AI? I am. Join my private mailing list. The memo. Did you get that memo? Yeah, I got the memo.
Get priority access to my articles, videos and behind-the-scenes tips as soon as they’re released, with a monthly or annual subscription. Yeah. Didn’t you get that memo? lifearchitect.ai/memo. I have the memo.