ML NEWS

Rumors of GPT-4 are in the air, neuron transmission is now solved in closed form, and mind reading is a thing now. It's Monday and welcome to ML News.

Hello and welcome to ML News, your regular update of what's going on in the machine learning and AI world. Our first story is the most interesting one: brain reading is more and more becoming a thing. There is a paper called Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding. In this paper the authors present a visual stimulus to a subject, a real human, and then look at their brain activity. This is non-invasive, these are fMRI brain scans, and from that fMRI reading they're able to decode what the person is seeing. On the top you have the visual stimuli, and on the bottom the reconstructed images. What you'll notice is that the pixels don't exactly match, but the semantic content is very often the same. This is done by aligning the latent spaces of an encoder for the brain data and an encoder for images. That has been a long-standing problem, because the training data that maps what people are seeing from their brain activity to image space is just super sparse. The authors get around that by pre-training on unlabeled fMRI data: first they train a very, very good autoencoder on that data, which gives them a compressed latent space, and then on top of that latent space they learn a conditional image diffusion decoder that maps the encoded brain activity to the visual stimulus. So the paradigm we see in deep learning, where you do unsupervised pre-training first because you have much more unlabeled data, and only then bring in the task-specific data on top of the pre-trained model, apparently also holds in the field of brain-computer interfaces. It's pretty cool that we're more and more getting the chance to peek into people's brains. This isn't yet a full thought reader or anything like that; essentially they disambiguate between, I believe, some hundred different classes of labels. But it's still very, very cool that just from reading brain activity you can reconstruct roughly what kind of image the person is seeing and what is in that image.

In a related article, neurosciencenews.com writes that a brain-machine interface device predicts internal speech. This one is a little bit different in that it is actually invasive, so it's an interface directly to the brain, but it is able to predict internal speech, meaning speech that you just think to yourself. It cannot decode arbitrary speech; I believe they go up to about eight words or so. So it's not exactly super accurate yet, but we are making big, big progress on that front.

Alright, next news. Ramin Hasani writes that they've published a new article in Nature Machine Intelligence and solved a differential equation that had long been without a closed-form solution. We now have that closed-form solution, and it concerns the interactions between neurons. This is a major benefit for people who want to implement biologically inspired, biologically plausible neural networks, because previously you'd need some sort of ODE solver just to model that connection properly. Now that there is a closed-form solution, you can essentially just forward- and back-propagate through that formula.
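Because the neuron interaction is now just a formula, you can write the update as ordinary tensor code. Here is a minimal sketch of what such a closed-form, continuous-time cell could look like in PyTorch; the sub-networks f_net, g_net and h_net and the exact gating structure are my own illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class ClosedFormNeuronCell(nn.Module):
    """Illustrative sketch of a closed-form continuous-time update.

    Instead of calling an ODE solver, the hidden state after a time gap t is
    given directly by a formula, so we can forward- and back-propagate
    through it. The three small networks below are hypothetical stand-ins.
    """

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.f_net = nn.Linear(input_size + hidden_size, hidden_size)  # decay rate
        self.g_net = nn.Linear(input_size + hidden_size, hidden_size)  # target state A
        self.h_net = nn.Linear(input_size + hidden_size, hidden_size)  # target state B

    def forward(self, x, hidden, t):
        # x: (batch, input_size), hidden: (batch, hidden_size), t: scalar time gap
        z = torch.cat([x, hidden], dim=-1)
        # time-dependent gate: how far the state has "decayed" after time t
        gate = torch.sigmoid(-self.f_net(z) * t)
        # closed-form interpolation between two learned targets, no solver needed
        return gate * torch.tanh(self.g_net(z)) + (1.0 - gate) * torch.tanh(self.h_net(z))

# usage: roll the cell over a sequence with irregular time gaps
cell = ClosedFormNeuronCell(input_size=8, hidden_size=16)
hidden = torch.zeros(4, 16)
for x, dt in [(torch.randn(4, 8), 0.5), (torch.randn(4, 8), 1.2)]:
    hidden = cell(x, hidden, dt)
```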
And the absolute coolest thing is that they have implemented this in both PyTorch and TensorFlow, so you can technically build it directly into your architectures today. It's not guaranteed to be a lot better than what we currently use for neuron-to-neuron connections, but that's not the point. The point is to get to a place where we can simulate biologically plausible neural networks as well as possible, and from those potentially learn something about the brain. We might even get some inspiration for how to improve our artificial neural network architectures. So check out the paper and the repository in case you're interested.

Alberto Romero on Substack has an article called GPT-4 Rumors From Silicon Valley. This is a summary of things that people, whatever "people" means, currently say about GPT-4. OpenAI has been announcing tiny bits of the next iteration of their language models here and there. There was an interview with Sam Altman where he said GPT-4 isn't really going to be that much bigger than GPT-3, it's probably still going to be in the text domain, and it's probably going to be a bit more aligned with humans, with a bit more learning from human feedback and so on. People were a tiny bit disappointed, I guess, because it wasn't "we're going to build the next giant thing". But now more and more rumors are coming out that GPT-4 might in fact be, as they claim, colossal: another scale-up of two or even three orders of magnitude in the number of parameters. Some rumors claim it is going to be sparse, though, so there isn't really a one-to-one comparison. On the other hand, there are also a lot of rumors claiming that GPT-4 is going to be multimodal after all, including text, images, video and so on, basically anything they can get their fingers on. We'll see which of these turns out to be true. It's quite possible that they first aimed at just improving GPT-3 and then, with the recent developments around diffusion models and so on, decided to go for another giant leap. And people who have apparently spoken to other people who have apparently tried the new model, or a precursor to the new GPT-4, say that GPT-4 will be as much of an improvement over GPT-3 as GPT-3 was over GPT-2. In case you don't remember, GPT-3 was a giant improvement over GPT-2. Is this going to be AGI and solve all our problems? Probably not. But if it's true, if the step from GPT-3 to GPT-4 really is as big as the step from GPT-2 to GPT-3, then I think we're in for pretty, pretty amazing times. In any case, rumors be rumors, and I guess we'll only know when we actually see it. The new model is rumored to be released sometime between December and February, so the wait isn't going to be that long.

Related to this, OpenAI is also rumored to collaborate with Cerebras, and Cerebras in turn has just released their biggest supercomputer to date, called Andromeda. It has 13.5 million cores. Cerebras is a company that builds extremely large chips; they want to do as much as they can on a single chip, which is why their chips are, I think, about yay big, I'm not exactly sure. And this absolute monster of a supercomputer is comprised of just 16 Cerebras CS-2 systems.
That should already give you an indication of just how big their individual systems are; connecting them makes for a giant, enormous supercomputer. The website has a Get Demo button. I guess for most of you it's not really going to be an option to go into business at this kind of scale, but for some of you it might be, and you might very well want to click that button.

The Meta research blog announces the ESM Metagenomic Atlas, the first view of the "dark matter" of the protein universe. A lot of protein-folding work has happened recently with AlphaFold and ESMFold, and now Meta releases a database of what's called metagenomics. Metagenomics is essentially this: if you just go outside and pick up a piece of dirt, there's going to be a ton of microbes, a ton of bacteria, a ton of organic material in there, and all of that genomic material isn't necessarily something you'd find in, say, the Human Genome Project. Yet it's still very important, for example for ecology, for medicine, and for human well-being in general. This metagenomic atlas is the first database that reveals the structures of the metagenomic world at the scale of hundreds of millions of proteins, and there's a link to explore the atlas. If you're anywhere near the world of protein folding, I guess this is a very exciting time. I'm also excited about the progress we make on frontiers other than just scaling up and producing more stories about unicorns. For all the criticism these big models get, and all the pressure to just scale and scale and scale, they do every now and then deliver something like this, something that is incredibly useful for some natural science out there. And as we get better at our core research, even if that's on pictures of cats, I strongly believe this will greatly benefit adjacent fields such as biology, mathematics, physics, chemistry and the other sciences.

Also on the Meta AI blog, they released a post called Teaching AI Advanced Mathematical Reasoning. I've covered some of Meta's papers in this area, where they build systems that use a prover. There are these things called prover systems or proof assistants, which essentially formalize your whole mathematics input: you spell out everything super formally, super descriptively, super detailed, and then you can use the system to search for new proofs by applying proof strategies here and there. So you can say, I now want to do a contraposition of these two statements, and so on. However, as you quickly discover, the number of strategies you can apply to a given statement while searching for a proof is really, really huge, which leaves you with a search problem. This paper essentially uses a variant of Monte Carlo tree search, the same thing AlphaGo uses to determine the next move in a game of Go, in order to determine the next proof strategy or proof step to apply to reach a given target statement. Again, very cool that something that initially dealt with a bunch of games, and was really flashy because we could suddenly play Go and chess much better, has developed into something of actual use in an adjacent field, in this case mathematics. So, very cool, check out the paper if you're interested.

NVIDIA has released a paper called eDiff-I, a text-to-image diffusion model with an ensemble of expert denoisers.
This is, I would say, a typical NVIDIA paper: they don't reinvent the world, but they take what exists and apply a strong engineering mindset to it, improve upon it, and it results in very high-quality output. In this case, they take the idea of text-to-image diffusion models, but on top of that they have an ensemble of expert denoisers. So instead of a single denoiser like we're used to in a diffusion model, they have an ensemble of denoisers, which means different models can take care of different phases of the denoising process. They also stage the image generation in multiple steps. This has been done before, but it is a very viable strategy: you essentially have one model produce a low-resolution version of the image and then successively scale it up. As you can see, all in all that results in super high-quality images that can be generated either from the text description alone or from text plus some kind of map or mask that you draw, and you can also feed a style reference image into the system. Again, it's just amazing how people are able to push forward the state of the art in such a short time.

BigScience has released two new models, one called BLOOMZ and the other mT0. These are evolutions of their previous models, and they're mainly concerned with multitask prompted fine-tuning. We've dealt with prompted fine-tuning before, in the Galactica coverage: it essentially means that after you pre-train your model, you fine-tune it on prompted samples. So just like you would ask GPT-3 with a prompt to do some kind of task, you go ahead and actually fine-tune on the prompt, the input and the output of that task, so the model learns to respond to such prompts in an appropriate fashion. And if you do that for multiple tasks, you also gain the ability to generalize to new tasks, because that carries over from the pre-training. These new models deal with exactly this setting, but on non-English data: cross-lingual generalization, doing this in multiple languages and potentially also generalizing across languages. The models are on Hugging Face if you want to check them out.

ICLR 2023 reviews are out on OpenReview, and there are quite a few surprises, in the negative direction. Robert Tang tweets out an example where the authors respond to a reviewer with: "Response to you is a waste of time. I hope you can respect the authors' work and give constructive comments instead of taking a few minutes to give a trivial suggestion. I recommend that you complete a university, maybe kindergarten, course before giving your review comments." That's just lovely. Still believing in the good of human beings: maybe this person just had an absolutely terrible day, they really need this paper, and the review really is very, very bad, like it actually does make a super trivial dunk on the paper. I'm not sure what happened here. But if you're ever inclined to write a rebuttal like this, just don't. Go to sleep, wake up the next day, breathe, and realize that it's kind of useless, even if it's probably true.

Another worrying issue, tweeted out by Stella Biderman, is the following.
One reviewer criticized a paper on the grounds that it is not acceptable to only compare with publicly available models, meaning the paper should also have compared with models that are not publicly available. Now there is of course a debate to be had here. In order to properly compare to someone's model, you need access to it. On the other hand, there is a long history in science of people not putting their stuff out as open source, where you essentially just take the numbers from the tables of their paper, put those into your paper, and believe what they said. It's possible the reviewer's stance is: look, you can just take the numbers they claim and put them in your table. It's also entirely fair to say: well, I don't have access to their model, I can't verify their numbers, and therefore I'm not going to put them in my paper. The crux is obviously that leaving out the things that aren't public also makes your method look a lot better in comparison, because the only actual competitors to your method are closed source and only exist as a number in some paper. I don't know what the correct answer is here, but it's certainly worth having a discussion about.

And lastly, and you might actually have heard of this one, there is this paper called Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top. People do get creative with titles these days. The problem one reviewer had is with the word "Byzantine", which the reviewer claimed to be disparaging of whoever considers themselves Byzantine. Byzantine is a term that has long been used in various fields such as distributed systems, fault-tolerance analysis, security, cryptography and game theory, so it is a very well-known, established technical term. However, the reviewer is of the strong opinion that the term carries prejudice, is derogatory, and denounces the ethno-religious practice of some people. The reviewer bases their opinion heavily on the fact that the ICLR code of ethics says you must respect the cultural heritage of others, and repeatedly claims that the usage of the term Byzantine in this work is a violation of that code of ethics, whereas the authors argue it is a technical term, it has been used for a long time, and it is disparaging to absolutely no one. The conversation goes on and on; I believe there are over 36 comments in this thread, including some other people coming in and saying, hey, I actually consider myself Byzantine and I don't have a problem with the term, so don't defend us. The reviewer did make some suggestions for other terms, such as "deviant", but the authors pointed out that none of these suggestions capture the term in its full meaning or in how people actually use it. As the debate goes on, you see the reviewer shifting their stance a little bit, from "it's just not appropriate to use the term" to "the paper also isn't technically correct". But I strongly believe the reviewer only introduced that point after the discussion had been going on for a while and they realized they needed to make a stronger case on scientific grounds. Now the problem is that on OpenReview, I believe, you can't see the modifications, so we have no idea how these comments were changed around; even the original comment was edited to include some other feedback and so on.
So the timeline here is a little bit murky. The authors also point out that this point, that the word Byzantine is inappropriate, was apparently initially the only real criticism of that reviewer. But the reviewer gave the paper a really low score, and if you know anything about conferences, most meta-reviewers just look at whether there is one bad score, in which case the paper already has very poor chances, or they look at the average, which would obviously be dragged down strongly by one bad score. So essentially the reviewer held the paper hostage a little bit and wanted the authors to change the wording. The authors even agreed to abbreviate the word Byzantine to "Byz", the short form, because they just didn't agree that any of the other terms would do justice to the technical nature of the word. The reviewer disagreed that this would actually solve the problem and essentially said that even if they changed the term, they would now expect not only that the term isn't used, but also that the paper contain a discussion of why the word Byzantine is not appropriate, or at least some moral struggle of the authors about why this is problematic. The reviewer again repeatedly and insistently claims that it violates the ICLR code of ethics, and holds that code of ethics like a stick to hit the authors with. What's interesting is that at some point the program chairs commented on this as well, saying that the program chair committee and the ethics chair have been following this thread closely, and that upon preliminary investigation, the ethics chair finds the use of the B-word, it's not the B-word, is it, a possibly emerging issue but not yet a major ethics issue that could justify rejecting the research. There seems to be no widespread agreement that the B-word is offensive, and this discussion between reviewers and authors is still valuable to the community, as it raises awareness of this potentially emerging issue. They appreciate the thoughts from the reviewers. So they essentially resolved this by saying: reviewer, you made your point, but we don't agree with it. The reviewer responded again, lengthily, pointing out that this violates the ICLR code of ethics. Now in the end you could say it's all good: the program chairs came in, essentially squashed the reviewer, and said, okay, the paper is fine, it can use the word Byzantine, it's not problematic, all good. But I actually strongly believe this is a big win for this reviewer, because the appropriate response from the ethics chair would have been: shut up, you're an embarrassment to the scientific institution, you're barred from reviewing any more papers for any other conferences, this is a joke, shut up. But they didn't do that. They essentially conceded to the reviewer: they said yes, it's a possibly emerging issue, because they had seen quite a bit of uproar in the community that what is essentially a technical term, one that absolutely no one except this reviewer feels is inappropriate, was being challenged. The ethics chair said, yes, it's possibly emerging. So this lays the groundwork for the future. This is how these things slip in. I have full conviction that people who write these codes of ethics do so with the best intentions, at least most of them; I believe some of them anticipate exactly this, and this is how, again and again, these things slip in.
One person makes a fuss, you take the temperature of the community: ah, not yet ready. But now we have a precedent. At the next conference, the same reviewer can make a fuss again and point back and say, well, other people have said this before, and you don't even know it's the same reviewer; maybe this actually is problematic. And the ethics chair, because someone complained, seems bound to at least acknowledge it, but they do so in the most lenient way, in the way that guarantees that in the future this will actually become a problem. So in my opinion: big win for the reviewer right here, big win for the complainers, and I don't like it.

Google has a new paper called Efficiently Scaling Transformer Inference, on how they scale their big PaLM models on TPUs. It's not going to be very applicable for most of you, but in case you care about how they enable things like much larger context lengths, super-duper FLOPs and super-duper hardware utilization during large-batch processing, give this paper a read.

Also from Google, the Google Research blog has an entry called Infinite Nature: Generating 3D Flythroughs from Still Photos. This is on top of a paper they published at ECCV, which generates infinite views, or infinite flythroughs, as the title says. The cool thing is that this happens from still images: you can give it a single image and it will generate a flythrough from that image. They use various techniques for this, but the base idea is that you take an image and predict its depth map, that is, how far away everything is. Then you use that to render the image from a slightly different view: if you know how far away all the things are, you can position your camera slightly differently and still determine where the pixels go. This leaves some pixels undetermined, because you can now see behind things you couldn't see before, so another model, in a refinement step, essentially fills in these missing pixels. And then you repeat: you estimate the depth map again, you adjust your camera position a tiny bit, and you fill in the pixels that are missing. Training this isn't exactly easy, but there are various techniques, like cycle consistency, or what they do right here, which is an adversarial setup: they have a discriminator that determines whether, after a number of steps, the image still looks like it was generated from a real nature image. If you back-propagate that error, you can generate very long, very high-quality flythroughs through nature. There are a bunch of examples here, and what I find interesting is that they also added a specific sky model to make the sky feel more real. I suspect that in their original work the sky was often the problem and looked unrealistic, so now everything sky-related is produced by a separate model, as far as I can tell.

Paella, I hope that's how you pronounce it, is a new paper that also does text-to-image, but this one is optimized for speed. To do diffusion, you normally take some noise and run it through the denoising process, step after step after step. There are various techniques to speed this up, and Paella supercharges them and manages to do the whole process in only 10 steps, which amounts to only 500 milliseconds. So within only 500 milliseconds you have a high-quality image from a given piece of text.
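Just to give a feel for why 10 steps can be that fast, here is a toy sketch of what a few-step, text-conditional sampling loop over discrete vector-quantized tokens could look like. The `model` and `text_emb` objects are placeholders and the re-noising schedule is my own simplification; this is not Paella's actual code.

```python
import torch

@torch.no_grad()
def sample_tokens(model, text_emb, seq_len=256, vocab_size=8192, steps=10, device="cpu"):
    """Toy few-step sampler over discrete VQ tokens (illustrative only).

    Start from fully random codebook tokens; at each step the model predicts
    logits for every position conditioned on the text, we take its best guess,
    and we keep a shrinking fraction of positions randomized for the next step.
    """
    tokens = torch.randint(vocab_size, (1, seq_len), device=device)
    for step in range(steps):
        logits = model(tokens, text_emb)               # (1, seq_len, vocab_size)
        tokens = logits.argmax(dim=-1)                 # current best guess per position
        noise_frac = 1.0 - (step + 1) / steps          # fraction of positions to re-noise
        mask = torch.rand(1, seq_len, device=device) < noise_frac
        tokens[mask] = torch.randint(vocab_size, (int(mask.sum()),), device=device)
    return tokens  # feed these into the VQ decoder to get the final image
```

With only 10 iterations of a single forward pass each, the whole loop stays in the hundreds-of-milliseconds range on a GPU, which is the point the paper is making.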
Again, amazing progress in a field that is super young. Check out Paella; the corresponding paper is called Fast Text-Conditional Discrete Denoising on Vector-Quantized Latent Spaces.

Now if you enjoyed the previous paper on how to scale up PaLM, you might also enjoy Multi-Ray, which is by Meta. The blog post is called Optimizing Efficiency for Large-Scale AI Models and describes a system called Multi-Ray. I've read the blog post and I have to say it's kind of wishy-washy: you have to guess a lot of the stuff, they just sort of describe in words what it does and link to various things they've done, and I can't exactly tell what precisely they're doing. But if you need some inspiration for how a system like this would work, or some hints on how this is really done in practice at scale, give the blog post a read.

arXiv pairs up with Hugging Face. Previously, Hugging Face acquired Gradio, which powers many Hugging Face Spaces, the little demos you can build out of your Hugging Face repositories. And now arXiv includes those Spaces: if you upload a paper to arXiv, you can attach a demo from a Hugging Face Space, so people can try out your model or your technique interactively, directly from arXiv. This is very cool, and obviously I'm a big fan of integrating interactive things into our very old format of eight-page PDFs.

Okay, we got a bunch of new models this week. The first one is AltDiffusion by FlagAI, which is a multilingual diffusion model. This is essentially Stable Diffusion, but multilingual, supporting languages such as Chinese, Spanish, French, Russian, Japanese, Korean, Arabic and Italian. Next is Demucs by Meta, which is a music source separation model. You can put a song into it and it will separate the sources, meaning it will separate things like drums and vocals and isolate those. Perfect for practicing something, doing karaoke, or whatever else you want to do with it. The paper is called Hybrid Transformers for Music Source Separation and it's on arXiv. There's also a new multilingual CLIP available from LAION, trained on their own dataset, LAION-5B. It reaches 77% zero-shot accuracy on ImageNet in English and around 55% for Italian, Japanese and Chinese, and supports over 100 languages. The cool thing is that it's very efficient to train because it uses locked image tuning, which we've discussed previously in a video. So check out the model, and check out locked image tuning if you haven't seen it yet; it's a really cool paper and a cool, simple technique. In other news, a research group at the City University of New York has released a model that can accurately predict the human response to novel drug compounds. They're certainly not the first people to release such a model; this has obviously been going on for as long as data science has existed. But it's cool to see that even on this front, the drug discovery front, giant progress is being made on the back of what started out as cat image research.

Alright, some helpful things for this week. We have quite a lot to get through, so let's get into it. This is a pixel art sprite sheet generator. If you're into old games, sprite animations and so on, this is a Stable Diffusion based model that will create the sprites for you, given a description. Look at this, I typed in "fat Joey". Prompt Extend is a model that will extend your prompts.
Here's an example: you type in "psychedelic liquids space" and it appends what it thinks Stable Diffusion needs to give you what you want. So it's like a little translator between human input and what a very competent human using Stable Diffusion would add in terms of modifiers, such as "concept art", "sharp focus", "illustration", "unreal engine" and so on.

There's a new blog post on Hugging Face telling you how to fine-tune Whisper for multilingual ASR, but you can fine-tune Whisper for whatever you want; this blog post is your point of entry. Dream Textures is a plugin that makes Blender interact with Stable Diffusion. In the demo, a person types into Blender whatever they want as a texture, in terms of text, and then boom, apply, and it's now in the texture. Absolutely great. The YouTube channel Mutual Information has a series on reinforcement learning that I can highly recommend. They spent a lot of time on it and I hope it is helpful to anyone looking to get into RL.

Lovely Tensors solves a problem we've all had in the past. If I just print some tensor, I get this, and it's absolutely not helpful at all; as soon as your tensors go beyond four or five values, it's useless to just look at them. All you do is import lovely tensors, monkey-patch it in, and all of a sudden, if you print a tensor, a NumPy array, a torch tensor, whatever, it will give you the shape, the number of elements, statistics, the means, the standard deviations and so on. This is a much, much better way to look at tensors. If the tensor is small enough, it will actually show you the values, but as soon as it's bigger than that, it gives you the much more useful summary. Here, for example, it warns you that there are infinities and NaNs in the tensor, and here it tells you that this one is actually all zeros. You can still get back to the original tensor via property access; there's a verbose access that gives you the values even if the tensor is large, and you can still get the plain old representation if you really want it. There are various helper methods around this, to show images, statistics, channels, and things such as the different filters in a stack of convolutional filters. I'll leave you to explore all of that yourself, but if you work with tensors a lot in an experimental sense, this is surely worth it.

GPT Index is a technique to build an index out of files using GPT. It uses GPT to take a bunch of files and, for example, recursively summarize them, so you end up with a structure where you have a summary on top of a bunch of stuff, and if one of them interests you, you go into it and get summaries of the sub-stuff that's there, and so on. It's kind of experimental, and I want to say it's a bit of a new way of thinking about what we could do with these models in order to organize information, now that we have generative capabilities. I like that people think outside the box, so if you're interested, check out the repository.

There's a new upscaler for Stable Diffusion made by RiversHaveWings; the notebook is by nshepperd, and the compute has been sponsored by Stability AI. The notebook runs you through the whole upsampling process and it gives really cool results.

I've previously talked about DagsHub. DagsHub is a bit like GitHub for machine learning.
I know a lot of places claim this nowadays, but DagsHub really believes in the open-source paradigm, and now they've released something they call Direct Data Access: essentially a technique to stream versioned data down and up from storage. It connects to DVC, which you might know as a data versioning tool, with a transparent approach where you don't need to pull the whole dataset at once or stream it in some custom way. You can just treat the data as if it already existed locally, and the library in the background will pull it down as you need it, in a streamed fashion. So there's no long waiting for data to arrive; you can just go train, and even if you don't have space for the whole dataset, it will still work. I don't have the time here to explain everything you can do with it, but the install is really simple: you essentially install their hooks and everything works transparently and magically. If you're interested, check it out, and also check out their blog; it's regularly updated, for example with a post on how to build an end-to-end active learning pipeline with fully open tools.

genv is a GPU environment management tool that lets you easily control, configure and monitor the GPU resources you are using, and it's intended to ease the process of GPU allocation for data scientists without code changes. So if you're in some lab and you share GPUs with others, this tool is a must-have; I wish it had existed during my PhD. It manages local GPUs, remote GPUs, cluster GPUs and so on; you can reserve GPUs, free up GPUs, essentially whatever you want to do. It even has a VS Code plugin. If you use GPUs at all, and especially if you share them, consider this tool.

MBXP is a multilingual benchmark for code completion in 10+ programming languages. tsai is an open-source package for applying deep learning to time series, built on top of PyTorch and fastai. Colossal-AI has released two blog posts, both about better, faster and cheaper training of models. The first is about what they call AIGC, AI-generated content, which essentially means image generation models, and the second is about structure prediction of protein monomers and multimers. In both cases they're able to speed these models up by a lot, and the code is openly available, so do go and check it out. The performance gains here are not only at inference time, like we saw before; for Stable Diffusion, for example, this provides 6.5 times faster training and pre-training cost savings, so the hardware cost of fine-tuning can be almost seven times cheaper than doing it the vanilla way.

TAP-Vid is a benchmark for tracking any point in a video. SuperGradients is a library to build, train and fine-tune production-ready, state-of-the-art deep learning vision models. Now I've seen a lot of libraries that claim to just make stuff better, but if you're into vision, I believe having a library that is specific to vision, for things such as semantic segmentation, bounding box prediction or even image classification, really pays off, especially in a field like vision where we have a lot of custom techniques that make these models so much more efficient and better. Not only that, SuperGradients also provides a lot of pre-trained checkpoints, so even if you're just into using some models, this library might be good for you.
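To give a feel for how such a library is typically used, here is a minimal sketch of pulling a pre-trained checkpoint from SuperGradients and running an image through it. I'm writing this from memory, so treat the import path, the model name and the `pretrained_weights` argument as assumptions and check the SuperGradients docs for the exact identifiers.

```python
import torch
from super_gradients.training import models  # import path is an assumption, verify in the docs

# Load a pre-trained detection model from the model zoo
# (model name and weights key are assumptions, see their model zoo for exact names).
model = models.get("yolox_s", pretrained_weights="coco")
model.eval()

# Run a dummy image through it to get raw predictions
dummy = torch.randn(1, 3, 640, 640)
with torch.no_grad():
    predictions = model(dummy)
```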
Shumai is a network-connected, differentiable tensor library for TypeScript and JavaScript. As you can see in the demo, you can define neural networks in TypeScript and then distribute them over multiple places, over multiple machines, and you can use JavaScript's async/await syntax to ship data to some other machine or call a function on another machine. The library handles everything for you, from forward propagation to backpropagation and training. It's really cool and the API looks quite clean.

Safetensors by Hugging Face is a new format to store and load tensors safely. Previously I showed how PyTorch's loading function in turn uses Python's pickle, which executes arbitrary code. Safetensors is supposed to alleviate that by defining a safe, fixed and simple format for storing tensors. Now, obviously the trade-off is that you can't store arbitrary things anymore. So while I expect that a lot of architectures might switch to something like safetensors, it is not a full solution to the problem: for better or worse, research will come up with new things, new ways of doing things, and if you constrain yourself to one particular way of doing things, that will never cover everything. However, it's mostly going to be enough.

VeLO is a learned optimizer, and the cool thing is that it really seems to be better than, or at least on par with, very hand-tuned optimizers. You might know optimizers such as stochastic gradient descent or Adam, but it is possible to learn an optimizer, that is, to learn a system that controls the optimization behavior of a training run of another system. These people took a lot of different ML problems and a lot of different networks, ran optimization on them, and essentially learned an optimizer that optimizes all of these different problems well. That's what we call a learned optimizer. And this one really seems to work well out of the box on many problems, especially mainstream ones. So without you having to tune the beta-2 parameter and the learning rate and so on, you just apply it in its default configuration and it does a pretty good job. This is super important if you want to do rapid prototyping, rapid exploration of new ideas, without doing a giant grid search over all the hyperparameters.

The Merlin Dataloader is a data loader specifically for recommender systems. Recommender systems have a few special requirements: compared to something like an image classifier, the data points are mostly tabular and individually small, so loading them from disk can easily become the bottleneck. A good data loader is therefore super important here, and the Merlin Dataloader promises to be over 10 times faster than native framework data loaders. If you're into recommender systems, try it out.

LODA is an assembly language, a computational model and a distributed tool for mining programs. This topic is very far away from me, but some of you might actually be interested. It's about integer sequences: there is the On-Line Encyclopedia of Integer Sequences, with sequences like 1, 2, 3, 4, 5 and so on, and the question is always, what's the program behind them?
Can I come up with a piece of code that produces that integer sequence into perpetuity? 1, 2, 3, 4, 5 is quite simple, but it gets complicated very quickly, and it's especially hard to teach machines to come up with the rules behind a sequence. LODA is a system that allows you to mine such programs: you can run it and it will crank away, intelligently searching for these programs. Not only that, it is also a distributed tool, so you can partake in the mining of such programs, and much more. As I understand it, this is what a LODA program looks like, or what it searches for: here you can see one of these sequences, and this is apparently the program it comes up with. It looks pretty interesting; if you're interested, check out LODA.

numga, not Numba, numga, is a library for geometric algebra in JAX and NumPy. If you're into geometric algebra, there's an example of a rigid-body physics engine with a constraint solver, so this library might be for you.

MTEB is a benchmark for text embeddings. It's from similar authors as the BEIR benchmark, which is a retrieval benchmark, but this goes further: it covers 8 embedding tasks over 56 datasets and 112 languages, and the paper already evaluates 33 models on it. The goal is to find the one unified text embedding that covers all downstream tasks, and the status so far is that that universal embedding hasn't been found yet. The leaderboard shows that some models are good at some tasks and other models are good at other tasks, so the holy grail of text embeddings is still somewhere out there, and this benchmark might prove that you have found it.

Okay, the last cool thing I want to show you is natbot. This is already a little bit older; Nat Friedman tweeted it out in September. Essentially, he managed to connect GPT-3 to a web browser and just let it interact with the browser by prompting it in an appropriate way, given the website's HTML structure. Apparently the original idea comes from Sharif Shameem, and natbot has a repository on GitHub. Look, it's just one Python file. I know half of you are super cringing right now, but, you know, research be research. If you want to figure out how it's done, how natbot works, and if you want to give it a shot yourself, that would be really cool to see. So please do.

Alright, that was all from ML News. This was a big chunk. Thank you so much for being here, thank you for supporting the channel, and come to Discord if you're not already on it; the link is in the description. We have fantastic paper discussions every week and we talk general machine learning every day. With that being said, stay hydrated. Bye-bye.
