Skin tone and computer vision at Google
Hi, everyone. I’m RL. I’m a PM on Google Research working in RRI and machine perception, particularly perception fairness. Today I’m going to be chatting with you about the history of skin tone in computer vision and how Google is now moving the needle forward with skin tone detection. As you can see here, we have a pretty full agenda today. We’re going to talk about the history of skin tone in big tech, then about how Google and Dr. Monk worked together to push forward the Monk Skin Tone Scale, and last but not least, about how you as a practitioner can use the Monk Skin Tone Scale. Let’s dive into it.
Skin tone plays an extremely important role in how we’re treated in the world, the way we experience the world, and how we interact with technologies. We know many examples across industry where practitioners, including ourselves, have fallen short in the past and where products have not worked very well for darker skin tones. For example, on this slide you’ll see a picture that a UC Berkeley professor took using our Cloud API. Notice that the skin tone of the darker-skinned student in the class was not detected, and thus he was not detected as an object. This is obviously problematic, especially because these disparities can reflect and further reinforce systemic inequities across our society and other societies. And in societies where skin tone is often correlated with your life outcomes and your chances of success, these harms can be further exacerbated. So this is really important work. At the time of that product, we were using the de facto tech industry standard for categorizing skin tone, also known as the six-point Fitzpatrick scale.
Developed in 1975 by Harvard dermatologist Thomas Fitzpatrick, the Fitzpatrick scale was originally designed to assess the UV sensitivity of different skin types for dermatological purposes. As a result, because the scale was basically used for UV sensitivity research, it tends to skew towards lighter skin tones, which also tend to be more UV sensitive. Makes sense. While the scale may work for dermatological use cases, though, relying on the Fitzpatrick scale for machine learning development has resulted in unintended bias that excludes darker skin tones. So we realized at the time that we’d been missing really important opportunities to achieve more ambitious fairness objectives and, honestly, falling short on our commitment to human-centered and inclusive design. So as we looked to improve the tools that developers and engineers rely on for fairness research internally, we teamed up with UXRs like Courtney Heldreth to help us understand what we could be doing better than these six dots.
However, this area is extremely complex. There are a lot of different ways that you can measure skin tone. For example, you can use a colorimeter to take exact color values of a person’s skin. But most importantly, you need to ask yourself: is that the right way to measure skin tone? Do we want this scientific, objective measure of skin tone, or are we more in a place where we want to measure the more social and reflective pieces of how people experience and are treated in the world and, honestly, see themselves reflected in the communities they exist amongst, where skin tone plays a large role? Because it’s such a profound, sociotechnical concept, it’s really hard to strike a balance between the two.
So when we set out to do this research, we really wanted to make sure that we centered our hypothesis on finding out what it means to feel represented by a scale. What does it mean to see yourself, and see others that you know and love, reflected in the technology that you use? So we asked ourselves: is there a skin tone measure that could accomplish such a profound thing? And I’ll tell you where we got.
This is why my team exists, honestly. We’re a bunch of cross-functional specialists: UXRs, colorism experts, research scientists, and engineers, all coming together to leverage a sociotechnical approach to forming our own research questions, developing hypotheses, and ultimately trying to push this problem space forward. Our vision is to truly help Google build products that work well for every person of every skin tone.
As a first step, we conducted an extensive literature review to understand the state of existing skin tone measures that are being used in our world. More specifically, we wanted to learn more about both the objective and subjective measures of skin tone, and also the limitations of these measures and the current applications that these scales have in ML systems. Then, in order to validate the Monk Skin Tone Scale for machine learning purposes, researchers working on the skin tone fairness team launched a study in the US that is now under peer review. Study participants found that the Monk Skin Tone Scale was not only more inclusive than the status quo; they also found that it was as inclusive as one of the largest cosmetic brands known for their makeup and inclusivity.
In summary, the Monk Skin Tone Scale transformed the continuous spectrum of skin tone into 10 tones, which was enough granularity to reflect the diverse communities that exist around our world, but not so much complexity that it gets in the way of machine learning training and evaluation. We worked with Dr. Ellis Monk to do further research and decided to push the envelope and share this tool not only internally at Google, but also with the rest of the world. We shared this at I/O. Next slide. It’s also available externally: you can find information on skintone.google, which is pretty similar to all the stuff I’m talking about today. And last but not least, obviously, we’ve been using it to make our products better inside of Google.
The last thing I want to chat with you about for the last couple of minutes is some recommended practices when using the Monk Skin Tone Scale. So let’s dive into those. A couple slides next, please. You could argue that there are many rules that should be considered, and there’s more nuance than is on this slide. But when my team and I sat down and tried to boil this down into the core principles, we really wanted to make sure that we encourage people to do just a few important things if they remember nothing else. So I’ll walk you through those now. You can also find these on skintone.google slash recommended practices if you want to follow along there or find them offline.
Principle number one: do practice inclusive product development and testing. Teams within Google have been using this tool and incorporating it in their fairness analysis and testing. We are really pushing for people to make sure that their training and evaluation data sets include representation along the scale. Same thing for evaluating your overall products as well.
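[Editor’s sketch, not part of the talk.] One way a practitioner might check representation along the scale is to count how an evaluation set is distributed over the 10 tones and to report a metric per tone rather than only in aggregate. The snippet below is a minimal illustration under stated assumptions: tones are stored as integers 1 to 10, and the function names (`tone_coverage`, `accuracy_by_tone`) and the `min_share` threshold are hypothetical, chosen for this example rather than taken from any Google tool or guideline.

```python
from collections import Counter, defaultdict

# The Monk Skin Tone (MST) Scale has 10 tones, labeled here as integers 1..10
# (an assumption about how the labels are stored).
MST_TONES = tuple(range(1, 11))


def tone_coverage(labels, min_share=0.05):
    """Return (share per tone, tones whose share falls below min_share).

    min_share is a hypothetical threshold chosen for illustration only.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    if any(t not in MST_TONES for t in labels):
        raise ValueError("labels must be integers from 1 to 10")
    counts = Counter(labels)
    shares = {t: counts[t] / len(labels) for t in MST_TONES}
    underrepresented = [t for t in MST_TONES if shares[t] < min_share]
    return shares, underrepresented


def accuracy_by_tone(records):
    """Compute accuracy per tone from (tone, is_correct) pairs."""
    seen = defaultdict(int)
    hits = defaultdict(int)
    for tone, is_correct in records:
        seen[tone] += 1
        hits[tone] += int(bool(is_correct))
    return {tone: hits[tone] / seen[tone] for tone in sorted(seen)}


if __name__ == "__main__":
    eval_labels = [1, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # toy data: tone 10 is missing
    shares, gaps = tone_coverage(eval_labels)
    print("underrepresented tones:", gaps)
    print(accuracy_by_tone([(1, True), (1, False), (9, True)]))
```

A per-tone breakdown like this only flags gaps; what counts as adequate representation depends on the product and its users.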
And then, when you are building, really do consider the intersectional slices besides skin tone, like gender, age, and whatever other subgroups you feel may also potentially be harmed by your product.
Another really important thing is to remember that skin tone can be subjective. So when putting all this into practice, make sure your data sets are annotated responsibly using skin tone labels. We recommend that people enlist human annotators and also understand that the context of the annotator does play a role. You can learn more about that at the Explorable linked here.
In developing the scale, one of Dr. Monk’s intentions was to help decouple skin tone and race. This is an extremely important concept, and these two things are often conflated. Racial and ethnic groups include a vast spectrum of skin tones, as you can see here on this slide. By accounting for these differences, both within and between groups, we can move towards a more nuanced way of measuring and building inclusive technology. So in summary, don’t try to use skin tone as a proxy for race. It won’t work.
Last but not least, don’t disregard the Google AI Principles. You can find those at ai.google as well. And don’t be mean. Don’t be evil. Last couple of slides.
Okay. That’s pretty much everything I have for you. Once again, you can find all these principles at skintone.google slash recommended practices. And thank you again so much for your time.