Last updated on August 10, 2020
This note accompanies a peer-reviewed paper published in the Journal of Personality and Social Psychology, entitled: “Deep neural networks are more accurate than humans at detecting sexual orientation from facial images” by Michal Kosinski and Yilun Wang.
Please direct your suggestions and questions to email@example.com
This document has three main sections:
Update: See the replication of our study, which confirmed that sexual orientation can be predicted from facial images. Models were invariant to makeup, eyewear, facial hair, and head pose.
Press articles whose authors have actually bothered to read the paper:
We did not build a privacy-invading tool. We studied existing facial recognition technologies, already widely used by companies and governments, to see whether they can detect sexual orientation more accurately than humans.
We were terrified to find that they do. This presents serious risks to the privacy of LGBTQ people.
Our work is limited in many ways: we only looked at white people who self-reported to be gay or straight. We discuss those limitations at length in our paper and below. Those limitations do not, however, invalidate the findings or the core message of the study: that widely used technologies present a risk to the privacy of LGBTQ individuals.
Our work is not the first one to show that sexual orientation can be detected from the human face. It is well established that humans can, with some accuracy, detect sexual orientation from a still facial image. It has also been previously shown that computers outcompete humans at many visual tasks, including detecting sexual orientation.
We invite you to consider the evidence before dismissing it.
Note: This study is not about sexual orientation or its origins, despite some people trying to interpret it in this way.
This study was peer reviewed and published in the Journal of Personality and Social Psychology, the leading academic journal in psychology. In addition, before it was sent for a formal peer review, the manuscript was reviewed by over a dozen experts in the fields of sexuality, psychology, and artificial intelligence. The research was approved by Stanford’s Institutional Review Board.
Across seven studies, we show that a computer algorithm can accurately detect sexual orientation from people’s faces. When presented with a pair of participants, one gay and one straight, the algorithm could correctly distinguish between them 91% of the time for men and 83% of the time for women.
This is comparable with the accuracy of mammograms (85%) or modern diagnostic tools for Parkinson's disease (90%). (Also see this section.)
We trained the algorithm on a sample of over 35,000 facial images of self-identified gay and straight individuals, obtained from a publicly available database. The accuracy was verified on a subset of images that the algorithm had not seen before. We made sure that the predictions were not affected by differences in age and ethnicity.
We also tested the algorithm on an independent sample of Facebook profile pictures and achieved similar results.
In contrast, human judges were not much more accurate than random guesses. We believe that this is yet another example of artificial intelligence (AI) outperforming humans.
As the title indicates, our study aims to show that “Deep neural networks are more accurate than humans at detecting sexual orientation from facial images.”
This study is not about sexual orientation or its origins. In the process of studying the features employed by the classifier to distinguish between gay and straight faces, we noted that the former tend to be gender atypical (or, in other words, that faces of gay men tended to be slightly more feminine than those of straight men, while faces of lesbians tended to be slightly more masculine than those of straight women). This is consistent with one of the most widely accepted theories explaining the origins of sexual orientation (prenatal hormone theory).
The fact that our findings are consistent with a widely accepted theory of the origins of sexual orientation does not prove that theory. Instead, it provides additional support for the validity of our findings.
This study was neither designed nor intended to explore the origins of sexual orientation, nor prenatal hormone theory.
No, we did not. We showed that a widely used facial recognition technology inadvertently exposes people’s sexual orientation.
Facial recognition software examines a face and turns it into a numerical representation, or a bunch of numbers. Those numbers are typically compared across different facial images to see which of them contain the same face.
We noticed that these numbers differ in predictable ways between gay and straight faces, allowing for the detection of sexual orientation and invasion of people’s privacy. In other words, the information about your sexual orientation is already built into the results produced by facial recognition software. It is not explicit: the software does not say “Michal is gay” but, as we show in our paper, this intimate information can be relatively easily extracted.
This might be news to you and the leaders of the LGBTQ community, but technology companies and governmental institutions are all well aware of the fact that sensitive traits can be easily extracted from the numbers produced by facial recognition software.
See “This must be wrong!” section below.
The fact that algorithms can predict sexual orientation from human faces has serious privacy implications. The ability to control when and to whom to reveal one’s sexual orientation is crucial not only for one’s well-being, but also for one’s safety.
In some cases, losing the privacy of one’s sexual orientation can be life-threatening. The members of the LGBTQ community still suffer physical and psychological abuse at the hands of governments, neighbors, and even their own families. The laws in many countries criminalize same-gender sexual behavior, and in some places, it is punishable by death.
The growing digitalization of our lives and rapid progress in AI continue to erode our privacy. As this and our previous studies illustrate, willingly shared digital footprints can be used to reveal intimate traits. Michal’s 2013 paper warned that algorithms can accurately reveal people’s intimate traits from their Facebook “Likes,” which were, at that time, publicly visible by default. His 2015 paper showed that algorithms can predict one’s behavior more accurately than a friend or a spouse can.
Those papers raised enough alarm to effect policy change. For example, within a few weeks of the publication of Michal’s 2013 paper, Facebook switched off the public visibility of Likes. Michal’s work was also discussed by lawmakers in the U.S. and the EU in the context of the new privacy legislation.
Our work is aimed at informing privacy policies and technology.
Unfortunately, even the best privacy-protecting laws and technologies are unlikely to guarantee privacy at all times and for everybody. One can regulate Facebook, Google, and U.S. intelligence agencies, but one cannot expect that hackers, startups with little to lose, or foreign intelligence services will also follow the rules. The digital environment is very difficult to police; data can be easily moved across borders, stolen, or recorded without users’ consent. Also, most people want some of their social media posts, blogs, or profiles to be public. Few would be willing to cover their faces while interacting with others. As this and other studies show, this is enough to invade their privacy.
Consequently, the safety of gay people and other minorities hinges not on their right to privacy (which can be maliciously invaded), but on the protection of their human rights and the tolerance of societies and governments. In order for the post-privacy world to be safe and hospitable, it must be inhabited by well-educated people who are radically intolerant of intolerance.
Even if we are wrong and privacy could be preserved, a tolerant world, where losing your privacy does not put you at risk, would be a much better place.
We were really disturbed by these results and spent much time considering whether they should be made public at all. We did not want to enable the very risks that we are warning against.
Governments and corporations are already using tools aimed at revealing intimate traits from faces. Facial images of billions of people are stockpiled in digital and traditional archives, including dating platforms, photo-sharing websites, and government databases. Profile pictures on Facebook, LinkedIn, and Google Plus are public by default. CCTV cameras and smartphones can be used to take pictures of others’ faces without their permission.
We felt that there was an urgent need to make policymakers and LGBTQ communities aware of the risks that they are facing. Tech companies and government agencies are well aware of the potential of computer vision algorithms. We believe that people deserve to know about these risks and have the opportunity to take preventive measures.
We made sure that our work does not offer any advantage to those who may want to invade others’ privacy. We used widely available off-the-shelf tools, publicly available data, and standard methods well known to computer vision practitioners. We did not create a privacy-invading tool, but rather showed that basic and widely used methods pose serious privacy threats.
The average faces most likely to belong to gay men (see Figure 1) were more feminine, while the faces most likely to belong to lesbians were more masculine. Typically, men have larger jaws, shorter noses, and smaller foreheads. Gay men, however, tended to have narrower jaws, longer noses, larger foreheads, and less facial hair. Conversely, lesbians tended to have more masculine faces (larger jaws and smaller foreheads) than heterosexual women.
The gender atypicality of gay faces extended beyond morphology. Lesbians tended to use less eye makeup, had darker hair, and wore less revealing clothes (note the higher neckline)—indicating less feminine grooming and style. Furthermore, although women tend to smile more in general, lesbians smiled less than their heterosexual counterparts.
Additionally, consistent with the association between baseball caps and masculinity in American culture, heterosexual men and lesbians tended to wear baseball caps (see the shadow on their foreheads in Figure 1; this was also confirmed by a manual inspection of individual images).
Figure 1. Composite faces and average face outlines produced by averaging faces/outlines classified as most likely to be gay or straight.
Gender atypicality of the faces of gay men and women is consistent with a large number of previous studies. Previous findings (see the section on adult gender nonconformity in this review) showed gender atypicality in occupations, hobbies, patterns of movement (e.g., gestures and walking), speech (e.g., articulation), physical presentation (e.g., clothing choices and hairstyles), and facial appearance. Perhaps the most widely accepted theory used to account for gender atypicality is the prenatal hormone theory (PHT) of sexual orientation.
The fact that the facial features predictive of sexual orientation are consistent with the well-established theory lends support to the validity of the classifier.
Our work is intended as a warning that predictions of this kind can be made with worrying accuracy, rather than as an attempt to estimate the maximum accuracy of such predictions. We used basic tools and images in low resolution. Those deploying such methods in practice are using much more sensitive DNN models and devices.
How accurate was the classifier in our study? Very accurate: it is comparable with the accuracy of mammograms (85%) or modern diagnostic tools for Parkinson's disease (90%).
As interpreting classification accuracy is not trivial and is often counterintuitive, let us illustrate it with a few examples.
Imagine a group of 1000 men, including 70 gay men, whose faces were assessed by the classifier with an accuracy of AUC=.91 (comparable with the one achieved in our study for males with 5 images per person).
The classifier does not tell you which person is gay, but labels each person with a probability of being gay. The non-trivial decision that you now need to make is where to set the cut-off point, that is, the probability above which you classify someone as gay.
If you wanted to select a small sample of gay men and make few mistakes, you would label as gay only the few cases with the top probabilities. You would get high precision (i.e., the fraction of gay people among those classified as gay), but low recall (i.e., you would ‘miss’ many gay men). If you prefer to cast a wider net, you will ‘catch’ more gay men, but also erroneously label more straight men as gay (so-called “false positives”). In other words, aiming for high precision reduces recall, and vice versa.
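The trade-off described above, and the pairwise reading of AUC (the share of positive/negative pairs that the scores rank correctly), can be sketched on a toy example. The ten scores and labels below are made up purely for illustration and are not data from the study; the function names are likewise hypothetical.

```python
# Hypothetical toy data: ten scores from some binary classifier and the
# true labels (1 = positive class, 0 = negative class). Not study data.
SCORES = [0.95, 0.90, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
LABELS = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]


def precision_recall(scores, labels, threshold):
    """Label every case with score >= threshold as positive and
    return (precision, recall) for that cut-off point."""
    selected = [lab for s, lab in zip(scores, labels) if s >= threshold]
    true_pos = sum(selected)
    precision = true_pos / len(selected) if selected else 0.0
    recall = true_pos / sum(labels)
    return precision, recall


def pairwise_auc(scores, labels):
    """AUC computed directly from its pairwise definition: the fraction of
    (positive, negative) pairs in which the positive case has the higher
    score. Ties count as half a correct pair."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # A strict cut-off: few cases selected, high precision, low recall.
    print(precision_recall(SCORES, LABELS, 0.88))
    # A lenient cut-off: wider net, lower precision, higher recall.
    print(precision_recall(SCORES, LABELS, 0.35))
    # The AUC does not depend on any cut-off point.
    print(pairwise_auc(SCORES, LABELS))
```

Note that the AUC is a property of the ranking as a whole, while precision and recall only exist once a cut-off point has been chosen.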
Back to the group of 1000 men, including 70 gay men. If one selected 100 random males from this sample, only 7 would be expected to be gay: a random draw offers a precision of 7% (7 out of the 100 selected men would be gay).
Let’s turn on the classifier. Among the 100 individuals with the highest probability of being gay according to the classifier, 47 were gay (precision = 47/100 = 47%). In other words, the classifier provided a nearly seven-fold improvement in precision over a random selection. There were also 53 “false positives”: straight men classified as gay. Note, however, that as there are only 70 gay men in the examined population, there would be 30 “false positives” even if the classifier were perfect.
The number of false positives could be decreased, and the precision increased, by narrowing the targeted subsample. Among the 30 males with the highest probability of being gay, 23 were gay, an eleven-fold improvement in precision over a random draw (only 2.1 men would be expected to be gay in a random subset of 30 males). Finally, among the 10 individuals with the highest probability of being gay, 9 were indeed gay: a thirteen-fold improvement in precision over a random draw.
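The arithmetic behind these figures can be reproduced in a few lines. The sketch below uses only the counts quoted in the text (1000 men, 70 of them gay, and 47, 23, and 9 correct among the top 100, 30, and 10); the recall values it prints are derived from those counts rather than quoted from the paper, and the function name `summarize` is a hypothetical helper.

```python
POPULATION = 1000  # size of the group in the example
POSITIVES = 70     # number of gay men in the group


def summarize(selected, hits):
    """Precision, recall, and improvement over a random draw when the
    `selected` highest-scoring men contain `hits` gay men."""
    base_rate = POSITIVES / POPULATION   # precision of a random draw (7%)
    precision = hits / selected
    return {
        "precision": precision,
        "recall": hits / POSITIVES,      # derived, not quoted in the text
        "lift": precision / base_rate,   # the "x-fold improvement"
        "expected_by_chance": selected * POSITIVES / POPULATION,
        "false_positives": selected - hits,
        # Even a perfect classifier cannot avoid these when more men
        # are selected than there are gay men in the group.
        "min_false_positives": max(0, selected - POSITIVES),
    }


if __name__ == "__main__":
    for selected, hits in [(100, 47), (30, 23), (10, 9)]:
        r = summarize(selected, hits)
        print(
            f"top {selected}: precision={r['precision']:.0%}, "
            f"recall={r['recall']:.0%}, lift={r['lift']:.1f}x, "
            f"false positives={r['false_positives']}"
        )
```

Narrowing the selection raises precision at the cost of recall, which is the same trade-off described above for the choice of cut-off point.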
There are three types of such mechanisms. First, character can influence one’s facial appearance. For example, women that scored high on extraversion early in life tend to become more attractive with age.
Second, facial appearance can alter one’s character. Good-looking people, for example, receive more positive social feedback, and thus tend to become even more extroverted.
Third, many factors affect both facial appearance and one’s traits. Those include prenatal and postnatal hormonal levels, developmental history, environmental factors, and genes. Testosterone, for instance, significantly affects both behavior (e.g., dominance) and facial appearance (e.g., facial width and facial hair).
Our project came about by accident. We were browsing through profile images submitted by volunteers (along with their personality scores) to see if we could identify correlates of personality in the background. We were surprised to notice that we seemed to be able to infer personality from the face itself. We could not believe that and started investigating this issue more closely. Soon, we realized that a simple facial recognition algorithm could do a much better job than us (and other human judges), and that it could accurately infer intimate traits ranging from political views to sexual orientation. We found it very disturbing.
We get a lot of feedback along these lines. And quite frankly, we would be delighted if our results were wrong. Humanity would have one problem less, and we could get back to writing self-help bestsellers about how power-posing makes you bolder, smiling makes you happier, and seeing pictures of eyes makes you more honest.
This study, like virtually any other study, has many limitations. We discuss some of them below. Also, see this great article from LGBTQ Nation.
Despite our attempts to obtain a more diverse sample, we were limited to studying white participants from the U.S.
This does not invalidate our results showing that you can distinguish between white gay and straight individuals. Naturally, while it is possible that the same does not apply to other ethnicities, our findings suggest that, unfortunately, it likely does. Similar biological, developmental, and cultural factors, which are responsible for differences between gay and straight Whites, are likely to affect people of other ethnicities as well.
That’s true; we did not check if one can predict whether someone is bisexual from their face.
This does not invalidate the results in any way. We still show that you can distinguish between gay and straight individuals. It is possible that some of the users categorized as heterosexual or gay were, in fact, bisexual. Correcting such errors, however, would likely boost the accuracy of the classifiers examined here.
Importantly, excluding bisexual or non-binary people does not mean that we are denying their existence.
That is a serious limitation and we discuss it at length in our paper. It is reasonable to expect that the images obtained from a dating website could be especially revealing of sexual orientation; this, however, did not seem to be the case.
First, we tested our classifier on an external sample of Facebook photos. It achieved accuracy comparable to that on the dating website sample, suggesting that the images from the dating website were not more revealing than Facebook profile pictures.
Second, we asked humans to judge the sexual orientation of these faces. Human accuracy was no better than in past studies where humans judged sexual orientation from carefully standardized images taken in the lab. This shows that the images used here were not especially revealing of sexual orientation, at least not to humans.
Finally, the deep neural network used here was specifically trained to focus on fixed facial features that cannot be easily altered, such as the shape of facial elements. This helped in reducing the risk of the classifier discovering some superficial and not face-related differences between facial images of gay and straight people used in this study.
Unfortunately, this belief is not supported by evidence. Quite the contrary, many studies have shown that people can determine others’ political views, personality, sexual orientation, honesty, and many other traits from their faces. Also, humans’ low accuracy when judging such traits does not necessarily mean that those traits are not prominently displayed on the face. Instead, people may have a limited ability to detect or interpret the cues, a limitation that does not necessarily apply to algorithms.
Well, it seems that physiognomists were at least partially correct, as we are all 100% ape.
Without a doubt, physiognomy was based on unscientific studies, superstition, anecdotal evidence, and racist pseudo-theories. The fact that its claims were unsupported, however, does not automatically mean that they are all wrong. Some of physiognomists’ claims may have been correct, perhaps by mere accident.
Physiognomists were clearly wrong when they claimed that they could accurately judge character based on facial appearance. Modern scientific studies have shown that we are not very accurate at this task. The same studies, however, consistently show that we are better than chance, revealing that faces contain at least some information about one’s character.
Thus, physiognomists’ main claim—that the character is to some extent displayed on one’s face—is supported by modern science.
That’s something we thought about a lot, and we hope that future studies will help to prove or disprove the predictability of sexual orientation from human faces. We have, however, put much effort into controlling this issue.
First, our models were specifically trained to focus on fixed facial features that cannot be easily altered, such as the shape of facial elements. The deep neural network used here was trained for a completely different task: recognizing the same person across images. This helped us to reduce the risk of the classifier discovering some superficial and not face-related differences between facial images of gay and straight people used in this study.
Second, we validated the findings on an external sample.
Third, we investigated what elements of the facial image were predictive of sexual orientation to ascertain that it was, in fact, facial features (and not other factors). As you can read in the paper, even if all of the visual information is removed, the classifier can still be quite accurate based on merely the outline of the face.
Fourth, we revealed only the facial area to the classifier, and removed the background of the images. We also checked that the classifier focused on facial features and not the background while making the prediction. The heatmaps below (taken from Figure 3 in the paper) clearly show that the classifier focused on facial areas (red) and ignored the background (blue).
Finally, and perhaps most importantly, the differences between gay and straight faces picked up by the classifier are consistent with and predicted by the prenatal hormone theory—the most widely accepted theory explaining the origins of sexual orientation.
We also know many very masculine gay men and very feminine gay women. We also know many very old men, which does not invalidate the statement that women tend to live longer. The fact that the faces of gay men are more feminine on average (as they tended to be in our study) does not imply that all gay men are more feminine than all heterosexual men, or that there are no gay men with very masculine faces (and vice versa for lesbians).
The differences in femininity/masculinity observed in this study were subtle and spread across many facial features: enough to be apparent to a sensitive algorithm, but imperceptible to humans.
Also, please read the “The causes of sexual orientation: An interim summary” in this review article.
It is certainly possible that some of the participants who told us that they were straight were, in fact, gay (or vice versa). We believe, however, that people voluntarily posting and seeking partners on dating websites have little incentive to lie about their sexual orientation.
Also, if some of our participants were, in fact, mislabeled, correcting such errors would most likely further increase the classification accuracy.
We could be easily convinced that gay men (our gay male friends for sure!) have better hairstyles and facial hairstyles, and take better pictures. As we discuss in our paper, gay and straight faces do differ in terms of grooming. However, they also seem to differ in terms of morphology. Facial contour alone provided for an accuracy of over 70% among men and above 60% among women.
Even if the differences between gay and straight faces are exclusively due to differences in grooming, lifestyle, or fashion (i.e., nurture), this does not necessarily reduce the privacy threats faced by gay men and women. Many of the grooming or fashion choices are made unconsciously; removing other revealing features might require changing someone’s lifestyle.
According to the PHT, same-gender sexual orientation stems from the underexposure of male fetuses and overexposure of female fetuses to androgens that are responsible for sexual differentiation. As the same androgens are responsible for the sexual dimorphism of the face and the brain, the PHT predicts that gay people will tend to have gender-atypical facial morphology and gender-atypical preferences (including gender-atypical sexual preferences).