Last updated on September 28, 2017

Authors’ Note

This note accompanies a peer-reviewed paper accepted for publication in the Journal of Personality and Social Psychology, entitled:

Deep neural networks are more accurate than humans at detecting sexual orientation from facial images

by Michal Kosinski and Yilun Wang. 

The preprint can be downloaded here. 

Please direct your suggestions and questions to michalk@stanford.edu

There are three main sections of this document:

  1. Summary of the findings
  2. You must be wrong – this is pseudoscience! (common criticism of this paper)
  3. Our response to an irresponsible press release by GLAAD and HRC; or, a much better response by LGBTQ Nation

Summary of the findings

We did not build a privacy-invading tool. We studied existing facial recognition technologies, already widely used by companies and governments, to see whether they can detect sexual orientation more accurately than humans.

 

We were terrified to find that they do. This presents serious risks to the privacy of LGBTQ people.

 

Our work is limited in many ways: we only looked at white people who self-reported as gay or straight. We discuss those limitations at length in our paper and below. Those limitations do not, however, invalidate the findings or the core message of the study: that widely used technologies present a risk to the privacy of LGBTQ individuals.

Our work is not the first to show that sexual orientation can be detected from the human face. It is well established that humans can, with some accuracy, detect sexual orientation from a still image of a human face. It has also been shown that computers outcompete humans at many visual tasks. It turns out that detecting sexual orientation is one of them.

We invite you to consider the evidence before dismissing it.  

Note: This study is not about sexual orientation or its origins, despite people trying to interpret it in this way.

What are the main findings?

Across seven studies, we show that a computer algorithm can accurately detect sexual orientation from people’s faces. When presented with a pair of participants, one gay and one straight, the algorithm could correctly distinguish between them 91% of the time for men and 83% of the time for women.

This is comparable with the accuracy of mammograms (85%) or modern diagnostic tools for Parkinson's disease (90%). (Also see this section.)
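As noted in the accuracy section below, the 91% figure is an AUC: the probability that, shown one gay and one straight face, the classifier ranks the gay face higher. A minimal sketch of that pairwise reading (the scores below are invented purely for illustration):

```python
from itertools import product

def pairwise_accuracy(pos_scores, neg_scores):
    """AUC computed directly as the fraction of (positive, negative)
    score pairs ranked correctly; ties count as half a point."""
    pairs = list(product(pos_scores, neg_scores))
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return correct / len(pairs)

# Invented classifier scores: higher means "more likely gay"
gay = [0.9, 0.8, 0.7, 0.4]
straight = [0.6, 0.3, 0.2, 0.1]
print(pairwise_accuracy(gay, straight))  # 15 of 16 pairs ranked correctly: 0.9375
```

An AUC of .91 thus means that about 9 times out of 10 the classifier orders such a pair correctly.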

We trained the algorithm on a sample of over 35,000 facial images of self-identified gay and straight individuals, obtained from a publicly available database. The accuracy was verified on a subset of images that the algorithm had not seen before. We made sure that the predictions were not affected by differences in age and ethnicity.

We also tested the algorithm on an independent sample of Facebook profile pictures and achieved similar results.

In contrast, human judges were not much more accurate than random guesses. We believe that this is yet another example of artificial intelligence (AI) outperforming humans.

This study was peer reviewed and accepted for publication in the Journal of Personality and Social Psychology, the leading academic journal in psychology. In addition, before it was sent for formal peer review, the manuscript was reviewed by over a dozen experts in the fields of sexuality, psychology, and artificial intelligence. The research was approved by the Institutional Review Board.

What this study is not about

As the title indicates, our study aims to show that “Deep neural networks are more accurate than humans at detecting sexual orientation from facial images.”

This study is not about sexual orientation or its origins. In the process of studying the features employed by the classifier to distinguish between gay and straight faces, we noted that the former tend to be gender atypical. This is consistent with one of the most widely accepted theories explaining the origins of sexual orientation (prenatal hormone theory), providing additional support for the validity of the classifier.

This study, however, was neither designed nor intended to explore the origins of sexual orientation, nor prenatal hormone theory. Please read this comprehensive review if you are interested in this subject.

So you basically built a gaydar?!

No, we did not. We showed that a widely used facial recognition technology inadvertently exposes your sexual orientation. Such software examines a face and returns a bunch of numbers. Those numbers are normally used to detect the same face across many images. We noticed that the patterns of the numbers produced by this software differ for gay and straight faces, allowing for the detection of sexual orientation and invasion of people’s privacy.

 

In other words, the information about your sexual orientation is already built into the results produced by such software. It is not explicit—the software does not say “Michal is gay.” However, it can be relatively easily extracted. To learn how to do it, one needs to apply such software to a few thousand faces of known gay and straight people and compare the results to detect the pattern.
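As a sketch of that procedure (everything below is invented for illustration: the three-number “descriptors” stand in for the hundreds of numbers real facial recognition software returns, and the group labels are generic), even a simple nearest-centroid rule can recover a pattern from labeled examples:

```python
def centroid(vectors):
    # Component-wise mean of a list of equal-length vectors
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit(labeled):
    # labeled: {group label: [descriptor, ...]}; one centroid per group
    return {label: centroid(vecs) for label, vecs in labeled.items()}

def predict(centroids, descriptor):
    # Assign the group whose centroid is closest to the new descriptor
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], descriptor))

# Toy, made-up "face descriptors" for two groups of known membership
training = {
    "A": [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]],
    "B": [[0.1, 0.9, 0.8], [0.2, 0.8, 0.9]],
}
model = fit(training)
print(predict(model, [0.85, 0.15, 0.15]))  # closest to group A's centroid
```

A real attack would use thousands of real descriptors and a stronger classifier, but the principle is the same: the software's output already separates the groups; someone just has to look.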

 

This might be news to you, but a wide range of companies and other institutions are well aware of the fact that sensitive traits can be easily extracted from the numbers produced by facial recognition software.  

You must be wrong – this is pseudoscience!

We get a lot of feedback along these lines. And quite frankly, we would be delighted if our results were wrong. Humanity would have one less problem, and we could get back to writing self-help bestsellers about how power-posing makes you bolder, smiling makes you happier, and seeing pictures of eyes makes you more honest.[1] More on this in the “This must be wrong!” section below.

What are the implications of these findings for privacy?

The fact that algorithms can predict sexual orientation from human faces has serious privacy implications. The ability to control when and to whom to reveal one’s sexual orientation is crucial not only for one’s well-being, but also for one’s safety.

In some cases, losing the privacy of one’s sexual orientation can be life-threatening. The members of the LGBTQ community still suffer physical and psychological abuse at the hands of governments, neighbors, and even their own families. The laws in many countries criminalize same-gender sexual behavior, and in some places, it is punishable by death.

The growing digitalization of our lives and rapid progress in AI continue to erode our privacy. As this and our previous studies illustrate, willingly shared digital footprints can be used to reveal intimate traits. Kosinski’s 2013 paper warned that algorithms can accurately reveal people’s intimate traits from their Facebook “Likes,” which were, at that time, publicly visible by default. Kosinski’s 2015 paper showed that algorithms can predict one’s behavior more accurately than a friend or a spouse can.

Those papers raised enough alarm to effect policy change. For example, within a few weeks of the publication of the 2013 paper, Facebook switched off the public visibility of Likes. Kosinski’s work was also discussed by lawmakers in the U.S. and EU in the context of the new privacy legislation.

We hope that the current work will also help to shape the policies and technology.

Unfortunately, however, even the best laws and technologies aimed at protecting privacy are unlikely to be sufficient to stop the erosion of our privacy. The digital environment is very difficult to police; data can be easily moved across borders, stolen, or recorded without users’ consent. Also, most people want some of their social media posts, blogs, or profiles to be public. Few would be willing to cover their faces while interacting with others. As this and other studies show, this might be enough to deprive them of their privacy.

Essentially, we believe that further erosion of privacy is likely. Thus, the safety of gay and other minorities hinges not on the right to privacy but on the enforcement of human rights and the tolerance of societies and governments. In order for the post-privacy world to be safer and more hospitable, it must be inhabited by well-educated people who are radically intolerant of intolerance.[2]

Were the authors concerned about publishing these results?

We were really disturbed by these results and spent much time considering whether they should be made public at all. We did not want to enable the very risks that we are warning against.

 

Recent press reports,[3] however, suggest that governments and corporations are already using tools aimed at revealing intimate traits from faces. Facial images of billions of people are stockpiled in digital and traditional archives, including dating platforms, photo-sharing websites, and government databases. Profile pictures on Facebook, LinkedIn, and Google Plus are public by default. CCTV cameras and smartphones can be used to take pictures of others’ faces without their permission.

 

We felt that there is an urgent need to make policymakers and LGBTQ communities aware of the risks that they are facing. Tech companies and government agencies are well aware of the potential of computer vision algorithms. We believe that people deserve to know about these risks and have the opportunity to take preventive measures.

 

We made sure that our work does not offer any advantage to those who may want to invade others’ privacy. We used widely available off-the-shelf tools, publicly available data, and standard methods well known to computer vision practitioners. We did not create a privacy-invading tool, but rather showed that basic and widely used methods pose serious privacy threats.

Figure 1. Composite faces and average face outlines produced by averaging faces/outlines classified as most likely to be gay or straight.

What facial features were employed by the algorithm to detect sexual orientation?

The average faces most likely to belong to gay men (see Figure 1) were more feminine, while the faces most likely to belong to lesbians were more masculine. Typically, men have larger jaws, shorter noses, and smaller foreheads. Gay men, however, tended to have narrower jaws, longer noses, larger foreheads, and less facial hair. Conversely, lesbians tended to have more masculine faces (larger jaws and smaller foreheads) than heterosexual women.

The gender atypicality of gay faces extended beyond morphology. Lesbians tended to use less eye makeup, had darker hair, and wore less revealing clothes (note the higher neckline)—indicating less feminine grooming and style. Furthermore, although women tend to smile more in general, lesbians smiled less than their heterosexual counterparts.

Additionally, consistent with the association between baseball caps and masculinity in American culture, heterosexual men and lesbians tended to wear baseball caps (see the shadow on their foreheads in Figure 1; this was also confirmed by a manual inspection of individual images).

Gender atypicality of the faces of gay men and women is consistent with a large number of previous studies. Previous findings (see the section on adult gender nonconformity in this review) showed gender atypicality in occupations, hobbies, patterns of movement (i.e., gestures and walking), speech (i.e., articulation), physical presentation (i.e., clothing choices and hairstyles), and facial appearance. Perhaps the most widely accepted theory used to account for gender atypicality is the prenatal hormone theory (PHT) of sexual orientation.[4] 

The fact that the facial features predictive of sexual orientation are consistent with this well-established theory lends support to the validity of the classifier.

How does one interpret the accuracy of the algorithm?

Our work is intended as a warning that predictions of this kind can be made with worrying accuracy, rather than as an attempt to estimate the maximum accuracy of such predictions. We used basic tools and low-resolution images. Those deploying such methods in practice use much more sensitive DNN models and devices.

How accurate was the classifier in our study? Interpreting classification accuracy is not trivial and is often counterintuitive!

Imagine a group of 1,000 men, including 70 gay men, whose faces were assessed by the classifier with an accuracy of AUC = .91 (comparable with the accuracy achieved in our study for males with 5 images per person).

The classifier does not tell you which person is gay; instead, it labels each person with a probability of being gay. The non-trivial decision you then need to make is where to set the cut-off point: the probability above which you classify someone as gay.

If you want to select a small sample of gay men and make few mistakes, label as gay only the few cases with the highest probabilities. You will get high precision (i.e., a high fraction of gay people among those classified as gay) but low recall (i.e., you will ‘miss’ many gay men). If you prefer to cast a wider net, you will ‘catch’ more gay men, but also erroneously label more straight men as gay (so-called “false positives”). In other words, aiming for high precision reduces recall, and vice versa.
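The trade-off can be seen in a few lines of code; the scores and labels below are invented for illustration:

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when everyone scoring above `threshold`
    is classified as positive; `labels` holds the ground truth."""
    flagged = [lab for s, lab in zip(scores, labels) if s > threshold]
    true_positives = sum(flagged)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / sum(labels)
    return precision, recall

# Invented scores for 4 positive and 6 negative cases
scores = [0.95, 0.9, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.2, 0.1]
labels = [True, True, True, True, False, False, False, False, False, False]

print(precision_recall(scores, labels, 0.8))   # strict cut-off: (1.0, 0.5)
print(precision_recall(scores, labels, 0.35))  # lenient cut-off: (~0.67, 1.0)
```

Raising the threshold flags fewer people but more of them are true positives; lowering it catches everyone at the cost of more false positives.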

Back to the group of 1,000 men, including 70 gay men. If one selected 100 males at random from this sample, only 7 would be expected to be gay: a random draw offers a precision of 7% (7 out of 100 selected men being gay).

Let’s turn on the classifier. Among the 100 individuals with the highest probability of being gay according to the classifier, 47 were gay (precision = 47/100 = 47%). In other words, the classifier provided a nearly seven-fold improvement in precision over random selection. There were also 53 “false positives”: straight men classified as gay. Note, however, that as there are only 70 gay men in the examined population, there would be 30 “false positives” even if the classifier were perfect.

The number of false positives could be decreased, and the precision increased, by narrowing the targeted subsample. Among the 30 males with the highest probability of being gay, 23 were gay: an eleven-fold improvement in precision over a random draw (only 2.1 men would be expected to be gay in a random subset of 30 males). Finally, among the 10 individuals with the highest probability of being gay, 9 were indeed gay: a thirteen-fold improvement in precision over a random draw.
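The fold-improvement figures quoted in this thought experiment follow directly from the 7% base rate; the arithmetic checks out:

```python
base_rate = 70 / 1000  # 70 gay men in the group of 1,000

def fold_improvement(hits, selected):
    """How many times the precision of a selection beats a random draw."""
    return (hits / selected) / base_rate

print(round(fold_improvement(47, 100), 1))  # top 100: 6.7 ("nearly seven-fold")
print(round(fold_improvement(23, 30), 1))   # top 30: 11.0 ("eleven-fold")
print(round(fold_improvement(9, 10), 1))    # top 10: 12.9 ("thirteen-fold")
```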

What are the potential mechanisms linking intimate traits with facial features?

There are three types of such mechanisms. First, character can influence one’s facial appearance. For example, women who scored high on extroversion early in life tend to become more attractive with age.

Second, facial appearance can alter one’s character. Good-looking people, for example, receive more positive social feedback, and thus tend to become even more extroverted.

Third, many factors affect both facial appearance and one’s traits. These include prenatal and postnatal hormonal levels, developmental history, environmental factors, and genes. Testosterone, for instance, significantly affects both behavior (e.g., dominance) and facial appearance (e.g., facial width and facial hair).

 

***

You must be wrong – this is pseudoscience!

Our research, like any other scientific study, could be wrong in many different ways. We discuss some of them below.

You may also want to start with this great article from LGBTQ Nation.

“This must be wrong; it’s only based on white people!”

Despite our attempts to obtain a more diverse sample, we were limited to studying white participants from the U.S. (Unfortunately, this seems to be the problem affecting most other studies of sexual orientation.)

This does not invalidate the results of the study in any way. The study shows that you can distinguish between white gay and straight individuals.

It does not show that the same applies to other ethnicities, but our findings suggest that this, unfortunately, is likely. The same biological, developmental, and cultural factors—which are responsible for differences between gay and straight individuals—are likely to affect people of other races as well.

“This must be wrong; bisexual people were excluded from the analysis.”

That’s true; we did not check if you can predict whether someone is bisexual from their face.

 

This does not invalidate the results in any way. We still show that you can distinguish between gay and straight individuals. It is possible that some of the users categorized as heterosexual or gay were, in fact, bisexual. Correcting such errors, however, would likely boost the accuracy of the classifiers examined here.

Importantly, excluding bisexual or non-binary people does not mean that we are denying their existence.

“This must be wrong; you used a dating website sample of openly gay/straight people!”

That is a legitimate limitation and we discuss it at length in our paper. It is reasonable to expect that the images obtained from a dating website could be especially revealing of sexual orientation; this, however, did not seem to be the case.

First, we tested our classifier on an external sample of Facebook photos. It achieved accuracy comparable to that on the dating website sample, suggesting that the images from the dating website were not more revealing than Facebook profile pictures.

Second, we asked humans to judge the sexual orientation of these faces. Human accuracy was no better than in the past studies where humans judged sexual orientation from carefully standardized images taken in the lab. This shows that the images used here were not especially revealing of sexual orientation—at least, not to humans.

Finally, the deep neural network used here was specifically trained to focus on fixed facial features that cannot be easily altered, such as the shape of facial elements. This helped in reducing the risk of the classifier discovering some superficial and not face-related differences between facial images of gay and straight people used in this study.

“This must be wrong; it is widely known that there are no links between faces and character traits!”

Unfortunately, this belief is not supported by evidence.

Many studies have shown that people can consistently, although with low accuracy, determine others’ political views, personality, sexual orientation, honesty, and many other traits.[5] Also, humans’ low accuracy when judging such traits does not necessarily mean that those traits are not prominently displayed on the face. Instead, people may lack the ability to detect or interpret them.

“This must be wrong; I read on Wired that physiognomists believed that criminals were part ape!”

Well, it seems that physiognomists were at least partially correct, as we are all 100% ape.

There is no doubt, however, that physiognomy was based on unscientific studies, superstition, anecdotal evidence, and racist pseudo-theories. Physiognomists were clearly wrong when they claimed that they could judge people based on the appearance of their faces. A large number of studies have shown that people are not very accurate at this task.

However, the fact that physiognomists were wrong about many things does not automatically invalidate all of their claims. The same studies showing that people cannot accurately do what physiognomists claimed was possible also consistently show that people are, nevertheless, better than chance.

Thus, the physiognomists’ main claim – that character is to some extent displayed on one’s face – seems to be correct (while being rather upsetting).

“This must be wrong; your classifier is certainly picking on something unrelated to the face when making predictions!”

That’s something we thought a lot about, and we hope that future studies will help to prove or disprove the predictability of sexual orientation from human faces. We have, however, put much effort into guarding against this issue.

First, our models were specifically trained to focus on fixed facial features that cannot be easily altered, such as the shape of facial elements. The deep neural network used here was trained for a completely different task: recognizing the same person across images. This helped us to reduce the risk of the classifier discovering some superficial and not face-related differences between facial images of gay and straight people used in this study.

Second, we validated the findings on an external sample.

Third, we investigated what elements of the facial image were predictive of sexual orientation to ascertain that it was, in fact, facial features (and not other factors). As you can read in the paper, even if all of the visual information is removed, the classifier can still be quite accurate based on merely the outline of the face.

Fourth, we revealed only the facial area to the classifier and removed the background of the images. We also checked that the classifier focused on facial features, and not the background, while making the prediction. The heatmaps below (taken from Figure 3 in the paper) clearly show that the classifier focused on facial areas (red) and ignored the background (blue).

Finally, and perhaps most importantly, the differences between gay and straight faces picked up by the classifier are consistent with, and predicted by, the prenatal hormone theory – the most widely accepted theory explaining the origins of sexual orientation.

“This must be wrong; your findings show that gay people tend to be gender atypical – and I know many gender-typical gay men and women!”

We also know many very masculine gay men and very feminine gay women. We also know many very old men, which does not invalidate the statement that women tend to live longer. The fact that the faces of gay men are more feminine on average (as they tended to be in our study) does not imply that all gay men are more feminine than all heterosexual men, or that there are no gay men with very masculine faces (and vice versa for lesbians).

The differences in femininity/masculinity observed in this study were subtle and spread across many facial features: enough to be apparent to a sensitive algorithm, but imperceptible to humans.

Also, please read the “The causes of sexual orientation: An interim summary” in this review article.

“This must be wrong; many of your participants must have lied about their sexual orientation!”

It is certainly possible that some of the participants who told us that they were straight were, in fact, gay (or vice versa). We believe, however, that people voluntarily posting and seeking partners on dating websites have little incentive to lie about their sexual orientation.

Also, if some of our participants were, in fact, mislabeled, correcting such errors would most likely further increase the classification accuracy.

“This must be wrong; the only reason that it works is because gay men have better style or take better pictures!”

We could easily be convinced that gay men (our gay male friends for sure!) have better hairstyles and facial hairstyles, and take better pictures. As we discuss in our paper, gay and straight faces do differ in terms of grooming. However, they also seem to differ in terms of morphology. Facial contour alone provided an accuracy of over 70% among men and above 60% among women.

Even if the differences between gay and straight faces are exclusively due to differences in grooming, lifestyle, or fashion (i.e., nurture), this does not necessarily reduce the privacy threats faced by gay men and women. Many of the grooming or fashion choices are made unconsciously; removing other revealing features might require changing someone’s lifestyle.


Notes

Source: https://www.economist.com/blogs/graphicdetail/2013/12/daily-chart-15

Some articles whose authors have actually bothered to read the paper (under construction):

  1. https://www.economist.com/news/science-and-technology/21728614-machines-read-faces-are-coming-advances-ai-are-used-spot-signs
  2. https://www.vice.com/sv/article/7xkdab/forskare-ai-ansikte-hbtq
  3. ...
  4. Let me know if I am missing an article


[1] Unfortunately, these findings do not seem to replicate.

[2] Instead, societies and policymakers are focusing on discussing how best to protect privacy. This, in our view, is a distraction from the discussion that we should, in this context, have: how to make sure that the post-privacy world is a safe place. Even if we are wrong and privacy could be preserved, a tolerant world—where losing your privacy is not putting you at risk—would be a much better place.

[3] See, for instance, these Wall Street Journal and Business Insider articles.

[4] According to the PHT, same-gender sexual orientation stems from the underexposure of male fetuses and the overexposure of female fetuses to androgens that are responsible for sexual differentiation. As the same androgens are responsible for the sexual dimorphism of the face and the brain, the PHT predicts that gay people will tend to have gender-atypical facial morphology and gender-atypical preferences (including gender-atypical sexual preferences).

[5] See the many references in our paper, and read this.