The OKCupid dataset

A very large open dataset of dating site users

The power of large open datasets

  • Verifiable analyses -- others can repeat the same analyses on the same data and check them for errors.
  • New analyses -- others can run new analyses on the same data, so the collection effort is put to better use.
  • Free to play -- anyone can access the data, no wasted time/energy on red tape. No blocking of critics!


OKCupid

  • Largest English-language dating site, with millions of users.
  • Users answer questions in order to be matched with other users.
  • Questions are user-made; very varied and cover topics academics would not (dare to) ask about.
  • There are about 14 useful cognitive ability items.
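One way the cognitive ability items could be turned into a per-user score is sketched below. This is only an illustration under stated assumptions: the question ids and the answer key are hypothetical placeholders, not the real items, and proportion-correct scoring with a minimum number of answered items is an assumed choice, not necessarily the scoring used for the dataset. Users answer only the questions they choose to, so skipped items are treated as missing rather than wrong.

```python
from typing import Mapping, Optional

# Hypothetical answer key (question id -> correct option).
# These ids and answers are placeholders, not the real OKCupid items.
ANSWER_KEY = {"q_a": "2", "q_b": "True", "q_c": "11"}


def ability_score(
    answers: Mapping[str, Optional[str]],
    key: Mapping[str, str] = ANSWER_KEY,
    min_answered: int = 2,
) -> Optional[float]:
    """Proportion of correct answers among the items the user answered.

    Returns None when the user answered fewer than `min_answered` items,
    because a score based on one or two items is too noisy to use.
    """
    # Items the user skipped are missing (None or absent), not wrong.
    answered = [q for q in key if answers.get(q) is not None]
    if len(answered) < min_answered:
        return None
    correct = sum(answers[q] == key[q] for q in answered)
    return correct / len(answered)


# Example: two items answered, one of them correctly.
user = {"q_a": "2", "q_b": "False", "q_c": None}
score = ability_score(user)
```

With about 14 items, raising `min_answered` trades sample size for reliability of the score.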

The dataset

  • 3-4 months of nearly 24/7 scraping (automatic downloading with a Python script).
  • About 68k users.
  • About 2500 variables.
  • Only public data -- no “who wrote to whom” data, no names, no emails, etc.
  • No images, no profile text.

Test analysis: Cognitive ability and political participation/interest
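A test analysis of this kind can be reduced to correlating a cognitive ability score with a numerically coded answer. The sketch below is a minimal illustration, not the analysis from the talk: it assumes both variables are already numeric, and it uses pairwise deletion (dropping users with a missing value on either variable), which is an assumed way of handling the many unanswered questions.

```python
import math
from typing import Optional, Sequence


def pearson_r(
    xs: Sequence[Optional[float]], ys: Sequence[Optional[float]]
) -> float:
    """Pearson correlation using only users with both values present."""
    # Pairwise deletion: keep a user only if neither value is missing.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 complete pairs")

    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("one variable is constant; correlation undefined")

    return sxy / math.sqrt(sxx * syy)


# Toy data only: ability scores and a made-up 1-4 interest rating.
ability = [0.2, 0.5, None, 0.9, 0.7]
interest = [1, 2, 3, 4, None]
r = pearson_r(ability, interest)
```

Because each user answers a different subset of questions, the number of complete pairs can differ a lot between variable pairs and is worth reporting next to each correlation.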

Test analysis: Cognitive ability and religiousness

Test analysis: Zodiac sign and … all other variables
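Testing one variable against all others means running a very large number of tests, so some will come out "significant" by chance alone. With about 2500 variables and a 0.05 threshold, roughly 125 chance hits are expected even if zodiac sign is related to nothing. The sketch below shows one standard way to account for this, the Benjamini-Hochberg false discovery rate procedure. It is an illustration of the general issue, not a description of how the analysis in the talk was done, and the p-values are toy numbers.

```python
from typing import List, Sequence


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of chance hits if every null hypothesis is true."""
    return n_tests * alpha


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return True for each test that survives FDR control at level alpha."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Every test ranked at or below the cutoff is declared a discovery.
    keep = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            keep[idx] = True
    return keep


# Toy p-values: three fall below 0.05, only one survives the correction.
p_vals = [0.001, 0.04, 0.03, 0.5]
survivors = benjamini_hochberg(p_vals)
chance_hits = expected_false_positives(2500)
```

Under this procedure, a variable with no real effects should leave few or no survivors, which makes it a useful sanity check on the rest of the analyses.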

Where can I get this data??

  • Paper is under review in Open Differential Psychology. Open peer review at the journal forum:
  • All data, figures and code are available via the website.
  • … or from me (on USB).
ISIR 2016 talk: The OKCupid dataset