Motivation

Tinder is a huge phenomenon in the online dating world. Because of its massive user base, it potentially offers a lot of data that is exciting to analyze. A general overview of Tinder can be found in this article, which mainly looks at key business figures and user surveys:

Tinder usage in the UK

Source: Survey by Weareflint


However, few resources look at Tinder app data on a user level, one reason being that the data is not easy to gather. One approach is to ask Tinder for your own data. This process was used in this inspiring analysis, which focuses on matching rates and messaging between users. Another way is to create profiles and automatically collect data on your own using the undocumented Tinder API. This method was used in a paper that is summarized neatly in this blog post. The paper also focused on the matching and messaging behavior of users. Lastly, this post summarizes findings from the biographies of male and female Tinder profiles from Sydney.

In the following, we will complement and expand on previous analyses of Tinder data. Using a unique, extensive dataset, we will apply descriptive statistics, natural language processing and visualizations to uncover patterns on Tinder. In this first analysis, we will focus on insights from the profiles we observe while swiping as a male. More precisely, we observe female profiles when swiping as a heterosexual and male profiles when swiping as a homosexual. In a follow-up post, we will then look at novel findings from a field experiment on Tinder. The results will reveal new insights into liking behavior and patterns in the matching and messaging of users.

Data collection

The dataset was gathered by bots using the unofficial Tinder API. The bots used two almost identical male profiles, aged 29, to swipe in Germany. There were two consecutive phases of swiping, each lasting four weeks. After each week, the location was set to the city center of one of the following cities: Berlin, Frankfurt, Hamburg and Munich. The distance filter was set to 16 km and the age filter to 20-40. The search preference was set to women for the heterosexual treatment and to men for the homosexual treatment. Each bot encountered about 300 profiles per day. The profile data was returned in JSON format in batches of 10-30 profiles per response.
Unfortunately, I won't be able to share the dataset because doing so falls into a legal gray area. Check out this post to learn about the many legal issues that come with such datasets.
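
Since the data itself can't be shown, here is a minimal sketch of how one JSON batch might be flattened into a table with one row per profile. Everything in it is an assumption made for illustration: the field names (`results`, `_id`, `bio`, `job`, `photos`), the two records and the helper `batch_to_frame` are invented and are not taken from the real API responses or from the dataset.

```python
import pandas as pd

# Hypothetical batch: field names and values are invented for illustration
# and do NOT come from the real Tinder API or from the collected dataset.
batch = {
    "results": [
        {"_id": "a1", "name": "Anna", "bio": "Coffee first.",
         "job": {"title": "Engineer"},
         "photos": [{"id": "p1"}, {"id": "p2"}]},
        {"_id": "b2", "name": "Lena", "bio": "",
         "job": {"title": "Nurse"},
         "photos": [{"id": "p3"}]},
    ]
}


def batch_to_frame(batch, city, treatment):
    """Flatten one JSON batch into a DataFrame with one row per profile."""
    # Nested dicts become dotted columns (e.g. "job.title");
    # list values such as "photos" are kept as Python lists.
    df = pd.json_normalize(batch["results"])
    # Reduce the photo list to a simple count, then drop the raw list.
    df["n_photos"] = df["photos"].apply(len)
    df = df.drop(columns="photos")
    # Record the collection context so batches can be compared later.
    df["city"] = city
    df["treatment"] = treatment
    return df


profiles = batch_to_frame(batch, city="Berlin", treatment="heterosexual")
print(profiles)
```

With many batches, the resulting frames could be combined with `pd.concat` and de-duplicated on the profile identifier, assuming the responses carry one.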

Setting up things

In the following, I will share my analysis of the dataset using a Jupyter Notebook. Let's get started by importing the packages we will use and setting some options:

In [41]:
# coding: utf-8
import pandas as pd
import numpy as np
import nltk
import textblob
import datetime
from wordcloud import WordCloud
from PIL import Image
from IPython.display import Markdown as md
# json_normalize lives at the top level since pandas 1.0; pandas.io.json.json_normalize is deprecated
from pandas import json_normalize
import hvplot.pandas

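# Show up to 100 columns when displaying wide DataFrames
pd.set_option('display.max_columns', 100)
# Display the result of every expression in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Use Bokeh as the plotting backend for HoloViews and hvPlot
import holoviews as hv
hv.extension('bokeh')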