However, there are only sparse resources looking at Tinder app data on a user level. One reason is that such data is not easy to gather. One approach is to ask Tinder for your own data. This process was used in this inspiring analysis, which focuses on matching rates and messaging between users. Another way is to create profiles and automatically collect data on your own by using the undocumented Tinder API. This method was used in a paper which is summarized neatly in this blogpost. The paper also focused on the matching and messaging behavior of users. Lastly, this post summarizes findings from the biographies of male and female Tinder profiles from Sydney.
In the following, we will complement and expand previous analyses of Tinder data. Using a unique, extensive dataset, we will apply descriptive statistics, natural language processing and visualizations in order to uncover patterns on Tinder. In this first analysis we will focus on insights from profiles we observe while swiping as a male user. Specifically, we observe female profiles from swiping as a heterosexual as well as male profiles from swiping as a homosexual. In this follow up post we then look at novel findings from a field experiment on Tinder. The results will reveal new insights regarding liking behavior and patterns in matching and messaging of users.
The dataset was gathered using bots that made use of the unofficial Tinder API. The bots used two almost identical male profiles aged 29 to swipe in Germany. There were two consecutive phases of swiping, each over the course of four weeks. After each week, the location was set to the city center of one of the following cities: Berlin, Frankfurt, Hamburg and Munich. The distance filter was set to 16 km and the age filter to 20-40. The search preference was set to women for the heterosexual treatment and to men for the homosexual treatment. Each bot encountered about 300 profiles per day. The profile data was returned in JSON format in batches of 10-30 profiles per response.
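To make the batch structure a bit more concrete, here is a minimal sketch of how such responses could be flattened into one table with pandas. The response layout, the field names (results, user, _id, name, bio, distance_mi) and the sample values are all made up for illustration, since the real schema of the unofficial API is not shown here. The deduplication step likewise rests on the assumption that a profile can show up in more than one batch.

```python
import pandas as pd

# Hypothetical responses: every field name and value below is an assumption
# made for illustration, not the actual schema of the unofficial API.
responses = [
    {"results": [
        {"user": {"_id": "a1", "name": "Anna", "bio": "Coffee and hiking"}, "distance_mi": 3},
        {"user": {"_id": "b2", "name": "Lena", "bio": ""}, "distance_mi": 7},
    ]},
    {"results": [
        {"user": {"_id": "c3", "name": "Mia", "bio": "Berlin"}, "distance_mi": 1},
        {"user": {"_id": "a1", "name": "Anna", "bio": "Coffee and hiking"}, "distance_mi": 3},
    ]},
]

# Collect the profiles of all batches into one flat list of dicts.
records = [profile for response in responses for profile in response["results"]]

# json_normalize turns nested keys into dotted column names such as "user.name".
profiles = pd.json_normalize(records)

# Assumption: a profile may appear in several batches, so keep the first occurrence.
profiles = profiles.drop_duplicates(subset="user._id").reset_index(drop=True)
```

The result is one row per profile with flat columns, which is the shape the descriptive statistics and plots in a notebook like this one typically expect.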
Unfortunately, I won't be able to share the dataset because doing so falls into a legal gray area. Check out this post to learn about the many legal issues that come with such datasets.
Setting up things
In the following, I will share my data analysis of the dataset using a Jupyter Notebook. So, let's get started by first importing the packages we will use and setting some options:
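# coding: utf-8
# Data handling
import pandas as pd
import numpy as np
# Natural language processing
import nltk
import textblob
import datetime
# Word clouds and image handling
from wordcloud import WordCloud
from PIL import Image
from IPython.display import Markdown as md
# json_normalize is exposed at the top level since pandas 1.0;
# the old pandas.io.json import path is deprecated.
from pandas import json_normalize
# Interactive plotting on top of pandas
import hvplot.pandas
#from bokeh.io import output_notebook
#output_notebook()
# Show up to 100 columns when displaying DataFrames
pd.set_option('display.max_columns', 100)
# Display the result of every expression in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
import holoviews as hv
hv.extension('bokeh')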