Data-Dive

Are you swiping Tinder on hard mode? An experiment on name based discrimination in online dating

· mc51

Motivation

In a previous post, we had a detailed look at profiles encountered while swiping Tinder. For that, we used some of the data gathered during a field experiment. Here, I present the main findings of that experiment, which was designed to investigate whether there is name-based discrimination on Tinder. Our question is: Do users on Tinder receive fewer likes solely because of their foreign (Islamic) name? This kind of discrimination has been uncovered in many areas of everyday life in different countries, including Germany. One of the few investigations in the realm of online dating can be found in this paper. The authors find evidence for a penalty on using Arab names on a Swedish dating website. Here, I offer novel findings regarding name-based discrimination by looking specifically at Tinder and using a very realistic approach. Moreover, I present unique results on how swiping patterns differ between a homosexual and a heterosexual male on Tinder.

Methodology

The dataset was gathered using bots that make use of the unofficial Tinder API. The bots used two identical male profiles, aged 29, to swipe in Germany. The profiles were kept simple. They included two real pictures in which the face and upper body were clearly visible. No other information (e.g. Instagram, Spotify, biography) was provided. The only difference between the profiles was the name used: Mohamed or Maurice. Maurice (which is a foreign name in Germany) was chosen instead of a German name in order to compare two foreign names. Thus, outcome differences will not be due to a preference for German names but to a distaste for the Islamic name. There were two consecutive phases of swiping, each lasting four weeks. The first was for the heterosexual and the second for the homosexual treatment. During each phase, the bots swiped four times a day, randomly liking about 33% of the encountered profiles. The maximum number of likes per day (100 for a non-premium account) was exhausted. After each week, the location was changed to the city center of one of the following cities: Berlin, Frankfurt, Hamburg and Munich. After one of the bots had swiped in a city, the other would follow a week later, leaving a “buffer” between them. The distance filter was set to 16 km (10 miles) and the age filter to 20-40. The search preference was set to women for the heterosexual and to men for the homosexual treatment. Unfortunately, I won’t be able to share the dataset because doing so is a legal gray area. Check out this post to learn about the many legal issues that come with such datasets.
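To make the swiping policy concrete, here is a minimal sketch of a single bot day. It is not the actual bot code: the function name `simulate_swipe_day`, the even split of the daily like budget across the four sessions, and the use of a seeded random generator are all assumptions made for illustration. No Tinder API is involved.

```python
import random


def simulate_swipe_day(rng, like_probability=1 / 3, sessions=4, daily_like_cap=100):
    """Simulate one day of swiping; returns (profiles_seen, profiles_liked).

    Assumption: the daily like budget is split evenly across sessions.
    The post only states four sessions per day, ~33% likes and a cap of 100.
    """
    likes_per_session = daily_like_cap // sessions
    seen = 0
    liked = 0
    for _ in range(sessions):
        session_likes = 0
        # Keep swiping until this session's share of the budget is used up.
        while session_likes < likes_per_session:
            seen += 1
            if rng.random() < like_probability:  # like roughly every third profile
                session_likes += 1
        liked += session_likes
    return seen, liked


seen, liked = simulate_swipe_day(random.Random(0))
print(f"profiles seen: {seen}, profiles liked: {liked}")
```

With a like probability of one third, a bot has to see roughly three profiles for every like it hands out, so exhausting 100 likes means encountering around 300 profiles per day.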

Analysis

We will be using the following dataset, which contains several pieces of information about each match. It was obtained by cleaning and preprocessing the raw data:

profiles.head()
_id closed common_friend_count common_like_count created_date dead following following_moments id is_boost_match is_fast_match is_super_boost_match is_super_like last_activity_date message_count messages muted participants pending person._id person.bio person.birth_date person.gender person.name person.photos person.ping_time readreceipt.enabled seen.match_seen seen.last_seen_msg_id person.hide_age person.hide_distance super_liker scrape_time city bot _id_rec scrape_time_rec city_rec
0 5d2ba4f7c663941600e146ce5d2f591805cbb61500ed3336 False 0.0 0.0 2019-07-19T11:21:33.851Z False True True 5d2ba4f7c663941600e146ce5d2f591805cbb61500ed3336 False False False False 2019-07-19T11:21:49.440Z 0.0 [{'_id': '5d31a7cdbc5b3a0100b9c9af', 'created_... False ['5d2f591805cbb61500ed3336'] False 5d2f591805cbb61500ed3336 :) 1987-08-09T19:13:46.592Z 0.0 SP [{'crop_info': {'algo': {'height_pct': 0.37477... 2014-12-09T00:00:00.000Z False False NaN NaN NaN NaN 2019-08-06 19:13:47 hamburg 3 5d2f591805cbb61500ed3336 2019-07-19 10:04:28 munich
1 5d2ba4f7c663941600e146ce5d31bea4d854331f00f8eae3 False 0.0 0.0 2019-07-19T16:14:37.078Z False True True 5d2ba4f7c663941600e146ce5d31bea4d854331f00f8eae3 False False False False 2019-07-20T21:33:41.843Z 0.0 [{'_id': '5d31ec8953a67c0100624ddc', 'created_... False ['5d31bea4d854331f00f8eae3'] False 5d31bea4d854331f00f8eae3 Serien, Filme, Anime und Sport <3\n\n1.87m 1993-08-09T19:13:46.592Z 0.0 Yago [{'crop_info': {'algo': {'height_pct': 0.34891... 2014-12-09T00:00:00.000Z False True 5d31ec8953a67c0100624ddc NaN NaN NaN 2019-08-06 19:13:47 hamburg 3 5d31bea4d854331f00f8eae3 2019-07-19 14:17:19 munich
2 5d2ba4f7c663941600e146ce5d31b3f827a9e71500dbf5f1 False 0.0 0.0 2019-07-20T08:35:56.889Z False True True 5d2ba4f7c663941600e146ce5d31b3f827a9e71500dbf5f1 False False False False 2019-07-20T08:39:27.335Z 0.0 [{'_id': '5d32d33f72ac330100e7a4d8', 'created_... False ['5d31b3f827a9e71500dbf5f1'] False 5d31b3f827a9e71500dbf5f1 Hey! Einfach Mal Lust auf nen Kaffee oder ein... 1997-08-09T19:13:46.592Z 0.0 Sebastian [{'crop_info': {'processed_by_bullseye': True,... 2014-12-09T00:00:00.000Z False False NaN NaN NaN NaN 2019-08-06 19:13:47 hamburg 3 5d31b3f827a9e71500dbf5f1 2019-07-19 14:17:19 munich
3 5d2ba4f7c663941600e146ce5d32bb40d322281500ba9569 False 0.0 0.0 2019-07-20T15:57:53.935Z False True True 5d2ba4f7c663941600e146ce5d32bb40d322281500ba9569 False False False False 2019-07-20T15:57:53.935Z 0.0 [] False ['5d32bb40d322281500ba9569'] False 5d32bb40d322281500ba9569 Volountier\nKnowing people and cultures.\nDeut... 1986-08-09T19:13:46.592Z 0.0 Renato [{'crop_info': {'algo': {'height_pct': 0.64371... 2014-12-09T00:00:00.000Z False False NaN NaN NaN NaN 2019-08-06 19:13:47 hamburg 3 5d32bb40d322281500ba9569 2019-07-20 08:26:17 munich
4 5d2ba4f7c663941600e146ce5d3588263979f01600a0a225 False 0.0 0.0 2019-07-22T10:11:26.073Z False True True 5d2ba4f7c663941600e146ce5d3588263979f01600a0a225 False False False False 2019-07-22T10:11:26.073Z 0.0 [] False ['5d3588263979f01600a0a225'] False 5d3588263979f01600a0a225 NaN 1996-08-09T19:13:46.592Z 0.0 Julian [{'crop_info': {'algo': {'height_pct': 0.19431... 2014-12-09T00:00:00.000Z False False NaN NaN NaN NaN 2019-08-06 19:13:47 hamburg 3 5d3588263979f01600a0a225 2019-07-22 10:02:41 berlin

We start by defining a variable for the different treatments in order to use it for annotations. We choose the profile name used by each bot and indicate whether it was swiping as a “straight (s)” or “gay (g)” man. Then, we calculate the duration from like to match and visualize it:

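from math import floor

import hvplot.pandas  # noqa: F401 (registers the .hvplot accessor on pandas objects)
import numpy as np
import pandas as pd

# set a readable name for the different treatments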
profiles.loc[profiles['bot'] == 1, 'treatment'] = 'Mohamed (s)'
profiles.loc[profiles['bot'] == 3, 'treatment'] = 'Mohamed (g)'
profiles.loc[profiles['bot'] == 2, 'treatment'] = 'Maurice (s)'
profiles.loc[profiles['bot'] == 4, 'treatment'] = 'Maurice (g)'
# Calc time from like to match in minutes
profiles['created_date'] = pd.to_datetime(profiles['created_date'], utc=True)
profiles['scrape_time_rec'] = pd.to_datetime(profiles['scrape_time_rec'], utc=True)
profiles['time_until_match'] =\
    (profiles['created_date'] - profiles['scrape_time_rec']) / np.timedelta64(1,'m')
profiles['time_until_match'] = profiles['time_until_match'].\
                                   fillna(0).map(lambda x: floor(x))
# Categorize times into bins and visualize
bins =  [-1, 5, 30, 60*12, 60*24, 60*48, np.inf]
labels = [" <5m", "<0.5h", "<12h", "<24h", "<48h", ">=48h"]
profiles["time_to_match_cat"] = pd.cut(profiles['time_until_match'],
                                       bins=bins, labels=labels)

time_to_match = profiles.groupby("treatment")["time_to_match_cat"]\
                    .value_counts(normalize=True).unstack() * 100
time_to_match = time_to_match.reindex(columns=labels)
time_to_match.hvplot.bar(yformatter="%.0f%%", xlabel="Time until match")\
    .opts(width=600, height=400, xrotation=90)

By far, most matches are instant, and this holds true for all treatments. Of the remaining matches, almost all happen less than half a day after liking. After that, the chances that we match with somebody we liked before decrease to around 10%. This is likely because Tinder’s algorithm tries to increase the match rate. Consequently, active people and those who have already liked you will be suggested to you early on. Only when those profiles are exhausted are less active profiles suggested as well.
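The binning behind this chart deserves a closer look. By default, `pd.cut` uses right-closed intervals, so a bin runs from just above its left edge up to and including its right edge. The sketch below mimics that rule in plain Python using the same bin edges; the helper name `time_to_match_category` is made up for this illustration.

```python
from bisect import bisect_left
from math import inf

# Same edges (in minutes) and labels as in the analysis code.
BINS = [-1, 5, 30, 60 * 12, 60 * 24, 60 * 48, inf]
LABELS = ["<5m", "<0.5h", "<12h", "<24h", "<48h", ">=48h"]


def time_to_match_category(minutes):
    """Return the label of the right-closed bin (left, right] containing minutes.

    Values outside all bins return None, just as pd.cut would yield NaN.
    """
    if not BINS[0] < minutes <= BINS[-1]:
        return None
    # bisect_left finds the first edge >= minutes, i.e. the bin's right edge.
    return LABELS[bisect_left(BINS, minutes) - 1]


for value in (0, 5, 6, 720, 721, 5000, -3):
    print(value, time_to_match_category(value))
```

Two details follow from this rule: a match after exactly 5 minutes still lands in the first bin, and a duration below -1 minutes (for example from clock differences between the two timestamps) falls outside every bin.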

Now, let’s see how many profiles were liked by each bot during swiping:

import json

def get_likes(bot_number):
    """Read the number of likes per city from a bot's status file."""
    likes = {}
    with open(f"./data/raw/tinderbot{bot_number}/data/status_{bot_number}.json",
              "r") as file:
        status = json.load(file)
        for city in status.keys():
            # skip the "ddorf" entry; compare names instead of substring matching
            if city != "ddorf":
                likes[city] = status.get(city).get("num_likes")
    return likes

likes = []
for i in range(1, 5):
    likes.append(get_likes(i))
    
likes = pd.DataFrame(likes)
likes.index +=1
likes = likes.join(profiles[["bot", "treatment"]]\
                   .drop_duplicates("treatment").set_index("bot")).set_index("treatment")
likes
             munich  berlin  frankfurt  hamburg
treatment
Mohamed (s)     700     600        600      600
Maurice (s)     599     600        700      600
Mohamed (g)     550     360        360      360
Maurice (g)     360     360        550      360

In the straight treatment, each bot liked 100 profiles per day. This is the maximum number of likes you get with a free Tinder account. In the gay treatment this was not possible, because it would have exhausted the total user base in some cities. This explains the difference between treatments. Also, Sunday was the off day for our bots: no swiping was done, as we only waited for matches to come in from our previous likes. The difference between cities within treatments arises because each bot swiped an additional day in its first city. In addition, for the gay treatment the reduction from 100 to 60 likes per day was only introduced after switching to the second city.

Now, we check how successful each of our profiles has been at collecting matches. We examine the absolute number of matches for each treatment first:

# Total number of matches per treatment
# profiles.groupby("treatment")["_id"].count().rename("Total number of matches")
num_matches = profiles.groupby("treatment")["_id"].count().rename("count")
num_matches.hvplot.bar().opts(width=600, height=400, title="Total number of matches by treatment")

The most staggering finding is the huge difference in the number of matches between swiping as a homosexual and as a heterosexual man. Even though the total number of profiles we liked is far lower for the gay treatment, the number of matches is many times higher. In addition, we also uncover a stark difference within the heterosexual treatment: while Maurice has 180 matches, Mohamed has only 102. Let’s dig deeper and make those numbers more comparable by looking at the relative numbers:

# match to like ratio in % by treatment and city
CITIES = likes.columns[0:4].values
matches = profiles.groupby(["treatment", "city_rec"])["_id"].count()\
            .unstack()
matches_share = matches[CITIES] / likes[CITIES] * 100
matches_share_mean = matches_share.transpose().mean().rename("Overall match to like ratio")
matches_share_mean.hvplot.bar()\
    .opts(width=600, height=400, yformatter="%.0f%%")

There is a lot to learn from this result, as it contains the main findings of our experiment. First, matching with other men on Tinder was easy for our gay profiles: both achieved a match rate above 50%, and the difference between using Maurice or Mohamed as the profile name is negligible. Our heterosexual profiles, on the other hand, played the hard version of the game: their match ratio is substantially lower. Moreover, we find a large difference between Maurice’s (7.1%) and Mohamed’s (4.0%) success rates. The conclusion is remarkable: using Maurice instead of Mohamed as a profile name might add roughly 77% to your match probability. Lucky you, Maurice. Or the other way around: Sucks to be you, Mohamed…
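How sure can we be that the gap in the straight treatment is not just noise? One rough way to check is a two-proportion z-test on the pooled counts reported above: 180 matches from 2499 likes for Maurice and 102 matches from 2500 likes for Mohamed. This is only a sketch and not part of the original analysis. It assumes that every like is an independent trial, which ignores city and timing effects, and it pools across cities, so the pooled rates differ slightly from the averaged city rates quoted in the text.

```python
from math import erfc, sqrt


def two_proportion_z_test(successes_a, trials_a, successes_b, trials_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    # Pooled proportion under the null hypothesis of equal match rates.
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (p_a - p_b) / standard_error
    # erfc(|z| / sqrt(2)) equals the two-sided tail probability of a standard normal.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Maurice (s): 180 matches / 2499 likes, Mohamed (s): 102 matches / 2500 likes
z, p_value = two_proportion_z_test(180, 2499, 102, 2500)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.2g}")
```

Under these assumptions, the z statistic comes out well above 4, so a difference of this size would be very unlikely if both names had the same underlying match rate.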

Let’s go into more detail by looking at the city level. What’s the best and the worst city to swipe in depending on your sexuality and name?

matches_share
matches_share.mean().rename("match rate").reset_index()
matches_share.hvplot.bar().groupby("treatment")\
    .opts(alpha=0.4, muted_alpha=0.05, legend_position='right', 
          width=600, height=400, title="Match to like ratio by city and treatment",
          xlabel="", ylabel="", yformatter="%.0f%%").overlay()
city_rec        munich     berlin  frankfurt    hamburg
treatment
Maurice (g)  54.166667  46.388889  54.000000  60.000000
Maurice (s)   6.176962   2.833333   9.285714  10.166667
Mohamed (g)  51.090909  45.000000  56.944444  52.222222
Mohamed (s)   5.428571   1.833333   2.833333   6.000000

    city_rec  match rate
0     munich   29.215777
1     berlin   24.013889
2  frankfurt   30.765873
3    hamburg   32.097222

Averaged over all treatments, the cities show a similar picture in terms of matching rates. People in Berlin are by far the pickiest (24% average matching rate). They are followed by Munich (29%) and Frankfurt (31%). The highest success rate was observed in Hamburg, with 32%. In the gay treatment, the name makes no major difference in most cities. The one minor difference we observe is in Hamburg, where Mohamed only gets ~52% matches while Maurice gets ~60%. As we’ve learned before, in the straight treatment the match rates for Mohamed are lower across the board. Surprisingly, this difference is much more pronounced in particular cities. In Munich, Mohamed gets around 88% of Maurice’s match rate. In Berlin and Hamburg this share drops to around 65% and 59%, respectively, and in Frankfurt it plummets to only about 31%! Hence, name-based discrimination seems to be a very regional phenomenon.
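The relative shares for the straight treatment can be recomputed directly from the match rates in the table above. The snippet below is a small sketch that copies those printed values into plain dictionaries instead of reading them from the `matches_share` DataFrame; the variable names are made up for this illustration.

```python
# Match-to-like ratios in percent, copied from the table above (straight treatment).
maurice = {"munich": 6.176962, "berlin": 2.833333,
           "frankfurt": 9.285714, "hamburg": 10.166667}
mohamed = {"munich": 5.428571, "berlin": 1.833333,
           "frankfurt": 2.833333, "hamburg": 6.000000}

# Mohamed's match rate expressed as a percentage of Maurice's, per city.
relative_share = {
    city: round(mohamed[city] / maurice[city] * 100)
    for city in maurice
}

# Print the cities from the smallest to the largest gap.
for city, share in sorted(relative_share.items(), key=lambda item: -item[1]):
    print(f"{city:<10} {share}%")
```

A share of 100% would mean that both names do equally well in a city; the lower the share, the larger Mohamed’s disadvantage there.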

Liking someone on Tinder is a first show of interest. However, it requires minimal effort and as such doesn’t necessarily hold too much meaning. In contrast, a message is an additional, important step towards a potential date and requires a more serious involvement. Thus, we also take a look at messaging behavior:

# Who sent the first message. Could be us or the other person
import ast
def message_sender(msg):
    var = ast.literal_eval(msg)
    if var:
        return var[0].get("from")
        
profiles["msg_from"] = profiles["messages"].map(lambda x: message_sender(x))
# Dummy for received message
profiles.loc[profiles["person._id"] == profiles["msg_from"], "msg_recv"] = 1
matches = profiles.groupby(["treatment"])["_id"].count()
messages = profiles.groupby("treatment")["msg_recv"].count()
messages
treatment
Maurice (g)    336
Maurice (s)     15
Mohamed (g)    311
Mohamed (s)     10
Name: msg_recv, dtype: int64

In absolute terms, the result is sobering for the straight treatment. Receiving a first message from a woman you matched with is very uncommon. In contrast, as a gay man your inbox will be on fire! But since the absolute number of matches was highly skewed as well, we also look at the relative picture:

# share of matches (not likes) from which we received the first message, in %
msg_rcv_per_match = messages / matches * 100
msg_rcv_per_match = msg_rcv_per_match.rename("Share of matches sending a first message")
msg_rcv_per_match.hvplot.bar()\
    .opts(width=600, height=400, yformatter="%.0f%%")

In the gay treatment, we receive a first message from our matches in about 37% of the cases. The difference between Maurice and Mohamed is small. In the straight treatment, Maurice received a first message from 8.3% and Mohamed from 9.8% of their respective matches. This might indicate that Mohamed’s matches are more involved. Given the low absolute numbers, though, this difference is probably not statistically significant.
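With counts this small (15 of 180 matches versus 10 of 102), an exact test is a better fit than a normal approximation. The following sketch implements a two-sided Fisher exact test using only the standard library. It is an illustration and not part of the original analysis; the function name `fisher_exact_two_sided` is an assumption, and matches are treated as independent observations.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are at most as likely as the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        p for p in map(table_probability, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)  # small tolerance for float comparison
    )


# Rows: Maurice (s), Mohamed (s); columns: first message received, not received
p_value = fisher_exact_two_sided(15, 180 - 15, 10, 102 - 10)
print(f"two-sided p-value = {p_value:.2f}")
```

The resulting p-value is far above the usual 0.05 threshold, which supports reading the 8.3% versus 9.8% gap as noise rather than a real difference.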

Summary

We conducted a unique field experiment on Tinder and generated novel insights. The motivation was to compare swiping results between a homosexual and a heterosexual male and to find out whether name-based discrimination exists on the app. We have found that:

  1. Matching is mostly instant. The chances of a match decrease quickly as time passes after a like
  2. Gay men get vastly more matches than straight men
  3. If you are heterosexual, your name explains a big part of your matching success. It can add roughly 77% to your match probability. For homosexual men there is no meaningful difference
  4. Chances of getting a match are highest in Hamburg and lowest in Berlin
  5. The degree of name-based discrimination varies regionally. Munich shows the lowest and Frankfurt the highest level
  6. Fewer than 10% of women send a first message after a match. For gay men that figure is about 37%