Abstract:
Location data is among the most sensitive data regarding the privacy of the observed
users. To collect location data, mobile phones and other mobile devices constantly track
their positions. This work examines whether publicly available spatio-temporal
user data can be used to link newly observed location data to known user profiles. For
this study, publicly available location information about Twitter users is used to construct
spatio-temporal user profiles describing a user's movement in space and time. The study shows
how to use these profiles to match a new location trace to its user with high accuracy.
Furthermore, it shows how to link users across two different trace data sets.
For this case study, 15,989 of the most prolific Twitter users in London in 2014 are
considered. The experimental results show that the classification approach correctly
identifies 98 % of the 500 most prolific of these users. Furthermore, it can correctly
identify more than 50 % of all users by using only three observations per user, rather than
their whole location trace. This alarming result shows that spatio-temporal data is highly
discriminative, thus putting the privacy of hundreds of millions of geo-social network users
at risk. The study further shows that most users of Instagram can be correctly matched to users of
Twitter.