TY - JOUR
T1 - The impact of privacy protection measures on the utility of crowdsourced cycling data
AU - Raturi, Varun
AU - Hong, Jinhyun
AU - McArthur, David Philip
AU - Livingston, Mark
N1 - Publisher Copyright: © 2021 Elsevier Ltd
PY - 2021/4
Y1 - 2021/4
AB - The use of new forms of data in the transport research domain is rapidly gaining popularity. However, these data come with specific challenges, and one of the major concerns is maintaining the privacy of data subjects. One widely used approach to anonymising data is binning. Recently, data from activity-tracking applications such as Strava have been used to study active travel. Because of privacy concerns, Strava has provided data in a discretised format since July 2018. In this study, we analyse the impact of the binning criteria on the utility of the crowdsourced data, using Strava data from 2013 to 2016 for the city of Glasgow. We applied the Strava binning criteria to the original dataset at three temporal aggregations (hourly, daily and monthly) and conducted several analyses to examine their impact. First, we compared manual cycling counts with original and binned cycling counts from Strava data. Second, net errors were calculated by comparing original and binned cycling counts from Strava data. Third, we estimated spatial autocorrelation statistics based on original and binned Strava counts and investigated the extent to which research outcomes change because of the binning approach. Our results confirmed a significant amount of information loss. Worryingly, we also show that conclusions reached by previous studies could have been reversed if the new specification of the data had been used. We outline the precautions researchers and planners should take when working with the binned data.
KW - Crowdsourced data
KW - Cycling
KW - Infrastructure
KW - Privacy
KW - Spatial autocorrelation
KW - Strava
UR - http://www.scopus.com/inward/record.url?scp=85103112120&partnerID=8YFLogxK
U2 - 10.1016/j.jtrangeo.2021.103020
DO - 10.1016/j.jtrangeo.2021.103020
M3 - Article
AN - SCOPUS:85103112120
SN - 0966-6923
VL - 92
JO - Journal of Transport Geography
JF - Journal of Transport Geography
M1 - 103020
ER -