Department of Civil, Environmental, and Geospatial Engineering, Michigan Technological University, Houghton, MI, United States of America.
College of Information Studies, University of Maryland, College Park, MD, United States of America.
PLoS One. 2023 Oct 18;18(10):e0292090. doi: 10.1371/journal.pone.0292090. eCollection 2023.
Since the outbreak of COVID-19, analyzing and measuring human mobility has become increasingly important. A wide range of studies have used mobility data to explore spatiotemporal trends, examine associations with other variables, evaluate non-pharmacologic interventions (NPIs), and predict or simulate the spread of COVID-19. Despite the benefits of publicly available mobility data, a key question remains unanswered: do models that use mobility data perform equitably across demographic groups? We hypothesize that bias in the mobility data used to train predictive models might lead to unfairly less accurate predictions for certain demographic groups. To test this hypothesis, we applied two mobility-based COVID-19 infection prediction models at the county level in the United States using SafeGraph data, and correlated model performance with sociodemographic traits. The findings revealed a systematic bias in model performance with respect to certain demographic characteristics. Specifically, the models tend to perform better in large, highly educated, wealthy, young, and urban counties. We hypothesize that the mobility data currently used by many predictive models tends to capture less information about people who are older, poorer, less educated, or living in rural regions, which in turn reduces the accuracy of COVID-19 predictions in these areas. Ultimately, this study points to the need for improved data collection and sampling approaches that accurately represent mobility patterns across demographic groups.
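The step of correlating model performance with sociodemographic traits can be illustrated with a minimal sketch. It assumes a Pearson correlation between a per-county prediction error and one trait (median income); the abstract does not state which correlation measure or error metric the authors used, and the county names, field names, and numbers below are synthetic values made up purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# SYNTHETIC data: hypothetical counties, not taken from the study.
# "abs_pct_error" stands in for any per-county error metric of a prediction model.
counties = [
    {"name": "County A", "median_income": 42000, "abs_pct_error": 0.31},
    {"name": "County B", "median_income": 51000, "abs_pct_error": 0.27},
    {"name": "County C", "median_income": 63000, "abs_pct_error": 0.22},
    {"name": "County D", "median_income": 78000, "abs_pct_error": 0.18},
    {"name": "County E", "median_income": 95000, "abs_pct_error": 0.12},
]

incomes = [c["median_income"] for c in counties]
errors = [c["abs_pct_error"] for c in counties]

# A negative coefficient means the error shrinks as income rises, i.e. the
# model is more accurate for wealthier counties in this toy data set.
r = pearson(incomes, errors)
print(f"Pearson r between median income and prediction error: {r:.3f}")
```

In this toy setup, a coefficient near zero would suggest that accuracy does not vary with the trait, whereas a strongly negative or positive value would flag the kind of systematic performance gap the abstract describes. Repeating the calculation for each trait (population, education, age, urbanicity) gives one coefficient per trait.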