Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA.
Bakar Computational Health Sciences Institute, University of California, San Francisco, California, USA.
Annu Rev Public Health. 2022 Apr 5;43:59-78. doi: 10.1146/annurev-publhealth-051920-110928. Epub 2021 Dec 6.
The big data revolution presents an exciting frontier to expand public health research, broadening the scope of research and increasing the precision of answers. Despite these advances, scientists must remain vigilant that big data applications do not also advance harms toward marginalized communities. In this review, we provide examples in which big data applications have (unintentionally) perpetuated discriminatory practices, while also highlighting opportunities for big data applications to advance equity in public health. Here, big data is framed in the context of the five Vs (volume, velocity, veracity, variety, and value), and we propose a sixth V, virtuosity, which incorporates equity and justice frameworks. Analytic approaches to improving equity are presented using social computational big data, fairness in machine learning algorithms, medical claims data, and data augmentation as illustrations. Throughout, we emphasize the biasing influence of data absenteeism and positionality and conclude with recommendations for incorporating an equity lens into big data research.