Hirsch Jana A, Moore Kari A, Cahill Jesse, Quinn James, Zhao Yuzhe, Bayer Felicia J, Rundle Andrew, Lovasi Gina S
Department of Epidemiology and Biostatistics, Dornsife School of Public Health, Drexel University, Philadelphia, PA, USA.
Urban Health Collaborative, Dornsife School of Public Health, Drexel University, Philadelphia, PA, USA.
J Urban Health. 2021 Apr;98(2):271-284. doi: 10.1007/s11524-020-00482-2. Epub 2020 Oct 1.
Retail environments, such as healthcare locations, food stores, and recreation facilities, may be relevant to many health behaviors and outcomes. However, there is minimal guidance on how to collect, process, aggregate, and link these data, which results in inconsistent or incomplete measurement that can introduce misclassification bias and limit replication of existing research. We describe the following steps to leverage business data for longitudinal neighborhood health research: re-geolocating establishment addresses, preliminary classification using standard industrial codes, systematic checks to refine classifications, incorporation and integration of complementary data sources, documentation of a flexible hierarchical classification system and variable naming conventions, and linking to neighborhoods and participant residences. We show results of this classification from a dataset of over 77 million establishment locations across the contiguous U.S. from 1990 to 2014. By incorporating complementary data sources and applying manual spot checks in Google StreetView and word and name searches, we enhanced a basic classification that relied only on standard industrial codes. Ultimately, providing these enhanced longitudinal data and supplying detailed methods for researchers to replicate our work promotes consistency, replicability, and new opportunities in neighborhood health research.
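As a rough illustration of two of the steps described above (preliminary classification by standard industrial code, then refinement by word and name searches), the following minimal sketch may help. It is not the authors' implementation: the category names, the name patterns, the `Establishment` record, the `classify` function, and the rule that a name match overrides the code-based category are all assumptions made for this example. The four SIC codes shown are real codes, but their mapping to these categories is illustrative only.

```python
import re
from dataclasses import dataclass

# Illustrative mapping from 4-digit SIC codes to broad categories.
# The category labels are hypothetical, not the paper's hierarchy.
SIC_TO_CATEGORY = {
    "5411": "food_store",    # SIC 5411: grocery stores
    "5812": "eating_place",  # SIC 5812: eating places
    "5912": "pharmacy",      # SIC 5912: drug stores and proprietary stores
    "7991": "recreation",    # SIC 7991: physical fitness facilities
}

# Hypothetical name patterns used to refine the code-based category.
NAME_RULES = [
    (re.compile(r"\b(pharmacy|drug)\b", re.IGNORECASE), "pharmacy"),
    (re.compile(r"\b(gym|fitness|yoga)\b", re.IGNORECASE), "recreation"),
]


@dataclass
class Establishment:
    name: str
    sic: str   # SIC code as a string; only the first four digits are used
    year: int


def classify(est):
    """Return (category, source) for one establishment record.

    Step 1: look up a preliminary category from the SIC code.
    Step 2: let the first matching name rule override it when the two
            disagree (an assumed precedence rule for this sketch).
    """
    category = SIC_TO_CATEGORY.get(est.sic[:4])
    source = "sic_code" if category else "none"
    for pattern, refined in NAME_RULES:
        if pattern.search(est.name) and refined != category:
            return refined, "name_search"
    return category or "unclassified", source


records = [
    Establishment("Main Street Grocery", "5411", 1995),
    Establishment("Corner Drug Store", "5999", 2003),
    Establishment("Acme Hardware", "5251", 2010),
]
results = [classify(r) for r in records]
```

Returning the source of each decision alongside the category keeps a record of which establishments were reclassified by the name search, which is the kind of subset that could then be prioritized for manual spot checks.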