Department of Surgery and the Alvin J. Siteman Cancer Center, School of Medicine, Washington University, St. Louis, MO, USA.
J Urban Health. 2010 Jul;87(4):713-25. doi: 10.1007/s11524-010-9458-0.
Growing evidence supports a relationship between neighborhood-level characteristics and important health outcomes. One source of neighborhood data is commercial business databases, which are integrated with geographic information systems to measure the availability of certain types of businesses or destinations that may have favorable or adverse effects on health outcomes; however, the quality of these data sources is generally unknown. This study assessed the concordance of two commercial databases in ascertaining the presence, locations, and characteristics of businesses. Businesses in the St. Louis, Missouri area were selected based on their four-digit Standard Industrial Classification (SIC) codes and classified into 14 business categories. Business listings in the two commercial databases were matched by standardized business name within specified distances. Concordance and coverage measures were calculated using capture-recapture methods for all businesses and by business type, with further stratification by census-tract-level population density, percent below poverty, and racial composition. For matched listings, the distance between listings and the agreement in four-digit SIC code, sales volume, and employee size were calculated. Overall, the percent agreement between the databases was 32%. Concordance and coverage estimates were lowest for health-care facilities and leisure/entertainment businesses; were highest for popular walking destinations, eating places, and alcohol/tobacco establishments; and varied somewhat by population density. The mean (SD) distance between matched listings was 108.2 (179.0) m, with varying levels of agreement in four-digit SIC code (percent agreement = 84.6%), employee size (weighted kappa = 0.63), and sales volume (weighted kappa = 0.04). Researchers should interpret findings cautiously when using these commercial databases to derive measures of the neighborhood environment.
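The matching and capture-recapture steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the name-standardization rules, the distance thresholds, or the exact estimators, so the suffix list, the 500 m default, the greedy nearest-match rule, the definition of percent agreement as matched listings divided by the union of both databases, and the use of the Chapman two-source estimator are all assumptions made for illustration. The field names `name`, `lat`, and `lon` are likewise hypothetical.

```python
import math
import re

# Hypothetical suffix list; the study's actual standardization rules are not given.
DROP_TOKENS = {"inc", "llc", "co", "corp", "the"}

def standardize(name):
    """Lowercase, remove apostrophes, turn other punctuation into spaces, drop suffixes."""
    name = re.sub(r"['\u2019]", "", name.lower())
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(t for t in name.split() if t not in DROP_TOKENS)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def match_listings(db_a, db_b, max_dist_m=500.0):
    """Greedy one-to-one matching: same standardized name, nearest listing within max_dist_m.

    Returns a list of (index_in_a, index_in_b, distance_m) tuples.
    The threshold is an assumed value, not one taken from the study.
    """
    used_b = set()
    pairs = []
    for i, a in enumerate(db_a):
        best = None
        for j, b in enumerate(db_b):
            if j in used_b or standardize(a["name"]) != standardize(b["name"]):
                continue
            d = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
            if d <= max_dist_m and (best is None or d < best[1]):
                best = (j, d)
        if best is not None:
            used_b.add(best[0])
            pairs.append((i, best[0], best[1]))
    return pairs

def two_source_summary(n_a, n_b, n_matched):
    """Two-source capture-recapture summary (assumed definitions, see lead-in)."""
    union = n_a + n_b - n_matched
    # Chapman's nearly unbiased form of the Lincoln-Petersen estimator.
    est_total = (n_a + 1) * (n_b + 1) / (n_matched + 1) - 1
    return {
        "percent_agreement": 100.0 * n_matched / union,
        "estimated_total": est_total,
        "coverage_a": n_a / est_total,
        "coverage_b": n_b / est_total,
    }
```

Calling `match_listings(db_a, db_b)` on two lists of listing dictionaries and passing the resulting counts to `two_source_summary(len(db_a), len(db_b), len(pairs))` gives an overall agreement figure and a coverage estimate for each database; the same calls could be repeated per business category or census-tract stratum.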
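Agreement in ordered categories such as employee size and sales volume is reported in the abstract as a weighted kappa. The sketch below shows one way to compute Cohen's weighted kappa for two sets of ordinal codes. The abstract does not state which weighting scheme was used or how the categories were defined, so the linear default, the optional quadratic weights, and the integer coding `0..n_categories-1` are assumptions.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories, weights="linear"):
    """Cohen's weighted kappa for two equally long lists of ordinal codes 0..n_categories-1."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    k = n_categories

    # Observed joint proportions: rows are database A, columns are database B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d  # otherwise quadratic

    obs = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp == 0.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - obs / exp
```

A value near 1 indicates that the two databases place matched businesses in the same or adjacent categories far more often than chance would predict, while a value near 0, as reported for sales volume, indicates agreement close to chance.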