School of Medicine, Vanderbilt University, Nashville, Tennessee, United States.
Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States.
Appl Clin Inform. 2023 Oct;14(5):833-842. doi: 10.1055/a-2148-6414. Epub 2023 Aug 4.
Geocoding, the process of converting addresses into precise geographic coordinates, allows researchers and health systems to obtain neighborhood-level estimates of social determinants of health. This information supports opportunities to personalize care and interventions for individual patients based on the environments where they live. We developed an integrated offline geocoding pipeline that streamlines the process of obtaining address-based variables and can be incorporated into existing data processing pipelines.
POINT is a web-based, containerized application for geocoding addresses that can be deployed offline and made available to multiple users across an organization. The application can be used through either a graphical user interface or an application programming interface to query geographic variables by census tract without exposing sensitive patient data. We evaluated the application's performance using two datasets: one consisting of 1 million nationally representative addresses sampled from Open Addresses, and the other consisting of 3,096 previously geocoded patient addresses.
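To illustrate how querying by census tract keeps the address itself out of downstream steps, the following is a minimal sketch rather than POINT's actual interface. It relies on the standard 11-digit census tract GEOID layout (2-digit state FIPS, 3-digit county FIPS, 6-digit tract code). The function names, the tract code, the variable name `deprivation_index`, and its value are all hypothetical placeholders.

```python
from typing import NamedTuple, Optional


class TractId(NamedTuple):
    state_fips: str   # 2 digits
    county_fips: str  # 3 digits
    tract_code: str   # 6 digits


def parse_tract_geoid(geoid: str) -> TractId:
    """Split an 11-digit census tract GEOID into its standard components."""
    if len(geoid) != 11 or not (geoid.isascii() and geoid.isdigit()):
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return TractId(geoid[:2], geoid[2:5], geoid[5:])


def lookup_variables(geoid: str, table: dict) -> Optional[dict]:
    """Return tract-level variables for a GEOID, or None if the tract is absent.

    Only the tract identifier is used here; no street address is needed.
    """
    parse_tract_geoid(geoid)  # validate the identifier before the lookup
    return table.get(geoid)


# Hypothetical local table of neighborhood variables keyed by tract GEOID.
# The tract code and the value are made up for illustration only.
neighborhood_table = {
    "47037000100": {"deprivation_index": 0.5},
}

tract = parse_tract_geoid("47037000100")
variables = lookup_variables("47037000100", neighborhood_table)
```

Keying the table on the tract GEOID means additional neighborhood datasets can be merged on the same identifier as they become available.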
A total of 99.4% and 99.8% of addresses in the Open Addresses and patient address datasets, respectively, were geocoded successfully. Census tract assignment agreed with the reference assignment for more than 90% of addresses in both datasets. Among successful geocodes, the median (interquartile range) distances from the reference coordinates were 52.5 (26.5-119.4) m and 14.5 (10.9-24.6) m, respectively.
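The three metrics reported above (geocoding success rate, census tract concordance, and distance from reference coordinates) could be computed along the following lines. This is a sketch under stated assumptions, not the authors' evaluation code: it assumes great-circle (haversine) distance on a spherical Earth, computes tract concordance among successful geocodes only, and uses a hypothetical record layout with made-up coordinates. The source does not specify the distance method or the concordance denominator.

```python
import math
import statistics

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def summarize(records):
    """Summarize geocoding performance.

    Each record is a dict (hypothetical layout) with keys:
      'geocoded'  : (lat, lon) tuple, or None if geocoding failed
      'reference' : (lat, lon) tuple
      'tract', 'ref_tract' : assigned and reference tract identifiers
    Requires at least two successful geocodes to compute quartiles.
    """
    hits = [r for r in records if r["geocoded"] is not None]
    distances = [haversine_m(*r["geocoded"], *r["reference"]) for r in hits]
    q1, median, q3 = statistics.quantiles(distances, n=4)
    concordant = sum(r["tract"] == r["ref_tract"] for r in hits)
    return {
        "success_rate": len(hits) / len(records),
        # Assumption: concordance is taken over successful geocodes only.
        "tract_concordance": concordant / len(hits),
        "median_m": median,
        "iqr_m": (q1, q3),
    }


# Made-up example records for illustration only.
records = [
    {"geocoded": (36.1447, -86.8027), "reference": (36.1449, -86.8025), "tract": "A", "ref_tract": "A"},
    {"geocoded": (36.1600, -86.7800), "reference": (36.1610, -86.7810), "tract": "B", "ref_tract": "C"},
    {"geocoded": (36.1000, -86.9000), "reference": (36.1001, -86.9001), "tract": "D", "ref_tract": "D"},
    {"geocoded": None, "reference": (36.2000, -86.7000), "tract": None, "ref_tract": "E"},
]
summary = summarize(records)
```

Reporting the median with an interquartile range rather than a mean is a sensible choice here because geocoding errors tend to be right-skewed: a few badly misplaced addresses would dominate an average.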
POINT successfully geocodes more addresses than existing solutions, including the U.S. Census Bureau's official geocoder, with similar accuracy. Addresses are considered protected health information and cannot be shared with common online geocoding services. POINT is an offline solution that scales to multiple users and integrates downstream mapping to neighborhood-level variables through a pipeline that lets users incorporate additional datasets as they become available. As health systems and researchers continue to explore and improve health equity, it is essential to obtain neighborhood variables quickly and accurately in a Health Insurance Portability and Accountability Act (HIPAA)-compliant way.