Improving geocoding matching rates of structured addresses in Rio de Janeiro, Brazil.

Author information

Instituto de Medicina Social, Universidade do Estado do Rio de Janeiro, Rio de Janeiro, Brasil.

Instituto de Saúde Coletiva, Universidade Federal da Bahia, Salvador, Brasil.

Publication information

Cad Saude Publica. 2021 Jul 28;37(7):e00039321. doi: 10.1590/0102-311X00039321. eCollection 2021.

Abstract

Strategies for improving geocoded data often rely on interactive manual processes that can be time-consuming and impractical for large-scale projects. In this study, we evaluated different automated strategies for improving address quality and geocoding matching rates using a large dataset of addresses from death records in Rio de Janeiro, Brazil. Mortality data included 132,863 records with address information in a structured format. We applied regular-expression and dictionary-based methods for address standardization and enrichment. All records were linked by their postal code or street name to the Brazilian National Address Directory (DNE) obtained from Brazil's Postal Service. Residential addresses were geocoded using Google Maps. Records with address data validated down to the street level and location type returned as rooftop, range interpolated, or geometric center were considered a geocoding match. The overall performance was assessed by manually reviewing a sample of addresses. Of the original 132,863 records, 85.7% (n = 113,876) were geocoded and validated, of which 83.8% were matched as rooftop (high accuracy). Overall sensitivity and specificity were 87% (95%CI: 86-88) and 98% (95%CI: 96-99), respectively. Our results indicate that address quality and geocoding completeness can be reliably improved with an automated geocoding process. R scripts and instructions to reproduce all the analyses are available at https://github.com/reprotc/geocoding.
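
The standardization step and the matching rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code (their analyses are R scripts in the repository linked above). The abbreviation dictionary, the function names, and the `street_validated` flag are assumptions made for illustration. The location type strings follow the values the Google Maps Geocoding API returns (`ROOFTOP`, `RANGE_INTERPOLATED`, `GEOMETRIC_CENTER`, `APPROXIMATE`).

```python
import re
import unicodedata

# Hypothetical dictionary of street-type abbreviations (illustrative only;
# the study's actual dictionaries are not given in the abstract).
STREET_TYPE_ABBREVIATIONS = {
    "R": "RUA",
    "AV": "AVENIDA",
    "TV": "TRAVESSA",
    "EST": "ESTRADA",
    "PCA": "PRACA",
}

# Location types the abstract counts as a geocoding match.
ACCEPTED_LOCATION_TYPES = {"ROOFTOP", "RANGE_INTERPOLATED", "GEOMETRIC_CENTER"}


def standardize_street(raw: str) -> str:
    """Normalize a street string with regular expressions and a dictionary."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.upper()
    # Replace punctuation with spaces, then collapse repeated whitespace.
    text = re.sub(r"[^A-Z0-9 ]+", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return text
    # Expand an abbreviated street type when it is the first token.
    tokens = text.split(" ")
    tokens[0] = STREET_TYPE_ABBREVIATIONS.get(tokens[0], tokens[0])
    return " ".join(tokens)


def is_geocoding_match(location_type: str, street_validated: bool) -> bool:
    """Apply the abstract's rule: street-level validation plus an accepted type.

    `street_validated` is an assumed flag meaning the record was validated
    down to the street level against the address directory.
    """
    return street_validated and location_type in ACCEPTED_LOCATION_TYPES
```

For example, `standardize_street("  r.  são clemente ")` yields `"RUA SAO CLEMENTE"`, and a record geocoded as `APPROXIMATE` is not counted as a match even when its street was validated.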
