Yang Duck-Hye, Bilaver Lucy Mackey, Hayes Oscar, Goerge Robert
Chapin Hall Center for Children at the University of Chicago, Chicago, Illinois, USA.
J Med Syst. 2004 Aug;28(4):361-70. doi: 10.1023/b:joms.0000032851.76239.e3.
This study examined the sources of error involved in geocoding by systematically evaluating the strengths and weaknesses of three widely used geocoding tools. We tested them against a random sample of addresses from a state administrative address master file and found considerable variation among the tools in the census block geocodes identified for those addresses. This variation was mainly attributable to differences in how addresses were preprocessed before geocoding and in the reference street data used for geocoding. Preprocessing includes not only parsing and standardizing addresses but also correcting them against the US Postal Service (USPS) Zip+4 Database, the master mailing address database that USPS maintains and updates regularly.
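The parsing and standardizing steps named above can be illustrated with a short Python sketch. It uppercases an address line, strips punctuation, collapses whitespace, abbreviates a few street suffixes and directionals, and then splits the result into house number, pre-directional, street name, and suffix. The abbreviation tables are a small assumed subset in the spirit of USPS Publication 28. The functions `standardize` and `parse` are hypothetical illustrations and are not the preprocessing used by any of the evaluated tools. The sketch also does not perform Zip+4 correction, which requires the USPS database itself.

```python
import re

# Hypothetical, deliberately tiny abbreviation tables (assumption: a real
# standardizer would use the full USPS Publication 28 tables).
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD", "DRIVE": "DR"}
DIRECTIONALS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}


def standardize(line: str) -> str:
    """Uppercase, replace punctuation with spaces, collapse whitespace,
    and abbreviate directional and suffix words."""
    cleaned = re.sub(r"[^\w\s]", " ", line.upper())
    out = []
    for tok in cleaned.split():
        tok = DIRECTIONALS.get(tok, tok)
        tok = SUFFIXES.get(tok, tok)
        out.append(tok)
    return " ".join(out)


def parse(line: str) -> dict:
    """Split a standardized line into house number, pre-directional,
    street name, and suffix (None where a component is absent)."""
    tokens = standardize(line).split()
    parts = {"number": None, "predir": None, "name": None, "suffix": None}
    if tokens and tokens[0].isdigit():
        parts["number"] = tokens.pop(0)
    # Treat a leading directional as a pre-directional only if a name follows it.
    if len(tokens) > 1 and tokens[0] in DIRECTIONALS.values():
        parts["predir"] = tokens.pop(0)
    # Treat a trailing suffix as a suffix only if a name precedes it.
    if len(tokens) > 1 and tokens[-1] in SUFFIXES.values():
        parts["suffix"] = tokens.pop()
    parts["name"] = " ".join(tokens) or None
    return parts


if __name__ == "__main__":
    print(standardize("123 North Main Street."))
    print(parse("1313 East 60th Street"))
```

Two differently written versions of one address, such as "123 North Main Street." and "123 n. main st", both standardize to `123 N MAIN ST`. This kind of normalization lets a geocoder match them to the same reference street segment, and differences in exactly these rules are one way tools can diverge.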