Department of Mathematics and Statistics, Slippery Rock University, Slippery Rock, PA 16057, USA.
National Opinion Research Center, Boston, MA 02114, USA.
Int J Environ Res Public Health. 2020 Sep 22;17(18):6937. doi: 10.3390/ijerph17186937.
Since its inception after World War II, the science of record linkage has grown exponentially and is now used across industry, government, and academia. The academic fields that rely on record linkage are diverse, ranging from history to public health to demography. In this paper, we introduce the different types of data linkage and give historical context for their development. We then introduce the three types of underlying models for probabilistic record linkage: Fellegi-Sunter-based methods, machine learning methods, and Bayesian methods. Practical considerations, such as data standardization and privacy concerns, are then discussed. Finally, recommendations are given for organizations developing or maintaining record linkage programs, with an emphasis on organizations measuring the long-term complications of disasters such as 9/11.