Méray Nora, Reitsma Johannes B, Ravelli Anita C J, Bonsel Gouke J
Academic Medical Centrum (AMC), Department of Medical Informatics, Amsterdam, The Netherlands.
J Clin Epidemiol. 2007 Sep;60(9):883-91. doi: 10.1016/j.jclinepi.2006.11.021. Epub 2007 May 17.
To describe the technical approach and subsequent validation of the probabilistic linkage of the three anonymous, population-based Dutch Perinatal Registries (LVR1 of midwives, LVR2 of obstetricians, and LNR of pediatricians/neonatologists). These registries do not share a unique identification number.
A combination of probabilistic and deterministic record linkage techniques was applied, using information about the mother, delivery, and child(ren) to link the three registries. Rewards for agreement and penalties for disagreement between corresponding variables were calculated from the observed patterns of agreement and disagreement using maximum likelihood estimation. Special measures were developed to overcome linking difficulties in twins. A subsample of linked and nonlinked pairs was validated.
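The reward and penalty scheme described above can be illustrated with a minimal sketch. It assumes the standard Fellegi-Sunter form of probabilistic linkage weights, which the abstract does not explicitly confirm as the authors' exact formulation. The field names and the m- and u-probabilities below are hypothetical placeholders; in practice these probabilities would be estimated from the data (for example by maximum likelihood), not fixed by hand.

```python
import math

# Hypothetical parameters, for illustration only (not taken from the study):
# m = P(field agrees | records belong to the same delivery)
# u = P(field agrees | records belong to different deliveries)
PARAMS = {
    "mother_birth_date": (0.95, 0.003),
    "child_birth_date": (0.98, 0.003),
    "postal_code": (0.90, 0.01),
    "child_sex": (0.99, 0.5),
}


def weights(m, u):
    """Return (reward, penalty) for one linking variable.

    reward  = log2(m / u)              added when the field agrees
    penalty = log2((1 - m) / (1 - u))  added when the field disagrees
    """
    reward = math.log2(m / u)
    penalty = math.log2((1 - m) / (1 - u))
    return reward, penalty


def pair_score(rec_a, rec_b, params=PARAMS):
    """Sum rewards and penalties over all linking variables for one record pair."""
    total = 0.0
    for field, (m, u) in params.items():
        reward, penalty = weights(m, u)
        total += reward if rec_a[field] == rec_b[field] else penalty
    return total


# Usage: two hypothetical records from different registries.
midwife_rec = {
    "mother_birth_date": "1975-03-02",
    "child_birth_date": "2001-06-14",
    "postal_code": "1105",
    "child_sex": "F",
}
obstetric_rec = dict(midwife_rec, postal_code="1106")  # one field disagrees

print(pair_score(midwife_rec, midwife_rec))    # all fields agree
print(pair_score(midwife_rec, obstetric_rec))  # lower score: one penalty applied
```

Pairs whose total score exceeds a chosen threshold would be treated as links. A variable with few distinct values, such as sex, earns a small reward on agreement but a large penalty on disagreement, which is why twins (who share nearly all maternal and delivery variables) need additional handling.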
Independent validation confirmed that the procedure successfully linked the three Dutch perinatal registries despite nontrivial error rates in the linking variables.
Probabilistic linkage techniques allowed the creation of a high-quality linked database from crude registry data. The developed procedures are generally applicable to the linkage of health data with partially identifying information. They provide useful source data even if cohorts are only partly overlapping and if, within a cohort, multiple entities and twins exist.