Department of Medical Informatics, Academic Medical Center, University of Amsterdam, 1100 DE Amsterdam, The Netherlands.
J Clin Epidemiol. 2011 May;64(5):565-72. doi: 10.1016/j.jclinepi.2010.05.008. Epub 2010 Oct 16.
To gain insight into the performance of deterministic record linkage (DRL) vs. probabilistic record linkage (PRL) strategies under different conditions by varying the frequency of registration errors and the amount of discriminating power.
A simulation study in which data characteristics were varied to create a range of realistic linkage scenarios. For each scenario, we compared the number of misclassifications (number of false nonlinks and false links) made by the different linking strategies: deterministic full, deterministic N-1, and probabilistic.
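The three strategies compared above can be illustrated with a minimal Python sketch. It is not the study's simulation code. The function names, the representation of a record pair as a list of per-variable agreement flags, and the use of Fellegi-Sunter style log-ratio weights for the probabilistic score are all assumptions made for illustration; the m- and u-probabilities are hypothetical inputs.

```python
import math

def agreement_pattern(rec_a, rec_b, fields):
    """Per-variable agreement flags for one record pair (hypothetical helper)."""
    return [rec_a[f] == rec_b[f] for f in fields]

def deterministic_full(pattern):
    """Link only if every linking variable agrees."""
    return all(pattern)

def deterministic_n_minus_1(pattern):
    """Link if at most one linking variable disagrees."""
    return sum(pattern) >= len(pattern) - 1

def probabilistic_weight(pattern, m, u):
    """Sum of log2 agreement/disagreement weights (Fellegi-Sunter style).

    m[i]: assumed probability that variable i agrees for a true match.
    u[i]: assumed probability that variable i agrees for a non-match.
    """
    total = 0.0
    for agrees, m_i, u_i in zip(pattern, m, u):
        if agrees:
            total += math.log2(m_i / u_i)
        else:
            total += math.log2((1 - m_i) / (1 - u_i))
    return total

def count_misclassifications(pairs, decide):
    """Return (false links, false nonlinks) for (pattern, is_match) pairs."""
    false_links = sum(1 for p, is_match in pairs if decide(p) and not is_match)
    false_nonlinks = sum(1 for p, is_match in pairs if not decide(p) and is_match)
    return false_links, false_nonlinks
```

A probabilistic decision rule can be passed to `count_misclassifications` as `lambda p: probabilistic_weight(p, m, u) >= threshold`, where the threshold is a further assumption that has to be chosen for the data at hand.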
The full deterministic strategy produced the lowest number of false-positive links, but at the expense of missing a considerable number of matches, depending on the error rate of the linking variables. The probabilistic strategy outperformed the deterministic strategy (full or N-1) across all scenarios. A deterministic strategy can match the performance of a probabilistic approach provided that the decision about which disagreements should be tolerated is made correctly. This requires a priori knowledge about the quality of all linking variables, whereas this information is inherently generated by a probabilistic strategy.
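To make the last point concrete, the following sketch shows how per-variable quality estimates indicate which disagreement is least costly to tolerate. It assumes Fellegi-Sunter style m- and u-probabilities are available for each linking variable; the function names are hypothetical and are not taken from the study.

```python
import math

def disagreement_weights(m, u):
    """log2 disagreement weight per variable; closer to zero means a
    disagreement on that variable is weaker evidence against a match."""
    return {name: math.log2((1 - m[name]) / (1 - u[name])) for name in m}

def most_tolerable_disagreement(m, u):
    """Variable whose disagreement is penalized least (hypothetical helper)."""
    weights = disagreement_weights(m, u)
    return max(weights, key=weights.get)
```

Under these assumptions, a variable with a high registration error rate (low m) has a disagreement weight close to zero, which is the variable an N-1 deterministic rule would ideally allow to disagree.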
PRL is more flexible and provides information about the quality of the linkage process, which in turn can be used to minimize the number of linkage errors for the data at hand.