Medical Research Council Centre of Epidemiology for Child Health, University College London Institute of Child Health, London, WC1N 1EH, UK.
Stat Med. 2012 Dec 10;31(28):3481-93. doi: 10.1002/sim.5508. Epub 2012 Jul 17.
Probabilistic record linkage techniques assign match weights to one or more potential matches for those individual records that cannot be assigned 'unequivocal matches' across data files. Existing methods select the single record having the maximum weight provided that this weight is higher than an assigned threshold. We argue that this procedure, which ignores all information from matches with lower weights and for some individuals assigns no match, is inefficient and may also lead to biases in subsequent analysis of the linked data. We propose that a multiple imputation framework be utilised for data that belong to records that cannot be matched unequivocally. In this way, the information from all potential matches is transferred through to the analysis stage. This procedure allows for the propagation of matching uncertainty through a full modelling process that preserves the data structure. For purposes of statistical modelling, results from a simulation example suggest that a full probabilistic record linkage is unnecessary and that standard multiple imputation will provide unbiased and efficient parameter estimates.
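The contrast drawn in the abstract, between keeping only the highest-weight candidate above a threshold and carrying every candidate forward through multiple imputation, can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes that match weights have already been converted to normalised match probabilities, it draws each linked value directly from those probabilities, and it pools the results with Rubin's rules. The names threshold_link, draw_link and rubin_pool, the toy records, and the threshold of 0.5 are all hypothetical.

```python
import random
from statistics import mean, variance


def threshold_link(candidates, threshold):
    """Conventional rule: keep the top candidate only if its probability exceeds the threshold."""
    best_value, best_prob = max(candidates, key=lambda c: c[1])
    return best_value if best_prob > threshold else None


def draw_link(candidates, rng):
    """Draw one candidate value with probability proportional to its match probability."""
    values = [v for v, _ in candidates]
    probs = [p for _, p in candidates]
    return rng.choices(values, weights=probs, k=1)[0]


def rubin_pool(estimates, within_variances):
    """Pool m estimates with Rubin's rules: total = within + (1 + 1/m) * between."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(within_variances)
    between = variance(estimates)  # sample variance across imputations
    return pooled, within + (1 + 1 / m) * between


# Hypothetical data: each primary record has candidate (value, match probability) pairs.
records = [
    [(4.0, 0.90), (9.0, 0.10)],
    [(5.0, 0.55), (7.0, 0.45)],
    [(6.0, 0.40), (2.0, 0.35), (8.0, 0.25)],
]

# Threshold rule: the third record gets no match because no candidate exceeds 0.5.
linked = [threshold_link(c, threshold=0.5) for c in records]

# Multiple imputation: every candidate contributes in proportion to its probability.
rng = random.Random(0)
m = 20
estimates, within_variances = [], []
for _ in range(m):
    sample = [draw_link(c, rng) for c in records]
    estimates.append(mean(sample))                          # estimate of the mean
    within_variances.append(variance(sample) / len(sample))  # its sampling variance

pooled_mean, total_variance = rubin_pool(estimates, within_variances)
```

Under the threshold rule the third record contributes nothing to the analysis, whereas under imputation its three candidates all inform the pooled estimate, and the between-imputation term carries the matching uncertainty into the final variance.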