Fornari Carla, Madotto Fabiana, Demaria Moreno, Romanelli Anna, Pepe Pasquale, Raciti Mauro, Tancioni Valeria, Chini Francesco, Trerotoli Paolo, Bartolomeo Nicola, Serio Gabriella, Cesana Giancarlo, Corrao Giovanni
Centro di studio e ricerca sulla patologia cronico-degenerativa negli ambienti di lavoro, Dipartimento di medicina clinica e prevenzione, Facoltà di medicina e chirurgia, Università degli studi di Milano Bicocca, Italy.
Epidemiol Prev. 2008 May-Jun;32(3 Suppl):79-88.
To compare the record linkage (RL) procedures adopted in several Italian settings with a standard probabilistic RL procedure for matching data from electronic health care databases.
Two health care archives are matched: the hospital discharges (HD) archive and the population registry of four Italian areas. Exact deterministic and stepwise deterministic techniques and a standard probabilistic RL procedure are applied to match HD for acute myocardial infarction (AMI) and diabetes mellitus. Sensitivity and specificity of the RL procedures are estimated after manual review. Age- and gender-standardized annual hospitalization rates for AMI and diabetes are computed using the different RL procedures and compared.
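The difference between exact deterministic and probabilistic matching can be illustrated with a minimal Python sketch. It assumes a Fellegi-Sunter style agreement weight, which is a common formulation of probabilistic RL; the abstract does not state which model, linkage fields, or parameters were used, so the field names, the m/u probabilities, and the functions `exact_match` and `match_weight` are all hypothetical.

```python
import math

# Hypothetical per-field parameters (illustrative values, not from the study):
# m = P(field agrees | true match), u = P(field agrees | non-match).
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "name": (0.95, 0.02),
    "birth_date": (0.98, 0.003),
    "gender": (0.99, 0.5),
}

def exact_match(a, b, fields=FIELD_PARAMS):
    """Exact deterministic rule: link only if every field agrees."""
    return all(a[f] == b[f] for f in fields)

def match_weight(a, b, params=FIELD_PARAMS):
    """Sum of log2 likelihood ratios over fields (agreement or disagreement)."""
    total = 0.0
    for field, (m, u) in params.items():
        if a[field] == b[field]:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total

# Made-up records: one discharge and three candidate registry entries.
discharge = {"surname": "ROSSI", "name": "MARIA",
             "birth_date": "1950-03-12", "gender": "F"}
resident_same = dict(discharge)
resident_typo = {**discharge, "surname": "ROSI"}   # one misspelled field
resident_other = {"surname": "BIANCHI", "name": "LUCA",
                  "birth_date": "1971-08-30", "gender": "M"}

for label, resident in [("same", resident_same),
                        ("typo", resident_typo),
                        ("other", resident_other)]:
    print(label, exact_match(discharge, resident),
          round(match_weight(discharge, resident), 2))
```

In this sketch the misspelled record fails the exact rule but still receives a clearly positive weight, which is the kind of pair a probabilistic procedure can recover and an exact deterministic one cannot.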
Municipalities of Pisa and Roma, and Regions of Puglia and Piemonte.
Residents in the considered areas on 31 December 2003 and corresponding episodes of hospitalization in the same areas during 2004.
Measures of accuracy of RL procedures to match health care administrative databases.
Data quality varies among archives and affects the decision rule of the probabilistic procedure. A single decision rule was therefore adopted for all the considered areas by requiring a positive predictive value of at least 98%. The number of matched pairs identified with the probabilistic procedure is on average more than 11% greater than the number identified with the deterministic procedure. Sensitivity of probabilistic RL is similar to or greater than that of the other procedures. Differences between annual standardized hospitalization rates computed with stepwise deterministic RL and with the standard probabilistic RL procedure vary among areas.
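A decision rule based on a minimum positive predictive value can be sketched as follows. This is only one plausible reading, assuming that a sample of candidate pairs has been manually reviewed and that each pair carries a numeric match score; the abstract does not describe how the threshold was actually derived, and the data and the functions `ppv_at` and `lowest_threshold` are hypothetical.

```python
def ppv_at(threshold, reviewed):
    """PPV among pairs accepted at this threshold: true matches / accepted."""
    accepted = [is_match for score, is_match in reviewed if score >= threshold]
    if not accepted:
        return None  # nothing accepted, PPV undefined
    return sum(accepted) / len(accepted)

def lowest_threshold(reviewed, target_ppv=0.98):
    """Lowest observed score whose accepted set reaches the target PPV."""
    for t in sorted({score for score, _ in reviewed}):
        ppv = ppv_at(t, reviewed)
        if ppv is not None and ppv >= target_ppv:
            return t
    return None  # no threshold reaches the target

# Made-up manual review results: (match score, judged a true match?).
reviewed = [(12.0, True), (11.0, True), (9.5, True), (8.0, False),
            (7.5, True), (3.0, False), (-2.0, False)]

threshold = lowest_threshold(reviewed)
print(threshold, ppv_at(threshold, reviewed))
```

Taking the lowest qualifying threshold keeps as many links as possible while still meeting the PPV target, so sensitivity is not sacrificed more than the rule requires.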
Exact deterministic RL works well when unique identifiers and high-quality data are available. The probabilistic procedure proposed here works as well as semi-deterministic RL when the latter includes data quality control or a manual review of the final results. Otherwise, deterministic or semi-deterministic procedures entail classification errors of unknown size and direction.