Centre for Data Linkage, Curtin University of Technology, Western Australia, Australia.
J Biomed Inform. 2012 Feb;45(1):165-72. doi: 10.1016/j.jbi.2011.10.006. Epub 2011 Oct 30.
Data Linkage (DL) activity has grown substantially in recent years, reflecting growth in both the demand for, and the supply of, linked or linkable data. Increased use of DL "services" has brought with it an increased need for impartial information about the suitability and performance of DL software programs and packages. Although evaluations of DL software exist, most have been restricted to comparing two or three packages. Evaluations of a large number of packages are rare because of the time and resource burden placed on the evaluators and the need for a suitable "gold standard" evaluation dataset. In this paper we present an evaluation methodology that overcomes a number of these difficulties. Our approach involves the generation and use of representative synthetic data, the execution of a series of linkages using a pre-defined linkage strategy, and the use of standard linkage quality metrics to assess performance. The methodology is both transparent and transportable, and it produces genuinely comparable results. It was used by the Centre for Data Linkage (CDL) at Curtin University in an evaluation of ten DL software packages, and it is also being used to evaluate larger linkage systems (not just packages). The methodology provides a unique opportunity to benchmark the quality of linkages in different operational environments.
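To make the final step of the approach concrete, the sketch below scores one linkage run against synthetic data whose true match status is known. The abstract does not name the metrics used, so precision, recall and F-measure are assumed here as commonly used linkage quality measures. The record layout and the function names (`true_match_pairs`, `linkage_quality`) are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def true_match_pairs(records):
    """Build the set of true matching record pairs from synthetic data.

    `records` is a list of (record_id, entity_id) tuples. The layout is an
    assumption: synthetic data carries the true entity behind each record.
    """
    by_entity = {}
    for record_id, entity_id in records:
        by_entity.setdefault(entity_id, []).append(record_id)

    pairs = set()
    for ids in by_entity.values():
        # Every pair of records sharing an entity is a true match.
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


def linkage_quality(predicted_pairs, truth_pairs):
    """Return (precision, recall, f_measure) for one linkage run."""
    # Store each pair in sorted order so (a, b) and (b, a) compare equal.
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    truth = {tuple(sorted(p)) for p in truth_pairs}

    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative synthetic records: r1/r2 are one person, r3/r4 another.
records = [("r1", "A"), ("r2", "A"), ("r3", "B"), ("r4", "B"), ("r5", "C")]
truth = true_match_pairs(records)

# Hypothetical output of a package under evaluation: one correct link,
# one false link (r3-r5), and one missed link (r3-r4).
predicted = {("r2", "r1"), ("r3", "r5")}

precision, recall, f_measure = linkage_quality(predicted, truth)
print(precision, recall, f_measure)
```

Because every package is scored against the same known truth set with the same function, the resulting figures can be compared directly across packages, which is the property the methodology relies on.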