Arellano M G, Weber G I
Advanced Linkage Technologies of America, Inc., USA.
J Healthc Inf Manag. 1998 Fall;12(3):43-52.
Historically, the health information systems community has viewed linking personal records as a mundane task. The oversimplified view that routine database manipulation can accurately identify multiple records for a single individual is erroneous; it rests on a misperception of the quality of the underlying data. Such data have been adversely affected by the evolution of individual facility patient indexes from multiple systems, the results of backload procedures, and the lack of focus on the need for data integrity by users of the automated systems. Much of the random, invalid data we identify on a daily basis arises when system users need to place data in the patient record but find no obvious data field in which to place them. Combined with an underlying lack of standards for the collection of personal identification information, this results in pure chaos when reviewing an MPI file containing a million records at the start of a linkage evaluation project. We have documented the considerable effort that must therefore be made in standardizing the MPI files using stringent analytical procedures and applying common edit routines before commencing record linkage. This preprocessing effort must then be supplemented with sophisticated matching procedures that can handle the dual challenge of minimizing false negatives (the failure to identify true linkages) and false positives (the incorrect linking of records that do not represent the same person). The identification of pairs of linked records does not, however, complete an EPI loading.
Because it is fairly common for a multiple facility linkage evaluation to identify more than two medical record numbers for the same patient, and the primary goal of an EPI is to assign a unique identifier for the patient which will link that patient's multiple files, it becomes necessary to develop a means of readily associating three or more records for the same patient. One approach we have used with great success is to assign a common, sequential identification number to all linked medical record numbers for the same patient regardless of facility. The assignment of linkage identification numbers is computer-intensive and is generally accomplished with a highly iterative process. Both system memory and hard disk resources are fully tested as the number of good linkages in an overlap evaluation reaches the half-million mark or greater. Because the primary linkage analysis goal is to develop linkages on pairs of records, with confidence levels based on the comparison of information for those two records, thresholds must be set to decide which linkages should be accepted as true without any human evaluation. If the threshold is set too low, the defined linkage groups may incorrectly join the medical record numbers for different persons. But if the threshold is set too high, there will be undesired duplication of persons in the enterprise system. As in the identification of the underlying linkage pairs, the development of a confidence measure greatly facilitates the assignment of the unique identification numbers needed in the EPI implementation.
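The grouping step described above can be illustrated with a small sketch. It is not the authors' procedure, which the abstract describes only as highly iterative; it substitutes a standard union-find (disjoint-set) structure to collect pairwise linkages into groups and then numbers the groups sequentially. The function name `assign_linkage_ids`, the `(facility, medical record number)` tuples, the confidence scale, and the threshold values are all hypothetical and chosen only for illustration.

```python
def assign_linkage_ids(pairs, threshold):
    """Group pairwise linkages and give each group a sequential ID.

    pairs: iterable of (record_a, record_b, confidence), where each record
    is a hypothetical (facility, medical_record_number) tuple.
    threshold: pairs scoring at or above this value are accepted as true
    linkages without human review (an assumed convention).
    """
    parent = {}

    def find(record):
        # Register unseen records as their own group, then walk to the root.
        parent.setdefault(record, record)
        root = record
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited record directly at the root.
        while parent[record] != root:
            parent[record], record = root, parent[record]
        return root

    for a, b, confidence in pairs:
        root_a, root_b = find(a), find(b)  # registers both records
        if confidence >= threshold and root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    # Number the groups sequentially in a deterministic (sorted) order.
    group_ids = {}
    linkage_ids = {}
    for record in sorted(parent):
        root = find(record)
        if root not in group_ids:
            group_ids[root] = len(group_ids) + 1
        linkage_ids[record] = group_ids[root]
    return linkage_ids


# Illustrative data only: three facilities, made-up record numbers and scores.
example_pairs = [
    (("A", "1001"), ("B", "77"), 0.97),
    (("B", "77"), ("C", "5"), 0.93),
    (("A", "2002"), ("C", "9"), 0.60),
]

strict = assign_linkage_ids(example_pairs, threshold=0.90)
loose = assign_linkage_ids(example_pairs, threshold=0.50)
```

With the stricter threshold, the three records joined by the two high-confidence pairs share one linkage identification number, while the two records in the low-confidence pair remain separate persons. Lowering the threshold merges that last pair as well, which shows the trade-off the abstract describes: a low threshold risks joining different persons, and a high one leaves duplicates in the enterprise system.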