Statistical supervised meta-ensemble algorithm for medical record linkage.

Affiliations

School of Public Health and Community Medicine, UNSW Sydney, Australia; School of Electrical and Data Engineering, Faculty of Electrical and Information Technology, University of Technology Sydney, Australia.

School of Public Health and Community Medicine, UNSW Sydney, Australia; WHO Collaborating Centre for eHealth, UNSW Sydney, Australia.

Publication information

J Biomed Inform. 2019 Jul;95:103220. doi: 10.1016/j.jbi.2019.103220. Epub 2019 May 31.

DOI:10.1016/j.jbi.2019.103220

PMID:31158554

Abstract

Identifying unique patients across multiple care facilities or services is a major challenge in providing continuous care and undertaking health research. Identifying and linking patients without compromising privacy and security is an emerging issue in the big data era. The volume and complexity of patient data emphasize the need for linkage methods that are both scalable and accurate. In this study, we aim to develop and evaluate an ensemble classification method that uses the three most commonly used supervised learning methods, namely support vector machines, logistic regression and standard feed-forward neural networks, to link records that belong to the same patient across multiple service locations. Our ensemble method combines bagging and stacking. The critical hyperparameters of each base learner were selected by grid search. Two synthetic datasets, FEBRL and ePBRN, were used in this study. The ePBRN linkage dataset was based on linkage errors observed in the Australian primary care setting. Overall linkage performance was determined by assessing blocking performance and classification performance. Our ensemble method outperformed the base learners in all evaluation metrics on one dataset. More specifically, precision (for the base learners, the average of their individual precision scores) increased from 90.70% to 94.85% in FEBRL, and from 62.17% to 99.28% in ePBRN. Similarly, the F-score increased from 94.92% to 98.18% in FEBRL, and from 72.99% to 91.72% in ePBRN. Our experiments suggest that employing ensemble strategies can significantly improve on the linkage performance of individual algorithms.
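To make the reported classification metrics concrete, the following is a minimal sketch of how precision, recall and F-score can be computed for a set of predicted record links. It uses only the standard definitions of these metrics; the function names, the pair representation and the record identifiers are hypothetical illustrations and are not taken from the study's own evaluation code.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def normalise(pairs: Iterable[Pair]) -> Set[Pair]:
    """Treat (a, b) and (b, a) as the same link by sorting each pair."""
    return {tuple(sorted(pair)) for pair in pairs}


def linkage_scores(predicted: Iterable[Pair],
                   true_links: Iterable[Pair]) -> Dict[str, float]:
    """Standard precision, recall and F-score over sets of record pairs."""
    predicted_set = normalise(predicted)
    true_set = normalise(true_links)

    true_positives = len(predicted_set & true_set)    # correctly linked pairs
    false_positives = len(predicted_set - true_set)   # links that are wrong
    false_negatives = len(true_set - predicted_set)   # true links that were missed

    predicted_total = true_positives + false_positives
    actual_total = true_positives + false_negatives
    precision = true_positives / predicted_total if predicted_total else 0.0
    recall = true_positives / actual_total if actual_total else 0.0
    denominator = precision + recall
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


# Hypothetical toy data: four true links, three predicted links.
true_links = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
predicted = [("a1", "b1"), ("b2", "a2"), ("a3", "b9")]

scores = linkage_scores(predicted, true_links)
print(scores)
```

In this toy example two of the three predicted links are correct and two of the four true links are missed, so precision is 2/3, recall is 1/2 and the F-score is 4/7. Scoring over unordered pairs is one possible design choice; it avoids counting the same link twice when the two source files are listed in a different order.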

