Utilization of Electronic Medical Records and Biomedical Literature to Support the Diagnosis of Rare Diseases Using Data Fusion and Collaborative Filtering Approaches.

Authors

Shen Feichen, Liu Sijia, Wang Yanshan, Wen Andrew, Wang Liwei, Liu Hongfang

Affiliation

Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States.

Publication

JMIR Med Inform. 2018 Oct 10;6(4):e11301. doi: 10.2196/11301.

Abstract

BACKGROUND

In the United States, a rare disease is defined as one affecting no more than 200,000 patients at any given time. Patients with rare diseases are often misdiagnosed or left undiagnosed, possibly because clinical practitioners have insufficient knowledge of or experience with the disease. As the volume of electronically accessible medical data grows exponentially, a large amount of information on thousands of rare diseases and their potentially associated diagnostic information is buried in electronic medical records (EMRs) and medical literature.

OBJECTIVE

This study aimed to leverage information contained in heterogeneous datasets to assist rare disease diagnosis. Patients' phenotypic information in EMRs and biomedical literature could be fully leveraged to speed up disease diagnosis.

METHODS

In our previous work, we advanced the use of a collaborative filtering recommendation system to support rare disease diagnostic decision making based on phenotypes derived solely from EMR data. However, that work did not discuss the influence of heterogeneous data on collaborative filtering, which is an essential problem when facing large volumes of data from various resources. To further investigate the performance of collaborative filtering on heterogeneous datasets, in this study we examined EMR data generated at Mayo Clinic as well as published article abstracts retrieved from the Semantic MEDLINE Database. Specifically, we designed different strategies for fusing data from these heterogeneous resources and integrated them with the collaborative filtering model.
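The abstract does not spell out the fusion strategies, so the following Python sketch is only an illustration of what fusing EMR-derived and literature-derived phenotype profiles might look like. Both strategies, the function names (`fuse_by_appending`, `fuse_by_enriching`), and the toy data are assumptions introduced here, not the authors' method.

```python
from typing import Dict, Set

# A profile maps an identifier (patient or disease) to a set of phenotype terms.
Profiles = Dict[str, Set[str]]


def fuse_by_appending(emr: Profiles, literature: Profiles) -> Profiles:
    """Hypothetical strategy 1: keep every EMR patient profile and add each
    literature-derived disease profile as an extra row of the same matrix."""
    fused = {f"emr:{pid}": set(terms) for pid, terms in emr.items()}
    fused.update({f"lit:{disease}": set(terms) for disease, terms in literature.items()})
    return fused


def fuse_by_enriching(emr: Profiles, diagnosis: Dict[str, str], literature: Profiles) -> Profiles:
    """Hypothetical strategy 2: extend each diagnosed patient's phenotypes with
    the literature phenotypes recorded for that patient's disease."""
    return {
        pid: set(terms) | literature.get(diagnosis.get(pid, ""), set())
        for pid, terms in emr.items()
    }


# Toy data, invented purely for illustration.
emr = {"p1": {"seizure", "ataxia"}, "p2": {"muscle weakness"}}
diagnosis = {"p1": "disease A"}  # p2 has no recorded diagnosis
literature = {"disease A": {"ataxia", "nystagmus"}}

appended = fuse_by_appending(emr, literature)
enriched = fuse_by_enriching(emr, diagnosis, literature)
```

Either fused structure can then be passed to a neighborhood-based collaborative filtering step, since both keep the "identifier to phenotype set" shape.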

RESULTS

We evaluated the performance of the proposed system using characterizations derived from various combinations of EMR data and literature, as well as from EMR data alone. We extracted nearly 13 million EMRs from the patient cohort generated between 2010 and 2015 at Mayo Clinic and retrieved all article abstracts published through the end of 2016 from the semistructured Semantic MEDLINE Database. We applied a collaborative filtering model and compared the performance generated by different metrics. Log likelihood ratio similarity combined with k-nearest neighbor on heterogeneous datasets showed the optimal performance in patient recommendation, with an area under the precision-recall curve (PRAUC) of 0.475 (string match), 0.511 (Systematized Nomenclature of Medicine [SNOMED] match), and 0.752 (Genetic and Rare Diseases Information Center [GARD] match). Log likelihood ratio similarity also performed best on mean average precision, with 0.465 (string match), 0.5 (SNOMED match), and 0.749 (GARD match). The performance of rare disease prediction was also demonstrated using the optimal algorithm: the macro-average F-measures for string, SNOMED, and GARD match were 0.32, 0.42, and 0.63, respectively.
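To make the best-performing combination concrete, here is a minimal Python sketch of log likelihood ratio similarity paired with a k-nearest neighbor lookup over phenotype sets. It uses Dunning's log-likelihood ratio on a 2x2 contingency table; the mapping of the raw ratio into [0, 1) via `1 - 1/(1 + LLR)`, the vocabulary size `n_terms`, and the toy profiles are assumptions made here, not details given in the abstract.

```python
import math
from typing import Dict, List, Set, Tuple


def _x_log_x(x: float) -> float:
    return x * math.log(x) if x > 0 else 0.0


def _entropy(*counts: float) -> float:
    # Unnormalized entropy: N*log(N) minus the sum of n_i*log(n_i).
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's log-likelihood ratio for a 2x2 contingency table."""
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative rounding errors


def llr_similarity(a: Set[str], b: Set[str], n_terms: int) -> float:
    """Similarity of two phenotype sets drawn from a vocabulary of n_terms."""
    k11 = len(a & b)                  # phenotypes in both profiles
    k12 = len(a - b)                  # only in a
    k21 = len(b - a)                  # only in b
    k22 = n_terms - k11 - k12 - k21   # in neither
    return 1.0 - 1.0 / (1.0 + llr(k11, k12, k21, k22))  # assumed squashing into [0, 1)


def nearest_neighbors(query: Set[str], profiles: Dict[str, Set[str]],
                      n_terms: int, k: int) -> List[Tuple[str, float]]:
    """Return the k profiles most similar to the query, best first."""
    scored = [(pid, llr_similarity(query, terms, n_terms)) for pid, terms in profiles.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]


# Toy profiles, invented for illustration; n_terms=20 is an assumed vocabulary size.
profiles = {
    "p1": {"ataxia", "seizure", "nystagmus", "tremor"},
    "p2": {"rash", "fever"},
    "p3": {"ataxia", "cough"},
}
query = {"ataxia", "seizure", "nystagmus"}
top2 = nearest_neighbors(query, profiles, n_terms=20, k=2)
```

One caveat of this measure: the ratio scores any departure from independence, so strongly non-overlapping profiles in a small vocabulary can also receive a nonzero score. A real pipeline would likely need a large vocabulary or an additional overlap check.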

CONCLUSIONS

This study demonstrated the potential of using heterogeneous datasets in a collaborative filtering model to support rare disease diagnosis. Beyond phenotype-based analysis, we plan in future work to further resolve the heterogeneity issue and reduce miscommunication between EMRs and literature by mining genotypic information to establish a comprehensive disease-phenotype-gene network for rare disease diagnosis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6162/6231873/8f24eb017dac/medinform_v6i4e11301_fig1.jpg
