Evaluating resources composing the PheMAP knowledge base to enhance high-throughput phenotyping.

Affiliations

Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, USA.

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

Publication information

J Am Med Inform Assoc. 2023 Feb 16;30(3):456-465. doi: 10.1093/jamia/ocac234.

Abstract

OBJECTIVE

A previous study, PheMAP, combined independent online resources to enable high-throughput phenotyping (HTP) using electronic health records (EHRs). However, online resources describe diseases with differing quality, which may affect phenotyping performance. We aimed to evaluate the phenotyping performance of single resource-based PheMAPs and to investigate an optimized strategy for HTP.

MATERIALS AND METHODS

We ranked the concept unique identifiers (CUIs) from each resource by term frequency-inverse document frequency and used Jaccard similarity matrices to compare the top-ranked CUIs across single resources and the original PheMAP. We correlated top-ranked concepts from each resource with features used in established Phenotype KnowledgeBase (PheKB) algorithms for hypothyroidism, type II diabetes mellitus (T2DM), and dementias. Using each resource separately, we calculated multiple phenotype risk scores for individuals from Vanderbilt University Medical Center's BioVU DNA Biobank and compared phenotyping performance against rule-based eMERGE algorithms. Lastly, we implemented an ensemble strategy that classified patient case/control status based on agreement among PheMAP resources.
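The resource comparison described above can be illustrated with a minimal sketch. It assumes each resource has already been reduced to a mapping from CUI to TF-IDF score; the resource names, CUI labels, scores, and cutoff k below are hypothetical placeholders, not values from the study, and the study's own ranking cutoff is not stated in the abstract.

```python
def top_k_cuis(scores, k):
    """Return the k highest-scoring CUIs (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {cui for cui, _ in ranked[:k]}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def jaccard_matrix(resource_scores, k):
    """Pairwise Jaccard similarity of each resource's top-k CUI set."""
    tops = {name: top_k_cuis(scores, k) for name, scores in resource_scores.items()}
    return {(r, c): jaccard(tops[r], tops[c]) for r in tops for c in tops}


# Hypothetical TF-IDF scores for one phenotype from two placeholder resources.
example_scores = {
    "resource_a": {"CUI_1": 0.9, "CUI_2": 0.7, "CUI_3": 0.2},
    "resource_b": {"CUI_2": 0.8, "CUI_3": 0.6, "CUI_4": 0.5},
}

matrix = jaccard_matrix(example_scores, k=2)
for pair, value in sorted(matrix.items()):
    print(pair, round(value, 3))
```

With k = 2, the two placeholder resources share one of three distinct top-ranked CUIs, so their off-diagonal similarity is 1/3, while each resource compared with itself scores 1.0.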

RESULTS

Jaccard similarity matrices indicate that the similarity of CUIs comprising single resource-based PheMAPs varies. Single resource-based PheMAPs generated from MedlinePlus and MedicineNet outperformed the others but cover only 81.6% of overall disease phenotypes. We propose PheMAP-Ensemble, which provides higher average accuracy and precision than the combined average accuracy and precision of single resource-based PheMAPs. While offering complete phenotype coverage, PheMAP-Ensemble significantly increases phenotyping recall compared to the original iteration.
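As a rough illustration of classifying by resource agreement, the sketch below combines per-resource case/control calls by vote. The abstract does not state the actual agreement rule, so the strict-majority threshold, the handling of resources that lack a phenotype, and all names here are assumptions made for illustration only.

```python
def ensemble_status(votes, min_fraction=0.5):
    """Combine per-resource case calls into one label.

    votes maps a resource name to True (case), False (control), or None
    when that resource has no map for the phenotype (assumed handling).
    Returns True if the fraction of case votes exceeds min_fraction,
    False otherwise, and None if no resource could vote.
    """
    cast = [vote for vote in votes.values() if vote is not None]
    if not cast:
        return None
    return sum(cast) / len(cast) > min_fraction


# Hypothetical calls for one patient and one phenotype.
patient_votes = {
    "resource_a": True,
    "resource_b": True,
    "resource_c": False,
    "resource_d": None,  # this placeholder resource lacks the phenotype
}

status = ensemble_status(patient_votes)
print("case" if status else "control")
```

Skipping resources that lack a phenotype, rather than counting them as control votes, is one way an ensemble could keep complete phenotype coverage even though individual resources cover only part of the phenotype set.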

CONCLUSIONS

Resources comprising the PheMAP produce different phenotyping performance when implemented individually. The ensemble method significantly improves the quality of PheMAP by fully utilizing dissimilar resources to capture accurate phenotyping data from EHRs.
