Evaluating resources composing the PheMAP knowledge base to enhance high-throughput phenotyping.

Author affiliations

Department of Biomedical Engineering, Vanderbilt University, Nashville, Tennessee, USA.

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

Publication information

J Am Med Inform Assoc. 2023 Feb 16;30(3):456-465. doi: 10.1093/jamia/ocac234.

DOI:10.1093/jamia/ocac234

PMID:36451277

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9933070/

Abstract

OBJECTIVE

A previous study, PheMAP, combined independent online resources to enable high-throughput phenotyping (HTP) using electronic health records (EHRs). However, online resources differ in the quality of their disease descriptions, which may affect phenotyping performance. We aimed to evaluate the phenotyping performance of single resource-based PheMAPs and investigate an optimized strategy for HTP.

MATERIALS AND METHODS

We compared how each resource produced top-ranked concept unique identifiers (CUIs) by term frequency-inverse document frequency with Jaccard matrices comparing single resources and the original PheMAP. We correlated top-ranked concepts from each resource to features used in established Phenotype KnowledgeBase (PheKB) algorithms for hypothyroidism, type II diabetes mellitus (T2DM), and dementias. Using resources separately, we calculated multiple phenotype risk scores for individuals from Vanderbilt University Medical Center's BioVU DNA Biobank and compared phenotyping performance against rule-based eMERGE algorithms. Lastly, we implemented an ensemble strategy which classified patient case/control status based upon PheMAP resource agreement.
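
As an illustration of the Jaccard comparison described above, the following is a minimal sketch of a pairwise similarity matrix over the top-ranked CUI sets of each resource. It is not the authors' implementation: the CUI values are made-up placeholders, `resource_c` is a hypothetical third resource, and the TF-IDF ranking that would produce the top-ranked sets is assumed to have been done already.

```python
from itertools import product


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_matrix(top_cuis: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Pairwise Jaccard similarity between the top-ranked CUI sets of every resource pair."""
    return {
        (r1, r2): jaccard(top_cuis[r1], top_cuis[r2])
        for r1, r2 in product(top_cuis, repeat=2)
    }


# Placeholder data: these CUIs are invented for illustration, not real UMLS concepts.
top_cuis = {
    "MedlinePlus": {"C0000001", "C0000002", "C0000003"},
    "MedicineNet": {"C0000002", "C0000003", "C0000004"},
    "resource_c": {"C0000005"},  # hypothetical resource sharing no concepts
}

matrix = jaccard_matrix(top_cuis)
for (r1, r2), score in matrix.items():
    print(f"{r1} vs {r2}: {score:.2f}")
```

In this toy input, two resources sharing two of four distinct concepts score 0.5, while a resource with no overlapping concepts scores 0.0.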

RESULTS

Jaccard similarity matrices indicate that the similarity of CUIs comprising single resource-based PheMAPs varies. Single resource-based PheMAPs generated from MedlinePlus and MedicineNet outperformed others but only encompass 81.6% of overall disease phenotypes. We propose the PheMAP-Ensemble which provides higher average accuracy and precision than the combined average accuracy and precision of single resource-based PheMAPs. While offering complete phenotype coverage, PheMAP-Ensemble significantly increases phenotyping recall compared to the original iteration.
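
The abstract does not state the exact agreement rule used by PheMAP-Ensemble, so the sketch below assumes a simple vote: a patient is labeled a case when at least `min_agree` single resource-based PheMAPs call them a case. The function names, the threshold, the patient records, and the `resource_c` resource are all hypothetical and serve only to show the idea of classifying by resource agreement.

```python
def ensemble_label(votes: dict[str, bool], min_agree: int) -> bool:
    """Return True (case) when at least `min_agree` resources vote 'case'.

    Assumption: agreement is a plain vote count; the paper's actual rule may differ.
    """
    return sum(votes.values()) >= min_agree


def precision_recall(pred: list[bool], truth: list[bool]) -> tuple[float, float]:
    """Precision and recall of predicted case labels against reference labels."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical per-resource case calls for three patients.
patient_votes = {
    "p1": {"MedlinePlus": True, "MedicineNet": True, "resource_c": False},
    "p2": {"MedlinePlus": False, "MedicineNet": False, "resource_c": True},
    "p3": {"MedlinePlus": True, "MedicineNet": True, "resource_c": True},
}
# Hypothetical reference labels, standing in for a rule-based algorithm's output.
reference = {"p1": True, "p2": False, "p3": True}

labels = {pid: ensemble_label(v, min_agree=2) for pid, v in patient_votes.items()}
ids = sorted(patient_votes)
precision, recall = precision_recall(
    [labels[i] for i in ids], [reference[i] for i in ids]
)
print(labels, precision, recall)
```

Raising `min_agree` would be expected to trade recall for precision, since fewer patients meet a stricter agreement requirement.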

CONCLUSIONS

Resources comprising the PheMAP produce different phenotyping performance when implemented individually. The ensemble method significantly improves the quality of PheMAP by fully utilizing dissimilar resources to capture accurate phenotyping data from EHRs.
