sureLDA: A multidisease automated phenotyping method for the electronic health record.

Affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

Harvard Medical School, Boston, Massachusetts, USA.

Publication information

J Am Med Inform Assoc. 2020 Aug 1;27(8):1235-1243. doi: 10.1093/jamia/ocaa079.

Abstract

OBJECTIVE

A major bottleneck hindering utilization of electronic health record data for translational research is the lack of precise phenotype labels. Chart review as well as rule-based and supervised phenotyping approaches require laborious expert input, hampering applicability to studies that require many phenotypes to be defined and labeled de novo. Though International Classification of Diseases codes are often used as surrogates for true labels in this setting, these sometimes suffer from poor specificity. We propose a fully automated topic modeling algorithm to simultaneously annotate multiple phenotypes.

MATERIALS AND METHODS

Surrogate-guided ensemble latent Dirichlet allocation (sureLDA) is a label-free multidimensional phenotyping method. It first uses the PheNorm algorithm to initialize probabilities based on 2 surrogate features for each target phenotype, and then leverages these probabilities to constrain the LDA topic model to generate phenotype-specific topics. Finally, it combines phenotype-feature counts with surrogates via clustering ensemble to yield final phenotype probabilities.
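
The first step above, turning a surrogate feature into an initial phenotype probability, can be illustrated with a minimal sketch. The abstract does not specify PheNorm's internals, so everything here is an assumption made for illustration: the function name `surrogate_prior`, the `alpha` parameter, the normalization of the log surrogate count by log healthcare utilization, and the two-component Gaussian mixture fitted by EM. It is not the authors' implementation.

```python
import numpy as np

def surrogate_prior(surrogate_counts, utilization, alpha=1.0, n_iter=200):
    """Toy prior probability of a phenotype from one surrogate count per patient.

    Assumptions (not taken from the abstract): the log surrogate count is
    normalized by log utilization, and a two-component 1-D Gaussian mixture
    fitted by EM separates likely cases from likely non-cases.
    """
    x = np.log1p(np.asarray(surrogate_counts, dtype=float))
    u = np.log1p(np.asarray(utilization, dtype=float))
    z = x - alpha * u  # utilization-adjusted surrogate score

    # Initialize the two components at the lower and upper quartiles.
    mu = np.percentile(z, [25, 75]).astype(float)
    sigma = np.full(2, z.std() + 1e-3)
    weight = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibility of each component for each patient.
        dens = weight * np.exp(-0.5 * ((z[:, None] - mu) / sigma) ** 2) / sigma
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means, and standard deviations.
        nk = resp.sum(axis=0)
        weight = nk / len(z)
        mu = (resp * z[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (z[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-3

    # Treat the component with the larger mean as "phenotype present".
    return resp[:, np.argmax(mu)]

# Hypothetical toy data: six patients with few surrogate codes, six with many.
counts = [0, 0, 1, 0, 1, 0, 8, 10, 12, 9, 11, 10]
visits = [20] * 12
priors = surrogate_prior(counts, visits)
```

In a fuller pipeline, per-patient probabilities like `priors` would act as weights that steer each phenotype-specific topic in the LDA step; how sureLDA applies that constraint is not described in the abstract, so it is left out of this sketch.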

RESULTS

sureLDA achieves reliably high accuracy and precision across a range of simulated and real-world phenotypes. Its performance is robust to phenotype prevalence and relative informativeness of surrogate vs nonsurrogate features. It also exhibits powerful feature selection properties.

DISCUSSION

sureLDA combines attractive properties of PheNorm and LDA to achieve high accuracy and precision robust to diverse phenotype characteristics. It offers particular improvement for phenotypes insufficiently captured by a few surrogate features. Moreover, sureLDA's feature selection ability enables it to handle high feature dimensions and produce interpretable computational phenotypes.

CONCLUSIONS

sureLDA is well suited to large-scale electronic health record phenotyping for highly multiphenotype applications such as phenome-wide association studies.
