MixEHR-Guided: A guided multi-modal topic modeling approach for large-scale automatic phenotyping using the electronic health record.

Affiliations

Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA; Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA.

School of Computer Science, McGill University, 3480 Rue University, Montreal, QC H3A 2A7, Canada.

Publication information

J Biomed Inform. 2022 Oct;134:104190. doi: 10.1016/j.jbi.2022.104190. Epub 2022 Sep 1.

Abstract

Electronic Health Records (EHRs) contain rich clinical data collected at the point of care, and their increasing adoption offers exciting opportunities for clinical informatics, disease risk prediction, and personalized treatment recommendation. However, effective use of EHR data for research and clinical decision support is often hampered by a lack of reliable disease labels. To compile gold-standard labels, researchers often rely on clinical experts to develop rule-based phenotyping algorithms from billing codes and other surrogate features. This process is tedious and error-prone due to recall and observer biases in how codes and measures are selected, and some phenotypes are incompletely captured by a handful of surrogate features. To address this challenge, we present a novel automatic phenotyping model called MixEHR-Guided (MixEHR-G), a multimodal hierarchical Bayesian topic model that efficiently models the EHR generative process by identifying latent phenotype structure in the data. Unlike existing topic modeling algorithms wherein the inferred topics are not identifiable, MixEHR-G uses prior information from informative surrogate features to align topics with known phenotypes. We applied MixEHR-G to an openly available EHR dataset of 38,597 intensive care patients (MIMIC-III) in Boston, USA, and to administrative claims data for a population-based cohort (PopHR) of 1.3 million people in Quebec, Canada. Qualitatively, we demonstrate that MixEHR-G learns interpretable phenotypes and yields meaningful insights about phenotype similarities, comorbidities, and epidemiological associations. Quantitatively, MixEHR-G outperforms existing unsupervised phenotyping methods on a phenotype label annotation task, and it can accurately estimate relative phenotype prevalence functions without gold-standard phenotype information. Altogether, MixEHR-G is an important step towards building an interpretable and automated phenotyping system using EHR data.
