School of Computer Science and McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada.
Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America.
PLoS One. 2021 Apr 8;16(4):e0249622. doi: 10.1371/journal.pone.0249622. eCollection 2021.
Latent knowledge can be extracted from the electronic notes recorded during patient encounters with the health system. Using these clinical notes to decipher a patient's underlying comorbidities, symptom burdens, and treatment courses remains an ongoing challenge. Latent topic models, an efficient class of Bayesian methods, can treat each patient's clinical notes as "documents" and the words in the notes as "tokens". However, standard latent topic models assume that all notes follow the same topic distribution, regardless of the note type or the domain expertise of the author (for example, doctors or nurses). We propose a novel application of latent topic modeling, the multi-note topic model (MNTM), which jointly infers distinct topic distributions for notes of different types. We applied the model to clinical notes from the MIMIC-III dataset to infer distinct topic distributions over the physician and nursing note types. Based on manual assessments by clinicians, we observed a significant improvement in topic interpretability with MNTM over baseline single-note topic models that ignore note types. Moreover, MNTM achieved significantly higher prediction accuracy for prolonged mechanical ventilation and mortality using only the first 48 hours of patient data. By correlating patients' topic mixtures with hospital mortality and prolonged mechanical ventilation, we identified several diagnostic topics that are associated with poor outcomes. Because of its elegant and intuitive formulation, we envision broad application of our approach in mining multi-modality text-based healthcare information beyond clinical notes. Code is available at https://github.com/li-lab-mcgill/heterogeneous_ehr.
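To make the "documents" and "tokens" framing concrete, the following is a minimal sketch of how notes might be grouped into one bag of words per patient and note type before any topic model is fitted. It is not the authors' preprocessing: the function name `build_typed_documents`, the `(patient_id, note_type, text)` tuple layout, the whitespace tokenization, and the example notes are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_typed_documents(notes):
    """Group notes into one bag of words per (patient, note type).

    `notes` is an iterable of (patient_id, note_type, text) tuples.
    This layout is hypothetical, chosen only for the illustration.
    """
    docs = defaultdict(Counter)
    for patient_id, note_type, text in notes:
        # Naive lowercase whitespace tokenization (an assumption;
        # real clinical text would need more careful handling).
        tokens = text.lower().split()
        docs[(patient_id, note_type)].update(tokens)
    return docs


# Made-up example notes, not taken from MIMIC-III.
notes = [
    ("p1", "physician", "sepsis suspected start antibiotics"),
    ("p1", "nursing", "patient sedated on ventilator"),
    ("p1", "nursing", "ventilator settings unchanged"),
    ("p2", "physician", "chest pain rule out infarction"),
]

docs = build_typed_documents(notes)
for key, counts in sorted(docs.items()):
    print(key, dict(counts))
```

Keeping the note type in the document key is what would let a model assign separate topic distributions to physician and nursing notes. A single-note baseline would instead merge all of a patient's notes under one key.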