Topic Modeling for Interpretable Text Classification From EHRs.

Author information

Rijcken Emil, Kaymak Uzay, Scheepers Floortje, Mosteiro Pablo, Zervanou Kalliopi, Spruit Marco

Affiliations

Jheronimus Academy of Data Science, Eindhoven University of Technology, Eindhoven, Netherlands.

Department of Information and Computing Sciences, Utrecht University, Utrecht, Netherlands.

Publication information

Front Big Data. 2022 May 4;5:846930. doi: 10.3389/fdata.2022.846930. eCollection 2022.

DOI:10.3389/fdata.2022.846930

PMID:35600326

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9114871/

Abstract

The clinical notes in electronic health records have many possibilities for predictive tasks in text classification. The interpretability of these classification models for the clinical domain is critical for decision making. Using topic models for text classification of electronic health records for a predictive task allows for the use of topics as features, thus making the text classification more interpretable. However, selecting the most effective topic model is not trivial. In this work, we propose considerations for selecting a suitable topic model based on the predictive performance and interpretability measure for text classification. We compare 17 different topic models in terms of both interpretability and predictive performance in an inpatient violence prediction task using clinical notes. We find no correlation between interpretability and predictive performance. In addition, our results show that although no model outperforms the other models on both variables, our proposed fuzzy topic modeling algorithm (FLSA-W) performs best in most settings for interpretability, whereas two state-of-the-art methods (ProdLDA and LSI) achieve the best predictive performance.
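The idea of using topics as features can be sketched in a few lines. The example below is a minimal illustration, not the authors' method: it does not fit a topic model, and the topic names, word weights, and classifier coefficients are all hypothetical values invented for the sketch. It only shows why topic features are easy to inspect: each note becomes a short vector of topic scores, and a linear classifier's score splits into one contribution per topic.

```python
from collections import Counter

# Hypothetical topics, invented for illustration: each maps words to weights.
# A real topic model (for example LDA or FLSA-W) would learn these from notes.
TOPICS = {
    "agitation": {"angry": 0.5, "shouting": 0.3, "restless": 0.2},
    "medication": {"dose": 0.5, "tablet": 0.3, "refused": 0.2},
}


def topic_features(text, topics=TOPICS):
    """Return one normalized score per topic for a whitespace-tokenized note."""
    counts = Counter(text.lower().split())
    raw = {
        name: sum(weight * counts[word] for word, weight in words.items())
        for name, words in topics.items()
    }
    total = sum(raw.values())
    if total == 0:
        # No topic word occurs in the note: return all-zero features.
        return {name: 0.0 for name in topics}
    return {name: score / total for name, score in raw.items()}


def topic_contributions(features, coefficients):
    """Split a linear classifier score into per-topic contributions."""
    return {name: coefficients[name] * value for name, value in features.items()}


# Hypothetical coefficients of a linear classifier over the topic features.
COEFFICIENTS = {"agitation": 1.5, "medication": -0.5}

features = topic_features("patient angry and shouting refused tablet")
contributions = topic_contributions(features, COEFFICIENTS)
print(features)
print(contributions)
```

In practice the topic weights would come from a fitted topic model and the coefficients from a classifier trained on labeled notes; the sketch hard-codes both so that the feature construction stays visible.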





