Topic Modeling for Interpretable Text Classification From EHRs.

Author information

Rijcken Emil, Kaymak Uzay, Scheepers Floortje, Mosteiro Pablo, Zervanou Kalliopi, Spruit Marco

Affiliations

Jheronimus Academy of Data Science, Eindhoven University of Technology, Eindhoven, Netherlands.

Department of Information and Computing Sciences, Utrecht University, Utrecht, Netherlands.

Publication information

Front Big Data. 2022 May 4;5:846930. doi: 10.3389/fdata.2022.846930. eCollection 2022.

Abstract

The clinical notes in electronic health records have many possibilities for predictive tasks in text classification. The interpretability of these classification models for the clinical domain is critical for decision making. Using topic models for text classification of electronic health records for a predictive task allows for the use of topics as features, thus making the text classification more interpretable. However, selecting the most effective topic model is not trivial. In this work, we propose considerations for selecting a suitable topic model based on the predictive performance and interpretability measure for text classification. We compare 17 different topic models in terms of both interpretability and predictive performance in an inpatient violence prediction task using clinical notes. We find no correlation between interpretability and predictive performance. In addition, our results show that although no model outperforms the other models on both variables, our proposed fuzzy topic modeling algorithm (FLSA-W) performs best in most settings for interpretability, whereas two state-of-the-art methods (ProdLDA and LSI) achieve the best predictive performance.
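As a rough illustration of the "topics as features" idea described in the abstract, the toy sketch below represents a note by its normalized topic scores and then reports how much each topic contributes to a linear prediction. Everything in it is an assumption made for illustration: the topics, word weights, coefficients, and the function names `topic_features` and `predict_with_explanation` are hypothetical and hand-written, whereas in the paper the topics are learned by topic models such as LSI, ProdLDA, or FLSA-W and the classifier is trained on clinical notes.

```python
import math
from collections import Counter

# Hypothetical topics: each maps words to weights. In practice these would be
# learned by a topic model (e.g. LSI, ProdLDA, FLSA-W), not written by hand.
TOPICS = {
    "agitation": {"angry": 0.5, "shouting": 0.3, "threat": 0.2},
    "medication": {"dose": 0.6, "tablet": 0.4},
    "sleep": {"insomnia": 0.7, "tired": 0.3},
}

# Hypothetical classifier coefficients, one per topic (illustrative only;
# a real model would estimate these from labeled notes).
COEFFICIENTS = {"agitation": 2.0, "medication": -0.5, "sleep": 0.3}
INTERCEPT = -1.0


def topic_features(text):
    """Return normalized topic scores for a note (all zeros if no topic word matches)."""
    counts = Counter(text.lower().split())
    raw = {
        name: sum(weight * counts[word] for word, weight in words.items())
        for name, words in TOPICS.items()
    }
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in TOPICS}
    return {name: score / total for name, score in raw.items()}


def predict_with_explanation(text):
    """Return (probability, per-topic contributions to the logit)."""
    feats = topic_features(text)
    contributions = {name: COEFFICIENTS[name] * feats[name] for name in TOPICS}
    logit = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, contributions
```

Calling `predict_with_explanation("patient angry and shouting at staff")` returns a probability together with a dictionary of per-topic contributions, so a reader can see which topic drove the prediction. Because each feature corresponds to a named topic rather than an individual word or an opaque embedding dimension, the explanation stays at a level a clinician can inspect.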

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ffc/9114871/d3571b3764ae/fdata-05-846930-g0001.jpg
