Rijcken Emil, Kaymak Uzay, Scheepers Floortje, Mosteiro Pablo, Zervanou Kalliopi, Spruit Marco
Jheronimus Academy of Data Science, Eindhoven University of Technology, Eindhoven, Netherlands.
Department of Information and Computing Sciences, Utrecht University, Utrecht, Netherlands.
Front Big Data. 2022 May 4;5:846930. doi: 10.3389/fdata.2022.846930. eCollection 2022.
Clinical notes in electronic health records offer many opportunities for predictive text classification tasks. In the clinical domain, the interpretability of these classification models is critical for decision making. Using topic models for text classification of electronic health records allows topics to serve as features, which makes the classification more interpretable. However, selecting the most effective topic model is not trivial. In this work, we propose considerations for selecting a suitable topic model based on predictive performance and an interpretability measure for text classification. We compare 17 different topic models in terms of both interpretability and predictive performance in an inpatient violence prediction task using clinical notes. We find no correlation between interpretability and predictive performance. In addition, our results show that although no model outperforms the others on both variables, our proposed fuzzy topic modeling algorithm (FLSA-W) performs best in most settings for interpretability, whereas two state-of-the-art methods (ProdLDA and LSI) achieve the best predictive performance.