Dept. of Computer Science & Engineering, Faculty of Applied Sciences, University of West Bohemia, Univerzitni 8, Plzeň, 30100, Czech Republic; NTIS - New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Univerzitni 8, Plzeň, 30100, Czech Republic.
Comput Biol Med. 2024 Aug;178:108672. doi: 10.1016/j.compbiomed.2024.108672. Epub 2024 May 27.
The International Classification of Diseases (ICD) is a hierarchical taxonomy used for the so-called clinical coding of medical reports, which are typically written as unstructured text. In the Czech Republic, clinical coding is currently performed manually by a clinical coder. Because it depends on human effort, this process is error-prone and expensive: the coder must be properly trained and spends significant effort on each report, which still leads to occasional mistakes. The main goal of this paper is to propose and implement a system that assists the coder by automatically predicting diagnosis codes. These predictions are then presented to the coder for approval or correction, with the aim of improving both efficiency and accuracy. We consider two classification tasks: the main (principal) diagnosis and all diagnoses. Key implementation requirements include minimal memory consumption, generality, ease of portability, and sustainability. The main contribution lies in the proposal and evaluation of ICD classification models for the Czech language with relatively few trainable parameters, which allows them to run quickly on the computer systems prevalent in Czech hospitals and to be easily retrained or fine-tuned as new data become available. First, we introduce a small transformer-based model for each task, followed by the design of a transformer-based "Four-headed" model incorporating four distinct classification heads. This model achieves results comparable to, and sometimes better than, those of four individual models, while significantly reducing memory usage and training time. We also show that our models achieve results comparable to state-of-the-art English models on the MIMIC-IV dataset, even though our models are significantly smaller.
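The abstract does not say what each of the four heads predicts or how the shared transformer is configured, so the following is only a minimal pure-Python sketch of the general "one shared encoder, several classification heads" idea, under explicit assumptions. The mean-pooled embedding lookup stands in for the small transformer; the head names (`head_1` to `head_4`), their label counts, and the split into single-label heads (softmax, one code per report, as a main diagnosis would need) and multi-label heads (one sigmoid per code, as "all diagnoses" would need) are all hypothetical. The `param_count` helper illustrates one plausible source of the memory savings: sharing the encoder means it is stored once rather than once per task.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class HeadSpec:
    name: str          # hypothetical head name
    num_labels: int    # number of ICD codes this head scores (hypothetical)
    multi_label: bool  # True: one sigmoid per code; False: softmax over codes


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class SharedEncoderMultiHead:
    """Toy model: one shared encoder feeding several linear classification heads.

    The 'encoder' is a mean-pooled embedding lookup standing in for a small
    transformer; it is an illustrative assumption, not the paper's architecture.
    """

    def __init__(self, vocab_size, hidden, heads, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.heads = heads
        self.embed = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)]
                      for _ in range(vocab_size)]
        self.weights = {h.name: [[rng.uniform(-0.1, 0.1) for _ in range(hidden)]
                                 for _ in range(h.num_labels)] for h in heads}
        self.biases = {h.name: [0.0] * h.num_labels for h in heads}

    def encode(self, token_ids):
        # Mean-pool token embeddings into a single document vector.
        pooled = [0.0] * self.hidden
        for t in token_ids:
            for i, v in enumerate(self.embed[t]):
                pooled[i] += v
        return [v / len(token_ids) for v in pooled]

    def forward(self, token_ids):
        doc = self.encode(token_ids)  # encoded once, reused by every head
        outputs = {}
        for spec in self.heads:
            logits = [sum(w * x for w, x in zip(row, doc)) + b
                      for row, b in zip(self.weights[spec.name], self.biases[spec.name])]
            outputs[spec.name] = ([sigmoid(z) for z in logits] if spec.multi_label
                                  else softmax(logits))
        return outputs


def param_count(encoder_params, hidden, heads):
    # Encoder plus, per linear head, hidden*num_labels weights and num_labels biases.
    return encoder_params + sum(hidden * h.num_labels + h.num_labels for h in heads)


# Hypothetical head configuration, for illustration only.
HEADS = [
    HeadSpec("head_1", 10, multi_label=False),
    HeadSpec("head_2", 50, multi_label=False),
    HeadSpec("head_3", 10, multi_label=True),
    HeadSpec("head_4", 50, multi_label=True),
]

if __name__ == "__main__":
    model = SharedEncoderMultiHead(vocab_size=100, hidden=16, heads=HEADS)
    probs = model.forward([3, 14, 15, 92])
    for name, p in probs.items():
        print(name, len(p), round(sum(p), 3))

    encoder = 10_000_000  # hypothetical encoder size
    shared = param_count(encoder, 256, HEADS)
    separate = sum(param_count(encoder, 256, [h]) for h in HEADS)
    print("shared:", shared, "separate:", separate, "saved:", separate - shared)
```

With these hypothetical sizes, the shared model has 10,030,840 parameters and four separate models have 40,030,840 in total; the difference is exactly three encoder copies (30,000,000), because the heads are identical in both setups. Real savings depend on the actual encoder and head sizes, which the abstract does not report.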