IEEE J Biomed Health Inform. 2020 Sep;24(9):2506-2515. doi: 10.1109/JBHI.2020.2996937. Epub 2020 May 25.
With the development of Healthcare 4.0, there has been an explosion in the amount of data such as images, medical text, physiological signals, and lab tests. Among these, medical records provide a complete picture of the associated clinical events. However, medical texts are difficult to process because they are free in structure, diverse in style, and shaped by subjective factors. Assigning codes from the International Classification of Diseases (ICD) as metadata is a standardized way of indicating diagnoses and procedures, and it has become a mandatory step in interpreting medical records to support better clinical and financial decisions. Performed manually, this coding task is time-consuming, error-prone, and expensive. In this paper, we propose a deep learning approach and a medical topic mining method to automatically predict ICD codes from free-text medical records. On the Medical Information Mart for Intensive Care (MIMIC-III) dataset, the F1 score improves by 5% over the state of the art. The approach is also suitable for multiple ICD versions and languages. For one specific disease, atrial fibrillation, the F1 score reaches up to 96% on in-house ICD-10 datasets and 93.3% on MIMIC-III datasets. We developed an artificial intelligence based coding system that can greatly improve the efficiency and accuracy of human coders while accelerating secondary use in clinical informatics.
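Since the results above are reported as F1 scores, a minimal sketch of how F1 can be computed for multi-label ICD coding may help. The abstract does not say which averaging is used, so micro-averaging over all (record, code) pairs is an assumption here, as are the function names `micro_f1` and `code_f1` and the toy records; none of this is the authors' evaluation code.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (record, code) pairs (assumed averaging)."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # codes predicted and truly assigned
        fp += len(p - g)   # codes predicted but not assigned
        fn += len(g - p)   # codes assigned but missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score
    return 2 * tp / denom if denom else 0.0


def code_f1(gold: List[Set[str]], pred: List[Set[str]], code: str) -> float:
    """F1 for a single code, treating it as a binary label per record."""
    g_bin = [{code} & g for g in gold]
    p_bin = [{code} & p for p in pred]
    return micro_f1(g_bin, p_bin)


# Toy, hand-made records (illustrative only, not from MIMIC-III).
# 427.31 is the ICD-9-CM code for atrial fibrillation.
gold = [{"427.31", "401.9"}, {"428.0"}, {"427.31"}]
pred = [{"427.31"}, {"428.0", "401.9"}, {"427.31", "428.0"}]

print(f"micro-F1: {micro_f1(gold, pred):.3f}")
print(f"F1 for 427.31: {code_f1(gold, pred, '427.31'):.3f}")
```

The single-code variant corresponds to the kind of disease-specific score quoted for atrial fibrillation, while the micro-averaged variant scores all codes jointly.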