Sammani Arjan, Bagheri Ayoub, van der Heijden Peter G M, Te Riele Anneline S J M, Baas Annette F, Oosters C A J, Oberski Daniel, Asselbergs Folkert W
Department of Cardiology, Division of Heart & Lungs, University Medical Centre Utrecht, University of Utrecht, Utrecht, The Netherlands.
Department of Methodology and Statistics, Faculty of Social Sciences, Utrecht University, Utrecht, The Netherlands.
NPJ Digit Med. 2021 Feb 26;4(1):37. doi: 10.1038/s41746-021-00404-9.
Standard reference terminology for diagnoses and risk factors is crucial for billing, epidemiological studies, and international and intranational comparisons of diseases. The International Classification of Diseases (ICD) is a standardized and widely used method, but manual classification is enormously time-consuming. Natural language processing combined with machine learning allows automated structuring of diagnoses using ICD-10 codes, but the limited performance of machine learning models, the need for very large datasets, and the poor reliability of the terminal parts of these codes have restricted clinical usability. We aimed to create a high-performing pipeline for automated classification of reliable ICD-10 codes from free-text medical records in cardiology. We focused on frequently used and well-defined three- and four-digit ICD-10 codes that still have enough granularity to be clinically relevant, such as atrial fibrillation (I48), acute myocardial infarction (I21), or dilated cardiomyopathy (I42.0). Our pipeline uses a deep neural network known as a Bidirectional Gated Recurrent Unit Neural Network; it was trained and tested on 5548 discharge letters and validated on 5089 discharge and procedural letters. Because discharge letters in clinical practice may be labeled with more than one code, we assessed both single-label and multilabel performance for main diagnoses and cardiovascular risk factors. We compared using the entire body of text with using only the summary paragraph, in each case supplemented by age and sex. Given the privacy-sensitive information included in discharge letters, we added a de-identification step. Performance was high, with F1 scores of 0.76-0.99 for three-character and 0.87-0.98 for four-character ICD-10 codes, and was best when complete discharge letters were used. Adding age and sex as variables did not affect the results. For model interpretability, word coefficients were provided and a manual qualitative assessment of the classifications was performed. Because of its high performance, this pipeline can help decrease the administrative burden of classifying discharge diagnoses and may serve as a scaffold for reimbursement and research applications.
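The abstract reports F1 scores per ICD-10 code in a setting where one letter can carry several codes. As an illustration only, the following minimal Python sketch shows how a per-code F1 score can be computed from multilabel predictions. The function name `per_code_f1` and the toy letters are hypothetical and are not taken from the study; the authors' actual evaluation code is not described in this record.

```python
from typing import Dict, List, Set


def per_code_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Compute an F1 score for each code across a set of letters.

    gold[i] and pred[i] hold the reference and predicted codes of letter i.
    A letter may carry several codes (multilabel setting).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same letters")

    # Every code that appears in either the reference or the predictions.
    codes = set().union(*gold, *pred)
    scores: Dict[str, float] = {}
    for code in sorted(codes):
        tp = sum(1 for g, p in zip(gold, pred) if code in g and code in p)
        fp = sum(1 for g, p in zip(gold, pred) if code not in g and code in p)
        fn = sum(1 for g, p in zip(gold, pred) if code in g and code not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when the code never occurs.
        scores[code] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy data: four letters, codes borrowed from the abstract's examples.
toy_gold = [{"I48"}, {"I21", "I48"}, {"I42.0"}, {"I21"}]
toy_pred = [{"I48"}, {"I21"}, {"I42.0"}, {"I21", "I48"}]
toy_scores = per_code_f1(toy_gold, toy_pred)
print(toy_scores)
```

In this toy example I48 is found once, missed once, and predicted once in error, so its F1 is 0.5, while I21 and I42.0 are classified perfectly.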