School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning, China.
School of Artificial Intelligence, Dalian University of Technology, Dalian, Liaoning, China.
J Comput Biol. 2023 Aug;30(8):912-925. doi: 10.1089/cmb.2023.0079.
Clinical notes are comprehensive records containing detailed information about a patient's visit. However, accurately assigning medical codes from clinical documents remains a persistent challenge because of the complexity of clinical data and the vast range of medical codes. Moreover, the large volume of medical records, the noise they contain, and the uneven skill of human coders all degrade the quality of the final codes. Deep learning has recently been applied to automatic International Classification of Diseases (ICD) coding to improve accuracy. Nevertheless, class imbalance, the complexity of code associations, and the noise in lengthy records still limit the progress of deep learning on ICD coding. We therefore present the Note-code Interaction Denoising Network (NIDN), which employs a self-attention mechanism to extract critical semantic features from electronic medical records (EMRs). Our model uses a label attention mechanism to retain code-specific text representations. We introduce Clinical Classifications Software (CCS) coding as an auxiliary task for multitask learning, capturing the functional relationships among medical codes to assist model prediction. To minimize the impact of noise on model prediction and to mitigate the imbalanced label distribution, a denoising module is introduced to filter noise. Our experimental results indicate that NIDN outperforms competitive models on the Medical Information Mart for Intensive Care III (MIMIC-III) data set.
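The label attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a generic per-label attention in which each code has its own query vector that attends over the token representations of a note, followed by a per-label sigmoid classifier. The names `label_attention`, `H`, `U`, `W`, and `b`, and all dimensions, are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def label_attention(H, U, W, b):
    """Generic per-label attention (an assumption, not the NIDN formulation).

    H: token representations, one d-dimensional vector per token
    U: one d-dimensional query vector per label (code)
    W: one d-dimensional classifier vector per label
    b: one bias per label
    Returns (attention weights per label, probability per label).
    """
    d = len(H[0])
    attn, probs = [], []
    for u, w, bias in zip(U, W, b):
        # Score every token against this label's query, then normalize.
        weights = softmax([dot(u, h) for h in H])
        # Label-specific note representation: attention-weighted token sum.
        v = [sum(a * h[j] for a, h in zip(weights, H)) for j in range(d)]
        # Independent sigmoid per label, since ICD coding is multi-label.
        logit = dot(w, v) + bias
        probs.append(1.0 / (1.0 + math.exp(-logit)))
        attn.append(weights)
    return attn, probs


# Toy inputs: random vectors stand in for encoder outputs and learned parameters.
random.seed(0)
n_tokens, d, n_labels = 6, 4, 3
H = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_tokens)]
U = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_labels)]
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_labels)]
b = [0.0] * n_labels

attn, probs = label_attention(H, U, W, b)
```

Each row of `attn` sums to 1 over the tokens of the note, so it can be read as which parts of the text a given code relies on. In a multitask setting such as the one the abstract describes, a second set of label queries for the coarser CCS categories could share the same token representations, though how NIDN couples the two tasks is not specified here.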