College of Computer Intelligence, Zhengzhou University, Zhengzhou, China.
Pengcheng Laboratory, Shenzhen, Guangdong, China.
Math Biosci Eng. 2022 Jul 13;19(10):10006-10021. doi: 10.3934/mbe.2022467.
Electronic medical records (EMRs) are the data basis of intelligent diagnosis. An EMR usually contains multiple diagnoses, including normal diagnoses, pathological diagnoses and complications, so intelligent diagnosis can be treated as a multi-label classification problem. The distribution of diagnostic results across EMRs is imbalanced, and the diagnostic results within one EMR are highly coupled. Traditional rebalancing methods do not work effectively on such highly coupled, imbalanced datasets. This paper proposes an intelligent diagnosis model based on a Double Decoupled Network (DDN), which decouples representation learning from classifier learning. In the representation learning stage, a Convolutional Neural Network (CNN) is used to learn the original features of the data. In the classifier learning stage, a Decoupled and Rebalancing highly Imbalanced Labels (DRIL) algorithm is proposed to decouple the highly coupled diagnostic results and rebalance the dataset; the rebalanced dataset is then used to train the classifier. This paper evaluates the proposed DDN on the Chinese Obstetric EMR (COEMR) dataset and verifies the effectiveness and generality of the model on two benchmark multi-label text classification datasets: the Arxiv Academic Paper Dataset (AAPD) and Reuters Corpus Volume I (RCV1). The results demonstrate the effectiveness of the proposed method on imbalanced obstetric EMRs. The DDN model achieves accuracies of 84.17%, 86.35% and 93.87% on the COEMR, AAPD and RCV1 datasets, respectively, which are higher than the current best experimental results.
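The coupling problem described above can be made concrete with a small, hypothetical example. The Python sketch below assumes an invented toy dataset (`TOY_EMRS`, with made-up labels) and a simple deterministic oversampling scheme; it is not the DRIL algorithm, whose details this abstract does not give. It illustrates why duplicating whole multi-label samples to raise rare labels also inflates the frequent labels they co-occur with, and how splitting samples into (record, label) pairs before resampling lets each label be balanced independently.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical illustration only: this is NOT the paper's DRIL algorithm.
# A sample is (emr_id, list_of_diagnosis_labels).
Sample = Tuple[str, List[str]]
Pair = Tuple[str, str]

# Toy data (invented): "normal" is frequent, "preeclampsia" is rare,
# and the rarer labels always co-occur with "normal".
TOY_EMRS: List[Sample] = [
    ("emr1", ["normal", "anemia"]),
    ("emr2", ["normal", "anemia"]),
    ("emr3", ["normal"]),
    ("emr4", ["normal", "preeclampsia"]),
]


def label_counts(samples: List[Sample]) -> Counter:
    """Count how many samples carry each label."""
    counts: Counter = Counter()
    for _, labels in samples:
        counts.update(set(labels))
    return counts


def naive_oversample(samples: List[Sample], target: int) -> List[Sample]:
    """Sample-level oversampling: duplicate whole samples that carry a
    label until that label reaches `target`. Every duplicate also adds
    one to each label it co-occurs with, so frequent labels keep growing."""
    out = list(samples)
    counts = label_counts(out)
    for label in sorted(counts):  # fixed order keeps the result deterministic
        holders = [s for s in samples if label in s[1]]
        i = 0
        while counts[label] < target:
            sample = holders[i % len(holders)]
            out.append(sample)
            counts.update(set(sample[1]))
            i += 1
    return out


def decouple(samples: List[Sample]) -> List[Pair]:
    """Split each multi-label sample into one (emr_id, label) pair per label."""
    return [(sid, label) for sid, labels in samples for label in labels]


def decoupled_oversample(samples: List[Sample], target: int) -> List[Pair]:
    """Oversample at the pair level: each label's pairs are cycled until
    that label has at least `target` pairs, independently of other labels."""
    by_label: Dict[str, List[Pair]] = {}
    for pair in decouple(samples):
        by_label.setdefault(pair[1], []).append(pair)
    out: List[Pair] = []
    for label in sorted(by_label):
        group = by_label[label]
        n = max(target, len(group))  # never drop original pairs
        out.extend(group[i % len(group)] for i in range(n))
    return out


if __name__ == "__main__":
    print(sorted(label_counts(TOY_EMRS).items()))
    # [('anemia', 2), ('normal', 4), ('preeclampsia', 1)]
    print(sorted(label_counts(naive_oversample(TOY_EMRS, 4)).items()))
    # [('anemia', 4), ('normal', 9), ('preeclampsia', 4)]
    pairs = decoupled_oversample(TOY_EMRS, 4)
    print(sorted(Counter(label for _, label in pairs).items()))
    # [('anemia', 4), ('normal', 4), ('preeclampsia', 4)]
```

On this toy data, sample-level oversampling to a target of 4 raises the rare labels to 4 but pushes `normal` from 4 to 9, so the imbalance persists; pair-level oversampling brings all three labels to 4. The trade-off is that each pair carries a single label, so a real system still needs a way to train a multi-label classifier from such pairs; the abstract does not specify how DDN handles this.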