Zhu Zhu, Li Jing, Huang Jian, Li Zheming, Zhang Hongjian, Chen Siyu, Zhong Qianhui, Xie Yulan, Hu Shasha, Wang Yinshuo, Wang Dejian, Yu Gang
Department of Data and Information, The Children's Hospital, Zhejiang University School of Medicine, Hangzhou, China.
National Clinical Research Center for Child Health, Hangzhou, China.
Transl Pediatr. 2022 Jul;11(7):1216-1233. doi: 10.21037/tp-22-275.
Because pediatric respiratory diseases that present with chronic cough are phenotypically similar, primary care doctors often misdiagnose them, and the misuse of examinations is prevalent. In the pre-diagnosis stage, the patient's chief complaint and other information in the electronic medical record (EMR) give respiratory experts a strong basis for making a preliminary diagnosis and planning examinations. In this paper, we propose an intelligent prediagnosis system that predicts the disease diagnosis and recommends examinations based on EMR text.
We examined the clinical notes of 178,293 children with chronic cough symptoms from retrospective EMR data. The dataset was split 7:3 into training and testing sets, and 5% of the testing-set samples were additionally extracted for validation. We proposed a medical-semantic-aware convolutional neural network (MSCNN) framework that accomplishes two downstream tasks from the same medical language model through transfer learning. First, a medical language model based on the word2vec algorithm was built to generate embeddings for the text data. Then, a text convolutional neural network (TextCNN) was used to build models for disease prediction and examination recommendation.
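The two-stage design described above (shared embeddings, then one TextCNN per task) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The vocabulary, dimensions, random weights, class counts, and the treatment of examination recommendation as independent sigmoid outputs are all assumptions made for illustration. In practice the embedding table would come from a trained word2vec model and the filters would be learned.

```python
import math
import random

rng = random.Random(0)
EMB_DIM, KERNEL, N_FILTERS = 4, 2, 3  # illustrative sizes, not from the paper

# Stand-in for pretrained word2vec vectors: hypothetical vocabulary, random values.
VOCAB = ["cough", "night", "wheeze", "fever", "weeks"]
EMBEDDING = {w: [rng.uniform(-1, 1) for _ in range(EMB_DIM)] for w in VOCAB}


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def text_features(tokens, filters):
    """Slide each filter over windows of KERNEL tokens, apply ReLU, max-pool over time."""
    vectors = [EMBEDDING[t] for t in tokens if t in EMBEDDING]
    if len(vectors) < KERNEL:
        raise ValueError("need at least KERNEL known tokens")
    # Each window is the concatenation of KERNEL consecutive token embeddings.
    windows = [sum(vectors[i:i + KERNEL], []) for i in range(len(vectors) - KERNEL + 1)]
    pooled = []
    for f in filters:
        activations = [max(0.0, sum(w * x for w, x in zip(f, win))) for win in windows]
        pooled.append(max(activations))
    return pooled


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Each task has its own filters and output layer but reuses the same EMBEDDING table.
disease_filters = rand_matrix(N_FILTERS, KERNEL * EMB_DIM)
disease_head = rand_matrix(3, N_FILTERS)   # 3 hypothetical diseases
exam_filters = rand_matrix(N_FILTERS, KERNEL * EMB_DIM)
exam_head = rand_matrix(2, N_FILTERS)      # 2 hypothetical examinations

note = ["cough", "night", "wheeze", "weeks"]
disease_probs = softmax(linear(disease_head, text_features(note, disease_filters)))
exam_scores = [1.0 / (1.0 + math.exp(-v))
               for v in linear(exam_head, text_features(note, exam_filters))]
print(disease_probs, exam_scores)
```

Sharing only the embedding table is one plausible reading of "two downstream tasks from the same medical language model". The abstract does not specify whether any convolutional layers are also shared.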
We implemented five algorithms for disease prediction. In the disease prediction task, our algorithm outperformed the baseline methods on all metrics, with a top-1 accuracy (AC) of 0.68 and a top-3 AC of 0.923 on the testing set. With data augmentation added, the top-3 AC reached 0.926. In the examination recommendation task, the overall AC on the testing set was 0.93 and the macro average (MA) F1-score was 0.88. The average area under the curve (AUC) was 0.97 on the training set and 0.86 on the testing set.
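For readers unfamiliar with the reported metrics, the following sketch shows how top-k accuracy and the macro-averaged F1-score are conventionally computed. The scores and labels are toy values invented for illustration and are unrelated to the study data. The function names are hypothetical.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    f1_values = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_values.append(2 * tp / denom if denom else 0.0)
    return sum(f1_values) / len(f1_values)


# Toy example: 4 samples, 3 classes (invented values).
scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
]
labels = [0, 2, 2, 1]
predictions = [max(range(len(row)), key=lambda c: row[c]) for row in scores]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
f1 = macro_f1(labels, predictions, classes=[0, 1, 2])
print(top1, top2, f1)
```

Top-k accuracy is never lower than top-1 accuracy for k > 1, which is consistent with the gap between the reported top-1 and top-3 values.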
We constructed an intelligent prediagnosis system with an MSCNN framework that can predict diseases and recommend examinations based on EMR data. Our approach achieved good results on a retrospective clinical dataset and thus has considerable potential as an automated diagnostic aid in clinical practice during the pre-diagnosis stage, where it could help primary care doctors and doctors in basic-level hospitals. Because the proposed framework is general, it can be straightforwardly extended to the prediagnosis of other diseases.