School of Basic Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, China.
Shuguang Hospital Affiliated with Shanghai University of Traditional Chinese Medicine, Zhangheng Road, Shanghai, China.
Int J Med Inform. 2021 May;149:104429. doi: 10.1016/j.ijmedinf.2021.104429. Epub 2021 Feb 22.
Diabetes is a chronic noncommunicable disease with a high incidence rate. Without early diagnosis or standard treatment, patients with diabetes may develop serious multisystem complications, which can be life-threatening. Timely detection of and intervention in prediabetes are therefore essential for preventing diabetes, because prediabetes is an inevitable stage in the development and progression of the disease.
Our objective was to establish a predictive model for evaluating people whose blood glucose is in a high or critical state.
We established a diabetes risk prediction model that combines traditional Chinese medicine (TCM) tongue diagnosis with machine learning techniques. A total of 1512 subjects were recruited from the hospital. After data preprocessing, we obtained two datasets: dataset 1 was used to train classical machine learning models, and dataset 2 was used to train deep learning models. To evaluate the performance of the prediction models, we used Classification Accuracy (CA), Precision, Recall, F1-score, the Precision-Recall curve (P-R curve), the Area Under the Precision-Recall curve (AUPRC), the Receiver Operating Characteristic curve (ROC curve), and the Area Under the Receiver Operating Characteristic curve (AUROC), and then selected the best diabetes risk prediction model.
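As a rough illustration of the first four metrics, the following plain-Python sketch computes CA and per-class Precision, Recall, and F1-score, plus their macro averages, for a three-class problem. It is not the authors' evaluation code: the function name `classification_metrics`, the class names `normal`, `critical`, and `high`, and the toy labels are hypothetical, and the third `normal` class is an assumption (the abstract names only the critical and high blood glucose groups).

```python
# Hypothetical class labels; the "normal" class is an assumption, not from the study.
CLASSES = ["normal", "critical", "high"]


def classification_metrics(y_true, y_pred, classes=CLASSES):
    """Return CA, per-class precision/recall/F1, and their macro averages."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # c predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}

    # Macro average: unweighted mean over classes.
    macro = {m: sum(per_class[c][m] for c in classes) / len(classes)
             for m in ("precision", "recall", "f1")}
    return accuracy, per_class, macro


# Toy labels for illustration only (not data from the study).
y_true = ["normal", "critical", "high", "high", "critical", "normal"]
y_pred = ["normal", "high", "high", "high", "critical", "critical"]
accuracy, per_class, macro = classification_metrics(y_true, y_pred)
print(f"CA = {accuracy:.2f}")  # 4 of 6 correct -> CA = 0.67
```

Macro averaging weights each blood glucose group equally regardless of how many subjects it contains, which is why it can differ from pooled (micro) figures.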
On the test set of dataset 1, the non-invasive Stacking model achieved a CA of 71%, a micro-average AUROC of 0.87, a macro-average AUROC of 0.84, and a micro-average AUPRC of 0.77. In the critical blood glucose group, the AUROC was 0.84 and the AUPRC was 0.67; in the high blood glucose group, the AUROC was 0.87 and the AUPRC was 0.83. On the validation set of dataset 2, the ResNet50 model achieved a CA of 69%, a micro-average AUROC of 0.84, a macro-average AUROC of 0.83, and a micro-average AUPRC of 0.73. In the critical blood glucose group, the AUROC was 0.88 and the AUPRC was 0.71; in the high blood glucose group, the AUROC was 0.80 and the AUPRC was 0.76. On the test set of dataset 2, the ResNet50 model achieved a CA of 65%, a micro-average AUROC of 0.83, a macro-average AUROC of 0.82, and a micro-average AUPRC of 0.71. In the critical blood glucose group, the AUROC was 0.84 and the AUPRC was 0.60; in the high blood glucose group, the AUROC was 0.87 and the AUPRC was 0.71.
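The results report both micro- and macro-averaged AUROC. A minimal plain-Python sketch of the usual one-vs-rest definitions follows: macro averaging takes the mean of the per-class AUROCs, so each class counts equally, whereas micro averaging pools every (sample, class) decision into a single binary problem, so larger classes weigh more. The same two averaging schemes apply to AUPRC. The function names, class names, and probability rows are hypothetical illustrations, not the study's code or data.

```python
def binary_auroc(labels, scores):
    """AUROC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for is_pos, s in zip(labels, scores) if is_pos]
    neg = [s for is_pos, s in zip(labels, scores) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_macro_auroc(y_true, prob_rows, classes):
    """One-vs-rest AUROC per class, plus macro and micro averages."""
    per_class = {}
    pooled_labels, pooled_scores = [], []
    for k, c in enumerate(classes):
        labels = [t == c for t in y_true]       # class c vs. all others
        scores = [row[k] for row in prob_rows]  # predicted probability of c
        per_class[c] = binary_auroc(labels, scores)
        pooled_labels.extend(labels)
        pooled_scores.extend(scores)
    macro = sum(per_class.values()) / len(classes)      # each class weighted equally
    micro = binary_auroc(pooled_labels, pooled_scores)  # all decisions pooled
    return per_class, macro, micro


# Hypothetical three-class example (columns follow `classes`); not study data.
classes = ["normal", "critical", "high"]
y_true = ["normal", "critical", "high", "high"]
prob_rows = [
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
per_class, macro, micro = micro_macro_auroc(y_true, prob_rows, classes)
print(round(macro, 3), round(micro, 3))  # 0.847 0.891
```

The pairwise loop is quadratic in the number of samples, which keeps the definition transparent; a larger evaluation would instead sort the scores once and use ranks, or rely on an established library routine.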
Tongue features significantly improved the prediction accuracy of the diabetes risk prediction models built with classical machine learning. Besides their strong performance, the recommended Stacking and ResNet50 models are non-invasive and easy to use. When detecting hyperglycemia, both models showed high precision, a low false positive rate, and a low misdiagnosis rate; when detecting blood glucose in the critical state, they showed high sensitivity, a low false negative rate, and a low missed diagnosis rate. The study demonstrated that differential changes in tongue features reflect abnormal glucose metabolism; thus, a diabetes risk prediction model combining TCM tongue diagnosis with machine learning techniques is feasible.
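The conclusion pairs high precision with a low false positive (misdiagnosis) rate, and high sensitivity with a low false negative (missed diagnosis) rate. Assuming these terms are used in the usual epidemiological sense (misdiagnosis rate = 1 − specificity, missed diagnosis rate = 1 − sensitivity), the sketch below derives all four quantities for one class treated as positive against the rest. The function name `one_vs_rest_rates` and the confusion-matrix counts are hypothetical, not results from the study.

```python
def one_vs_rest_rates(confusion, classes, target):
    """Rates for `target` as the positive class, all other classes as negative.

    confusion[i][j] counts samples whose true class is classes[i] and whose
    predicted class is classes[j]. Assumes misdiagnosis rate = false positive
    rate and missed diagnosis rate = false negative rate.
    """
    k = classes.index(target)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                  # truly target, predicted otherwise
    fp = sum(row[k] for row in confusion) - tp   # predicted target, truly otherwise
    tn = total - tp - fn - fp
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical counts (rows = true class, columns = predicted class).
classes = ["normal", "critical", "high"]
confusion = [
    [50, 10, 2],  # true normal
    [5, 40, 3],   # true critical
    [3, 7, 30],   # true high
]
for name in ("critical", "high"):
    print(name, one_vs_rest_rates(confusion, classes, name))
```

With these made-up counts, the high class has the higher precision (about 0.86 vs. 0.70) while the critical class has the higher sensitivity (about 0.83 vs. 0.75), the kind of trade-off the conclusion describes.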