Ege University Faculty of Medicine, Department of Biostatistics and Medical Informatics, Turkey.
Genomics Team, Microsoft Research, Redmond, WA, USA.
Biomed Res Int. 2020 Jul 15;2020:1895076. doi: 10.1155/2020/1895076. eCollection 2020.
Missing observations are a persistent challenge in diseases that require follow-up. In hospital records for vesicoureteral reflux (VUR) and recurrent urinary tract infection (rUTI), very few cases are complete across demographic and clinical characteristics, laboratory findings, and imaging data. Deep learning (DL) approaches, however, can be applied to data with a high proportion of missing observations by using their own missing ratio algorithm. This study compared the effects of the multiple imputation techniques MICE and FAMD on the performance of DL in the differential diagnosis of VUR and rUTI. Data from a retrospective cross-sectional study of 611 pediatric patients (425 with VUR, 186 with rUTI; 26.65% missing ratio) were evaluated. CNTK and R 3.6.3 were used to evaluate different models on 34 features (physical, laboratory, and imaging findings). The best performance was obtained by DL with the MICE algorithm: 64.05% accuracy, 64.59% sensitivity, and 62.62% specificity. With 3 principal components in the missing-value imputation phase, the FAMD algorithm achieved 61.52% accuracy, 60.20% sensitivity, and 61.00% specificity. DL-based approaches can evaluate datasets without first omitting or imputing missing values. When a DL method is combined with an appropriate imputation technique, it shows higher predictive performance.
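The quantities reported above (missing ratio, accuracy, sensitivity, specificity) can be illustrated with a minimal Python sketch. It is not the authors' CNTK/R pipeline. The function names, the toy data, and the choice of VUR as the positive class are assumptions made only for illustration.

```python
import math
from typing import Dict, Optional, Sequence


def missing_ratio(rows: Sequence[Sequence[Optional[float]]]) -> float:
    """Fraction of table cells that are missing (represented here as None)."""
    total = sum(len(row) for row in rows)
    missing = sum(1 for row in rows for value in row if value is None)
    return missing / total if total else math.nan


def diagnostic_metrics(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "VUR",  # assumption: VUR is treated as the positive class
) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for a two-class diagnosis."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive case correctly identified
            else:
                fn += 1  # positive case missed
        else:
            if pred == positive:
                fp += 1  # negative case wrongly flagged as positive
            else:
                tn += 1  # negative case correctly identified

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a class is absent.
        return num / den if den else math.nan

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Toy example (hypothetical values, not study data).
toy_table = [[1.0, None, 3.0], [None, 2.0, 1.0]]
toy_true = ["VUR", "VUR", "VUR", "rUTI", "rUTI"]
toy_pred = ["VUR", "VUR", "rUTI", "rUTI", "VUR"]

print(missing_ratio(toy_table))
print(diagnostic_metrics(toy_true, toy_pred))
```

Because the cohort is imbalanced (425 VUR versus 186 rUTI), sensitivity and specificity are worth reading alongside accuracy, since accuracy alone can be dominated by the larger class.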