Li Zhanchi, Tan Zelong, Wang Zheyuan, Tang Wenjuan, Ren Xiang, Fu Jinhua, Wang Guangbing, Chu Han, Chen Jiarong, Duan Yuhe, Zhuang Likai, Wu Min
Shanghai Jiao Tong University School of Medicine, Shanghai, China.
Department of Electronic Engineering, Tsinghua University, Beijing, China.
EClinicalMedicine. 2024 Feb 9;69:102466. doi: 10.1016/j.eclinm.2024.102466. eCollection 2024 Mar.
Voiding cystourethrography (VCUG) is the gold standard for the diagnosis and grading of vesicoureteral reflux (VUR). However, VUR grading from voiding cystourethrograms is highly subjective and has low reliability. This study aimed to develop a deep learning model that improves the reliability of VUR grading on VCUG and to compare its performance with that of clinicians.
In this retrospective study in China, VCUG images were collected between January 2019 and September 2022. Images from our institution formed the internal dataset, and four external datasets formed the external testing set. Samples were divided into a training set (N = 1000), a validation set (N = 500), an internal testing set (N = 168), and an external testing set (N = 280). An ensemble learning-based model, Deep-VCUG, was developed using ResNet-101 and voting to predict VUR grade. Grading performance was assessed using heatmaps, area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, and F1 score in the internal and external testing sets. The performance of four clinicians (2 pediatric urologists and 2 radiologists) in predicting VUR grade, with and without Deep-VCUG assistance, was evaluated in the external testing set.
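The voting step of an ensemble such as the one described above can be illustrated with a minimal sketch. The abstract does not specify the voting rule, the number of member models, or the label encoding, so everything here is an assumption: hard majority voting, three hypothetical models, and integer grades where 0 stands for no reflux. It is not the authors' implementation.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among the per-model predictions.

    Ties are broken in favour of the label that appears first in the list
    (an arbitrary choice made for this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(per_model_grades):
    """Combine grades from several models, one inner list per model."""
    # zip(*...) regroups the predictions so each tuple holds all votes for one image.
    return [majority_vote(list(votes)) for votes in zip(*per_model_grades)]


# Hypothetical predicted grades from three models on four images.
model_a = [0, 3, 5, 2]
model_b = [0, 3, 4, 2]
model_c = [1, 2, 4, 1]

ensemble = ensemble_predict([model_a, model_b, model_c])
print(ensemble)  # [0, 3, 4, 2]
```

Hard voting is only one option; averaging the predicted class probabilities of the member models (soft voting) is a common alternative.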
A total of 1948 VCUG images were collected (internal dataset = 1668; multi-center external dataset = 280). For unilateral VUR grading, Deep-VCUG achieved AUCs of 0.962 (95% confidence interval [CI]: 0.943-0.978) and 0.944 (95% CI: 0.921-0.964) in the internal and external testing sets, respectively. For bilateral VUR grading, Deep-VCUG also achieved high AUCs of 0.960 (95% CI: 0.922-0.983) and 0.924 (95% CI: 0.887-0.957). The Deep-VCUG model with voting outperformed both the single models and the clinicians in classification based on VCUG images. Moreover, with Deep-VCUG assistance, the classification ability of both junior and senior clinicians improved significantly.
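For readers unfamiliar with how an AUC and its 95% CI can be obtained, the following is a minimal sketch. The abstract does not state how the intervals were computed; a percentile bootstrap is assumed here as one common approach, and the labels and scores are hypothetical. The sketch handles a binary task, so a multi-grade problem would need, for example, a one-vs-rest treatment per grade.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive scores above a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical binary labels (1 = reflux present) and model scores.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]

point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(point)  # 0.9375
print(lower, upper)
```

With only eight cases the interval is very wide; the narrow intervals reported above reflect testing sets with hundreds of images.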
The Deep-VCUG model is a generalizable, objective, and accurate tool for vesicoureteral reflux grading based on VCUG imaging, and it effectively assists clinicians in VUR grading.
This study was supported by the Natural Science Foundation of China, the "Fuqing Scholar" Student Scientific Research Program of Shanghai Medical College, Fudan University, and the Program of Greater Bay Area Institute of Precision Medicine (Guangzhou).