Department of Obstetrics and Gynecology, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, 200025, P.R. China.
Department of Obstetrics and Gynaecology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong Special Administrative Region, Hong Kong, P.R. China.
J Ovarian Res. 2024 Nov 6;17(1):219. doi: 10.1186/s13048-024-01544-8.
The study aimed to compare the diagnostic efficacy of machine learning models with that of expert subjective assessment (SA) in assessing the malignancy risk of ovarian tumors on transvaginal ultrasound (TVUS).
This retrospective, single-center diagnostic study included 1555 consecutive patients seen between January 2019 and May 2021. Using this dataset, Residual Network (ResNet), Densely Connected Convolutional Network (DenseNet), Vision Transformer (ViT), and Swin Transformer models were built and evaluated, both alone and combined with cancer antigen 125 (CA 125). Their diagnostic performance was then compared with that of SA.
Of the 1555 patients, 76.9% had benign tumors and 23.1% had malignant tumors (including borderline). In differentiating malignant from benign ovarian tumors, SA achieved an AUC of 0.97 (95% CI, 0.93-0.99), a sensitivity of 87.2%, and a specificity of 98.4%. Except for the Vision Transformer, the machine learning models showed diagnostic performance comparable to that of the expert. The DenseNet model had an AUC of 0.91 (95% CI, 0.86-0.95), a sensitivity of 84.6%, and a specificity of 95.1%. The ResNet50 model had an AUC of 0.91 (95% CI, 0.85-0.95). The Swin Transformer model had an AUC of 0.92 (95% CI, 0.87-0.96), a sensitivity of 87.2%, and a specificity of 94.3%. The AUC of the Vision Transformer was significantly lower than that of SA (0.87 vs. 0.97, P = 0.01) and that of the Swin Transformer model (0.87 vs. 0.92, P = 0.04). Adding CA 125 did not improve the models' performance in distinguishing benign from malignant ovarian tumors.
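The metrics reported above follow standard definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and AUC equals the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case. A minimal Python sketch, assuming a binary label (1 = malignant, including borderline) and one malignancy score per case; the function names, the 0.5 threshold, and the toy data are hypothetical and are not the study's data, code, or operating point.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); label/pred 1 = malignant, 0 = benign."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc_mann_whitney(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a malignant case > score of a benign case); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 malignant and 4 benign cases with model scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(labels, preds)
auc = auc_mann_whitney(labels, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds; the confidence intervals and P values in the abstract require additional statistical methods not covered by this sketch.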
Deep learning models based on TVUS can be used in ovarian cancer evaluation, with diagnostic performance comparable to that of expert assessment.