Christiansen Filip, Konuk Emir, Ganeshan Adithya Raju, Welch Robert, Palés Huix Joana, Czekierdowski Artur, Leone Francesco Paolo Giuseppe, Haak Lucia Anna, Fruscio Robert, Gaurilcikas Adrius, Franchi Dorella, Fischerova Daniela, Mor Elisa, Savelli Luca, Pascual Maria Àngela, Kudla Marek Jerzy, Guerriero Stefano, Buonomo Francesca, Liuba Karina, Montik Nina, Alcázar Juan Luis, Domali Ekaterini, Pangilinan Nelinda Catherine P, Carella Chiara, Munaretto Maria, Saskova Petra, Verri Debora, Visenzi Chiara, Herman Pawel, Smith Kevin, Epstein Elisabeth
Department of Clinical Science and Education, Södersjukhuset, Karolinska Institutet, Stockholm, Sweden.
Department of Obstetrics and Gynecology, Södersjukhuset, Stockholm, Sweden.
Nat Med. 2025 Jan;31(1):189-196. doi: 10.1038/s41591-024-03329-4. Epub 2025 Jan 2.
Ovarian lesions are common and often incidentally detected. A critical shortage of expert ultrasound examiners has raised concerns of unnecessary interventions and delayed cancer diagnoses. Deep learning has shown promising results in the detection of ovarian cancer in ultrasound images; however, external validation is lacking. In this international multicenter retrospective study, we developed and validated transformer-based neural network models using a comprehensive dataset of 17,119 ultrasound images from 3,652 patients across 20 centers in eight countries. Using a leave-one-center-out cross-validation scheme, for each center in turn, we trained a model using data from the remaining centers. The models demonstrated robust performance across centers, ultrasound systems, histological diagnoses and patient age groups, significantly outperforming both expert and non-expert examiners on all evaluated metrics, namely F1 score, sensitivity, specificity, accuracy, Cohen's kappa, Matthews correlation coefficient, diagnostic odds ratio and Youden's J statistic. Furthermore, in a retrospective triage simulation, artificial intelligence (AI)-driven diagnostic support reduced referrals to experts by 63% while significantly surpassing the diagnostic performance of the current practice. These results show that transformer-based models exhibit strong generalization and diagnostic accuracy above human expert level, with the potential to alleviate the shortage of expert ultrasound examiners and improve patient outcomes.
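The leave-one-center-out scheme and the eight reported metrics can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `leave_one_center_out` and `binary_metrics`, the center labels, and the confusion-matrix counts are hypothetical, and the metric formulas are the standard textbook definitions for a binary benign/malignant classification.

```python
from collections import defaultdict
from math import sqrt


def leave_one_center_out(centers):
    """Yield (held_out_center, train_idx, test_idx), one fold per distinct center.

    `centers` is a hypothetical list giving the center label of each case.
    """
    by_center = defaultdict(list)
    for i, center in enumerate(centers):
        by_center[center].append(i)
    for held_out in sorted(by_center):
        test_idx = by_center[held_out]
        train_idx = sorted(
            i for center, idx in by_center.items() if center != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def binary_metrics(tp, fp, fn, tn):
    """Compute the eight metrics named in the abstract from confusion-matrix counts.

    Assumes every cell is nonzero; a real evaluation would need to handle
    zero denominators (for example, in the diagnostic odds ratio).
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "f1": f1,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "cohen_kappa": kappa,
        "matthews_cc": mcc,
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn),
        "youden_j": sensitivity + specificity - 1,
    }


if __name__ == "__main__":
    # Toy data only: five cases from three made-up centers.
    for held_out, train_idx, test_idx in leave_one_center_out(["A", "A", "B", "C", "B"]):
        print(held_out, train_idx, test_idx)
    # Made-up counts, not results from the study.
    print(binary_metrics(tp=40, fp=10, fn=5, tn=45))
```

In the study's setting each fold would train one model on the remaining centers and evaluate it on the held-out center, so every test case comes from a center the corresponding model never saw during training.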