Bashir Zahra, Lin Manxi, Feragen Aasa, Mikolaj Kamil, Taksøe-Vester Caroline, Christensen Anders Nymark, Svendsen Morten B S, Fabricius Mette Hvilshøj, Andreasen Lisbeth, Nielsen Mads, Tolsgaard Martin Grønnebæk
Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
Department of Obstetrics and Gynecology, Slagelse Hospital, Fælledvej 11, 4200, Slagelse, Denmark.
Sci Rep. 2025 Jan 15;15(1):2074. doi: 10.1038/s41598-025-86536-4.
We aimed to develop and evaluate Explainable Artificial Intelligence (XAI) for fetal ultrasound that uses actionable concepts as feedback to end-users, following a prospective, cross-center, multi-level approach. We developed, implemented, and tested a deep-learning model for fetal growth scans using both retrospective and prospective data. We used a modified Progressive Concept Bottleneck Model with pre-established clinical concepts as explanations (feedback on image optimization and the presence of anatomical landmarks) as well as segmentations (outlines of anatomical landmarks). The model was evaluated prospectively on four criteria: its ability to assess standard plane quality, the correctness of its explanations, the clinical usefulness of its explanations, and its ability to discriminate between different levels of expertise among clinicians. We used 9352 annotated images for model development and 100 videos for prospective evaluation. Overall classification accuracy was 96.3%, and the model's performance in assessing standard plane quality was on par with that of clinicians. Expert clinicians agreed with the model's segmentations in 83.3% of cases and with its explanations in 74.2% of cases. A panel of clinicians rated the segmentations as useful in 72.4% of cases and the explanations as useful in 75.0% of cases. Finally, the model reliably discriminated between clinicians with different levels of experience (p-values < 0.01 for all measures). Our study has successfully developed an Explainable AI model for real-time feedback to clinicians performing fetal growth scans. This work contributes to the existing literature by addressing the gap in the clinical validation of Explainable AI models within fetal medicine, emphasizing the importance of multi-level, cross-institutional, and prospective evaluation with clinician end-users.
The prospective clinical validation uncovered challenges and opportunities that could not have been anticipated if we had only focused on retrospective development and validation, such as leveraging AI to gauge operator competence in fetal ultrasound.
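The core idea behind the concept-bottleneck architecture mentioned above is that the network first predicts human-interpretable concept scores and then derives the final label only from those concepts, so every prediction comes with concept-level feedback. The following is a minimal NumPy sketch of that generic idea, not the paper's modified Progressive Concept Bottleneck Model (which is a deep network and also produces segmentations); the feature and concept dimensions, the example concept names in the comments, and the random linear weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ConceptBottleneck:
    """Minimal concept-bottleneck classifier (illustrative sketch).

    An image embedding is mapped to interpretable concept probabilities
    (e.g. hypothetical concepts such as 'stomach bubble visible' or
    'adequate zoom'), and the standard-plane-quality label is predicted
    only from those concepts, so each prediction is explainable at the
    concept level.
    """

    def __init__(self, n_features, n_concepts):
        # Random untrained weights; a real model would learn these from
        # concept-annotated images.
        self.W_c = rng.normal(scale=0.1, size=(n_features, n_concepts))
        self.w_y = rng.normal(scale=0.1, size=n_concepts)

    def concepts(self, x):
        # Per-concept probabilities: the "explanation" layer.
        return sigmoid(x @ self.W_c)

    def predict(self, x):
        # The label depends on the input only through the concepts.
        c = self.concepts(x)
        return sigmoid(c @ self.w_y), c

model = ConceptBottleneck(n_features=16, n_concepts=4)
x = rng.normal(size=16)          # stand-in for an image embedding
p, c = model.predict(x)          # quality probability + concept scores
```

Because the label is computed only from the concept layer, a clinician can inspect `c` to see which concepts drove a low quality score, which is the mechanism that makes the feedback actionable.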