Challenges in Using Deep Neural Networks Across Multiple Readers in Delineating Prostate Gland Anatomy.

Author information

Abudalou Shatha, Choi Jung, Gage Kenneth, Pow-Sang Julio, Yilmaz Yasin, Balagurunathan Yoganand

Affiliations

Department of Machine Learning, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA.

Department of Electrical Engineering, University of South Florida, Tampa, FL, USA.

Publication information

J Imaging Inform Med. 2025 May 20. doi: 10.1007/s10278-025-01504-8.

PMID:40392414

Abstract

Deep learning methods hold enormous promise for automating labor-intensive tasks such as medical image segmentation and for providing workflow assistance to clinical experts. Deep neural networks (DNNs) require a large number of training examples and a variety of expert opinions to capture nuance and context, which is a challenging proposition in oncological studies (H. Wang et al., Nature, vol. 620, no. 7972, pp. 47-60, Aug 2023). Inter-reader variability among clinical experts is a real-world problem that severely impacts the generalization and reproducibility of DNNs. This study proposes quantifying the variability in DNN performance across expert opinions and exploring strategies to train the network and adapt it between expert opinions. We address the inter-reader variability problem in the context of prostate gland segmentation using a well-studied DNN, the 3D U-Net model. Reference data include T2-weighted magnetic resonance imaging (MRI) with prostate gland anatomy annotations from two expert readers (R#1, n = 342; R#2, n = 204). A 3D U-Net trained and tested on each expert's examples separately achieved an average Dice coefficient of 0.825 (CI, [0.81, 0.84]) for R#1 and 0.85 (CI, [0.82, 0.88]) for R#2. Combined training with a representative cohort proportion (R#1, n = 100; R#2, n = 150) improved model reproducibility across readers, achieving an average test Dice coefficient of 0.863 (CI, [0.85, 0.87]) for R#1 and 0.869 (CI, [0.87, 0.88]) for R#2. We re-evaluated model performance by gland volume (large, small). Performance improved for large glands, with an average Dice coefficient of 0.846 (CI, [0.82, 0.87]) for R#1 and 0.872 (CI, [0.86, 0.89]) for R#2, estimated using fivefold cross-validation. Performance diminished for small glands, with an average Dice coefficient of 0.8 (CI, [0.79, 0.82]) for R#1 and 0.8 (CI, [0.79, 0.83]) for R#2.
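
For readers unfamiliar with the metric reported above, the Dice coefficient between a predicted mask P and a reference mask R is 2|P ∩ R| / (|P| + |R|). The following is a minimal sketch of that standard definition for binary 3D masks, not the authors' implementation. The function name `dice_coefficient`, the convention of returning 1.0 when both masks are empty, and the toy arrays are illustrative assumptions that do not come from the study.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice overlap between two binary masks of identical shape.

    Hypothetical helper for illustration; the empty-mask convention
    (return 1.0 when both masks are empty) is an assumption.
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")

    total = pred.sum() + ref.sum()
    if total == 0:
        # Both masks are empty: treat as perfect agreement.
        return 1.0

    intersection = np.logical_and(pred, ref).sum()
    return float(2.0 * intersection / total)


# Toy 3D volumes (not study data): the reference marks 8 voxels,
# the prediction marks 12 voxels that include all 8 reference voxels.
ref_mask = np.zeros((4, 4, 4), dtype=bool)
ref_mask[1:3, 1:3, 1:3] = True

pred_mask = np.zeros((4, 4, 4), dtype=bool)
pred_mask[1:3, 1:3, 1:4] = True

toy_dice = dice_coefficient(pred_mask, ref_mask)  # 2 * 8 / (8 + 12)
print(f"Dice = {toy_dice:.3f}")
```

In a multi-reader setting like the one described, the same prediction would be scored once against each reader's annotation, so a single model yields one Dice value per reader per case.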
