Hooper Sarah M, Wu Sen, Davies Rhodri H, Bhuva Anish, Schelbert Erik B, Moon James C, Kellman Peter, Xue Hui, Langlotz Curtis, Ré Christopher
Stanford University, Department of Electrical Engineering, Stanford, California, United States.
Stanford University, Department of Computer Science, Stanford, California, United States.
J Med Imaging (Bellingham). 2023 Mar;10(2):024007. doi: 10.1117/1.JMI.10.2.024007. Epub 2023 Mar 30.
Neural networks have the potential to automate medical image segmentation but require expensive labeling efforts. While methods have been proposed to reduce the labeling burden, most have not been thoroughly evaluated on large clinical datasets or clinical tasks. We propose a method to train segmentation networks with limited labeled data and focus on thorough network evaluation.
We propose a semi-supervised method that leverages data augmentation, consistency regularization, and pseudolabeling and train four cardiac magnetic resonance (MR) segmentation networks. We evaluate the models on multiinstitutional, multiscanner, multidisease cardiac MR datasets using five cardiac functional biomarkers, which are compared to an expert's measurements using Lin's concordance correlation coefficient (CCC), the within-subject coefficient of variation (CV), and the Dice coefficient.
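The three agreement metrics named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and the within-subject CV uses the common root-mean-square formulation for paired measurements, which is an assumption because the abstract does not state the exact definition used.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice overlap between two binary masks (defined as 1.0 if both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, target).sum() / denom


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    # Population (biased) covariance and variances, as in Lin's definition.
    cov_xy = ((x - mean_x) * (y - mean_y)).mean()
    return 2.0 * cov_xy / (x.var() + y.var() + (mean_x - mean_y) ** 2)


def within_subject_cv(x, y):
    """Within-subject coefficient of variation for two measurements per subject.

    Assumed formulation (root mean square method): for each subject, divide
    the variance of the two measurements by the squared subject mean, then
    take the square root of the average over subjects.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    subject_mean = (x + y) / 2.0
    subject_var = (x - y) ** 2 / 2.0  # sample variance of two values
    return float(np.sqrt(np.mean(subject_var / subject_mean ** 2)))
```

In this sketch, `x` would hold a biomarker (for example, an ejection fraction) derived from the network's segmentations and `y` the same biomarker measured by the expert, one value per subject.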
The semi-supervised networks achieve strong agreement using Lin's CCC ( ), CV similar to an expert, and strong generalization performance. We compare the error modes of the semi-supervised networks against fully supervised networks. We evaluate semi-supervised model performance as a function of labeled training data and with different types of model supervision, showing that a model trained with 100 labeled image slices can achieve a Dice coefficient within 1.10% of a network trained with 16,000+ labeled image slices.
We evaluate semi-supervision for medical image segmentation using heterogeneous datasets and clinical metrics. As methods for training models with little labeled data become more common, knowledge about how they perform on clinical tasks, how they fail, and how they perform with different amounts of labeled data is useful to model developers and users.