BUS-Set: A benchmark for quantitative evaluation of breast ultrasound segmentation networks with public datasets.

Author information

Thomas Cory, Byra Michal, Marti Robert, Yap Moi Hoon, Zwiggelaar Reyer

Affiliations

Department of Computer Science, Aberystwyth University, Aberystwyth, UK.

Institute of Fundamental Technological Research, Polish Academy of Sciences, Warsaw, Poland.

Publication information

Med Phys. 2023 May;50(5):3223-3243. doi: 10.1002/mp.16287. Epub 2023 Feb 28.

DOI:10.1002/mp.16287

PMID:36794706

Abstract

PURPOSE

BUS-Set is a reproducible benchmark for breast ultrasound (BUS) lesion segmentation, comprising publicly available images, with the aim of improving future comparisons between machine learning models within the field of BUS.

METHOD

Four publicly available datasets were compiled, creating an overall set of 1154 BUS images from five different scanner types. Full dataset details are provided, including clinical labels and detailed annotations. Furthermore, nine state-of-the-art deep learning architectures were selected to form the initial benchmark segmentation results. These were tested using five-fold cross-validation, with MANOVA/ANOVA and Tukey statistical significance tests at a threshold of 0.01. Additional evaluation of these architectures was conducted, exploring possible training bias and the effects of lesion size and type.
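The five-fold cross-validation protocol mentioned above can be illustrated with a minimal sketch. This is not the authors' split (the actual fold assignment is part of the benchmark released on GitHub); the function name `five_fold_splits`, the seed, and the use of integer image IDs are assumptions made purely for illustration.

```python
import random


def five_fold_splits(image_ids, n_folds=5, seed=0):
    """Yield (train_ids, test_ids) pairs so each image is tested exactly once.

    Hypothetical helper: the shuffling scheme and seed are illustrative
    assumptions, not the fold assignment used by BUS-Set.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    # Deal the shuffled IDs round-robin so fold sizes differ by at most one.
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_ids = folds[k]
        train_ids = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train_ids, test_ids


# 1154 images, as in the compiled set; the IDs 0..1153 are placeholders.
splits = list(five_fold_splits(range(1154)))
fold_sizes = [len(test_ids) for _, test_ids in splits]
```

With 1154 images, four of the test folds hold 231 images and one holds 230. A benchmark that is meant to be reproducible would normally publish a fixed fold assignment rather than rely on a seed, so the sketch only shows the shape of the procedure.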

RESULTS

Of the nine state-of-the-art benchmarked architectures, Mask R-CNN obtained the highest overall results, with the following mean metric scores: Dice score of 0.851, intersection over union of 0.786, and pixel accuracy of 0.975. MANOVA/ANOVA and Tukey test results showed Mask R-CNN to be statistically significantly better than all other benchmarked models, with a p-value <0.01. Moreover, Mask R-CNN achieved the highest mean Dice score, 0.839, on an additional 16-image dataset that contained multiple lesions per image. Further analysis of regions of interest was conducted, assessing Hamming distance, depth-to-width ratio (DWR), circularity, and elongation. This showed that Mask R-CNN's segmentations preserved the most morphological features, with correlation coefficients of 0.888, 0.532, and 0.876 for DWR, circularity, and elongation, respectively. Statistical tests based on the correlation coefficients indicated that Mask R-CNN differed significantly only from Sk-U-Net.
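For readers unfamiliar with the three reported metrics, the sketch below computes them for a single pair of binary masks using their standard definitions. It is an illustration only: the function name `segmentation_scores`, the toy masks, and the convention for two empty masks are assumptions, and the benchmark's own evaluation code may differ in such details.

```python
import numpy as np


def segmentation_scores(pred, truth):
    """Return (Dice, IoU, pixel accuracy) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    # Assumed convention: two empty masks count as perfect agreement.
    dice = 2.0 * intersection / total if total else 1.0
    iou = intersection / union if union else 1.0
    # Fraction of pixels (lesion and background) labelled identically.
    pixel_accuracy = (pred == truth).mean()
    return float(dice), float(iou), float(pixel_accuracy)


# Toy example: a 4-pixel lesion and a 6-pixel prediction that fully covers it.
truth = np.zeros((4, 4), dtype=int)
truth[1:3, 1:3] = 1
pred = np.zeros((4, 4), dtype=int)
pred[1:3, 1:4] = 1
dice, iou, acc = segmentation_scores(pred, truth)
```

For the toy masks this gives a Dice score of 0.8, an IoU of about 0.667, and a pixel accuracy of 0.875. For a single image the two overlap metrics are tied by Dice = 2·IoU / (1 + IoU), but that identity does not carry over to means taken across images, which is why a mean Dice and a mean IoU reported side by side need not satisfy it.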

CONCLUSIONS

BUS-Set is a fully reproducible benchmark for BUS lesion segmentation, obtained through the use of public datasets and GitHub. Of the state-of-the-art convolutional neural network (CNN)-based architectures, Mask R-CNN achieved the highest performance overall. Further analysis indicated that a training bias may have occurred due to the lesion size variation in the dataset. All dataset and architecture details are available on GitHub at https://github.com/corcor27/BUS-Set, which allows for a fully reproducible benchmark.
