Hansen Colin B, Nath Vishwesh, Gao Riqiang, Bermudez Camilo, Huo Yuankai, Sandler Kim L, Massion Pierre P, Blume Jeffrey D, Lasko Thomas A, Landman Bennett A
Computer Science, Vanderbilt University, Nashville, TN 37235, USA.
Vanderbilt University Medical Center, Nashville, TN 37235, USA.
Lect Notes Monogr Ser. 2020;12446:112-121. Epub 2020 Oct 2.
Semi-supervised methods have an increasing impact on computer vision tasks by making use of scarce labels on large datasets, yet these approaches have not been well translated to medical imaging. Of particular interest, the MixMatch method achieves significant performance improvements over popular semi-supervised learning methods with scarce labels on the CIFAR-10 dataset. In a complementary approach, Nullspace Tuning on equivalence classes offers the potential to leverage multiple scans of a subject when the ground truth for the subject is unknown. This work is the first to (1) explore MixMatch with Nullspace Tuning in the context of medical imaging and (2) characterize the impact of these methods as labels diminish. We consider two distinct medical imaging domains: skin lesion diagnosis and lung cancer prediction. In both cases we evaluate models trained with diminishing labeled data using supervised, MixMatch, and Nullspace Tuning methods, as well as MixMatch combined with Nullspace Tuning. The combined method achieves an AUC of 0.755 in lung cancer diagnosis with only 200 labeled subjects on the National Lung Screening Trial and a balanced multi-class accuracy of 77% with only 779 labeled examples on HAM10000. This performance is similar to that of the fully supervised methods when all labels are available. In advancing data-driven methods in medical imaging, it is important to consider current state-of-the-art semi-supervised learning methods from the greater machine learning community and their impact on the limitations of data acquisition and annotation.
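The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the temperature `T=0.5`, and the Beta parameter `alpha=0.75` are illustrative assumptions. The label-guessing, sharpening, and mixing steps follow the general MixMatch recipe. The equivalence-class penalty is one simple way to express the Nullspace Tuning idea that two scans of the same subject should receive the same prediction even when the subject's label is unknown.

```python
import numpy as np


def sharpen(p, T=0.5):
    """Lower the entropy of each row of class probabilities (temperature T is an assumed value)."""
    p = np.asarray(p, dtype=float) ** (1.0 / T)
    return p / p.sum(axis=1, keepdims=True)


def guess_labels(probs_per_aug, T=0.5):
    """Average predictions over K augmentations of each unlabeled example, then sharpen.

    probs_per_aug has shape (K, N, C): K augmentations, N examples, C classes.
    """
    return sharpen(np.mean(probs_per_aug, axis=0), T)


def mixup(x1, y1, x2, y2, alpha=0.75, rng=None):
    """Convex combination of two batches; the result stays closer to (x1, y1)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)  # guarantees lam >= 0.5
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2


def equivalence_penalty(probs_a, probs_b):
    """Mean squared difference between predictions for paired examples
    assumed to belong to the same equivalence class (e.g. same subject)."""
    diff = np.asarray(probs_a, dtype=float) - np.asarray(probs_b, dtype=float)
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

In a training loop, a penalty of this kind would typically be added, with a weighting factor, to the supervised loss on labeled examples and the consistency loss on guessed labels. The weighting and schedule are design choices not specified in the abstract.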