Noothout Julia M H, Lessmann Nikolas, van Eede Matthijs C, van Harten Louis D, Sogancioglu Ecem, Heslinga Friso G, Veta Mitko, van Ginneken Bram, Išgum Ivana
Amsterdam University Medical Center, University of Amsterdam, Department of Biomedical Engineering and Physics, Amsterdam, The Netherlands.
Radboud University Medical Center, Department of Medical Imaging, Nijmegen, The Netherlands.
J Med Imaging (Bellingham). 2022 Sep;9(5):052407. doi: 10.1117/1.JMI.9.5.052407. Epub 2022 May 28.
Ensembles of convolutional neural networks (CNNs) often outperform a single CNN in medical image segmentation tasks, but their inference is computationally more expensive, which makes ensembles unattractive for some applications. We compared the performance of differently constructed ensembles with the performance of CNNs derived from these ensembles using knowledge distillation, a technique for reducing the footprint of large models such as ensembles. We investigated two types of ensembles: diverse ensembles of networks with three different architectures and two different loss functions, and uniform ensembles of networks with the same architecture but initialized with different random seeds. For each ensemble, a single student network was additionally trained to mimic the class probabilities predicted by the teacher model, i.e., the ensemble. We evaluated the performance of each network, the ensembles, and the corresponding distilled networks on three publicly available datasets: chest computed tomography scans with four annotated organs of interest, brain magnetic resonance imaging (MRI) with six annotated brain structures, and cardiac cine-MRI with three annotated heart structures. Both uniform and diverse ensembles obtained better results than any of the individual networks in the ensemble. Furthermore, applying knowledge distillation yielded a single network that was smaller and faster than the ensemble it learned from, without compromising performance. The distilled networks significantly outperformed the same network trained with reference segmentations instead of knowledge distillation. Knowledge distillation can thus compress segmentation ensembles of uniform or diverse composition into a single CNN while maintaining the performance of the ensemble.
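The distillation setup summarized above (a student network trained to mimic the class probabilities of a teacher ensemble) can be illustrated with a minimal sketch. The snippet below assumes PyTorch, hypothetical pre-trained teacher networks, and a per-pixel KL divergence as the distillation loss; the study's actual architectures, loss weighting, and training schedule are not reproduced here.

```python
# Minimal sketch of ensemble knowledge distillation for segmentation.
# Assumptions (not from the paper): PyTorch models with (B, C, H, W) logits,
# plain KL divergence to the averaged teacher probabilities, no temperature.
import torch
import torch.nn.functional as F

def ensemble_soft_targets(teachers, image):
    """Average the per-class probability maps predicted by the teacher ensemble."""
    with torch.no_grad():
        probs = [F.softmax(t(image), dim=1) for t in teachers]  # each (B, C, H, W)
    return torch.stack(probs, dim=0).mean(dim=0)

def distillation_loss(student_logits, teacher_probs):
    """Per-pixel KL divergence between student prediction and ensemble soft targets."""
    log_p_student = F.log_softmax(student_logits, dim=1)
    kl = F.kl_div(log_p_student, teacher_probs, reduction="none")  # (B, C, H, W)
    return kl.sum(dim=1).mean()  # sum over classes, mean over batch and space

def train_step(student, teachers, optimizer, image):
    """One hypothetical training step: the student mimics the ensemble's probabilities."""
    optimizer.zero_grad()
    student_logits = student(image)
    target_probs = ensemble_soft_targets(teachers, image)
    loss = distillation_loss(student_logits, target_probs)
    loss.backward()
    optimizer.step()
    return loss.item()
```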