Distilling Knowledge From an Ensemble of Vision Transformers for Improved Classification of Breast Ultrasound.

Author information

Weill Cornell Medicine, New York, NY 10021.

Dalio Institute of Cardiovascular Imaging, Department of Radiology, Weill Cornell Medicine, New York, New York.

Publication information

Acad Radiol. 2024 Jan;31(1):104-120. doi: 10.1016/j.acra.2023.08.006. Epub 2023 Sep 2.

Abstract

RATIONALE AND OBJECTIVES

To develop a deep learning model for the automated classification of breast ultrasound images as benign or malignant. More specifically, the application of vision transformers, ensemble learning, and knowledge distillation is explored for breast ultrasound classification.

MATERIALS AND METHODS

Single-view B-mode ultrasound images were curated from the publicly available Breast Ultrasound Image (BUSI) dataset, which has categorical ground-truth labels (benign vs malignant) assigned by radiologists, with malignant cases confirmed by biopsy. The performance of vision transformers (ViT) is compared with that of convolutional neural networks (CNN), followed by a comparison among supervised, self-supervised, and randomly initialized ViT. Subsequently, an ensemble of 10 independently trained ViT, in which the ensemble output is the unweighted average of the outputs of the individual models, is compared with each ViT alone. Finally, a single ViT is trained to emulate the ensembled ViT using knowledge distillation.
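The unweighted ensemble described above can be illustrated with a minimal Python sketch. It assumes that each model's output is a softmax probability vector over the two classes and that averaging is done on probabilities; the abstract does not say whether logits or probabilities are averaged. The function names and the logit values are hypothetical, chosen only for illustration.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_average(member_probs):
    """Unweighted mean of per-model probability vectors for one image."""
    n_models = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_models
            for c in range(n_classes)]

# Hypothetical logits from three models for one image: [benign, malignant].
member_logits = [[1.2, -0.3], [0.4, 0.1], [2.0, -1.0]]
member_probs = [softmax(z) for z in member_logits]
ensemble_probs = ensemble_average(member_probs)
```

Because every member contributes equally, the ensemble needs no extra trained parameters, but inference cost grows linearly with the number of members.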

RESULTS

With models trained using five-fold cross-validation on this dataset, ViT outperforms CNN, and self-supervised ViT outperforms supervised and randomly initialized ViT. The ensemble model achieves an area under the receiver operating characteristic curve (AuROC) and area under the precision-recall curve (AuPRC) of 0.977 and 0.965, respectively, on the test set, outperforming the average AuROC and AuPRC of the independently trained ViTs (0.958 ± 0.05 and 0.931 ± 0.016). The distilled ViT achieves an AuROC and AuPRC of 0.972 and 0.960, respectively.
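For readers unfamiliar with the two metrics, the following pure-Python sketch shows one common way to compute them. It is not the authors' evaluation code: AuROC is computed as the fraction of positive-negative pairs ranked correctly, and AuPRC is approximated by average precision, which assumes no tied scores. The labels and scores are made-up toy values.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos

# Hypothetical toy data: 1 = malignant, 0 = benign.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(labels, scores)
toy_auprc = average_precision(labels, scores)
```

AuPRC is sensitive to class balance in a way that AuROC is not, which is one reason both are often reported together for imbalanced medical datasets.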

CONCLUSION

Transfer learning and ensemble learning each improve performance independently, and they can be applied sequentially to further improve the final model. Furthermore, a single vision transformer can be trained with knowledge distillation to match the performance of an ensemble of vision transformers.
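As an illustration of how a single model can be trained to emulate an ensemble, the sketch below shows one common distillation objective: a weighted sum of cross-entropy against the ensemble's soft probabilities and cross-entropy against the ground-truth label. The abstract does not specify the loss the authors used, so the formulation, the `alpha` weight, and all names and values here are assumptions.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_probs, hard_label, alpha=0.5):
    """alpha * CE(teacher soft targets) + (1 - alpha) * CE(hard label).

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    student_probs = softmax(student_logits)
    soft = -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))
    hard = -math.log(student_probs[hard_label])
    return alpha * soft + (1 - alpha) * hard

# Hypothetical ensemble (teacher) output for one benign image: [benign, malignant].
teacher_probs = [0.9, 0.1]
loss_agree = distillation_loss([2.0, 0.0], teacher_probs, hard_label=0)
loss_disagree = distillation_loss([0.0, 2.0], teacher_probs, hard_label=0)
```

A student whose predictions agree with the teacher incurs a lower loss than one that disagrees, so minimizing this quantity over the training set pulls the student toward the ensemble's outputs while keeping inference cost at that of a single model.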
