A real use case of semi-supervised learning for mammogram classification in a local clinic of Costa Rica.

Affiliations

Centre for Computational Intelligence (CCI), De Montfort University, Leicester, UK.

PARMA Research Group, Instituto Tecnológico de Costa Rica, Cartago, Costa Rica.

Publication information

Med Biol Eng Comput. 2022 Apr;60(4):1159-1175. doi: 10.1007/s11517-021-02497-6. Epub 2022 Mar 3.

DOI:10.1007/s11517-021-02497-6

PMID:35239108

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8892413/

Abstract

Deep learning-based computer-aided diagnosis systems for classifying mammogram images can help improve the accuracy and reliability of diagnosis and reduce its cost. However, training a deep learning model requires a considerable number of labelled images, which are expensive to obtain because labelling demands time and effort from clinical practitioners. To address this, a number of publicly available datasets have been built with data from different hospitals and clinics, and these can be used to pre-train a model. However, using models trained on these datasets for later transfer learning and fine-tuning with images sampled from a different hospital or clinic might result in lower performance. This is due to the distribution mismatch between the datasets, which cover different patient populations and image acquisition protocols. In this work, a real-world scenario is evaluated in which a novel target dataset sampled from a private Costa Rican clinic is used, with few labels and heavily imbalanced data. The use of two popular, publicly available datasets (INbreast and CBIS-DDSM) as source data for training models that are then tested on the novel target dataset is evaluated. A common approach to further improve model performance when the labelled target dataset is this small is data augmentation. However, cheaper unlabelled data is often available from the target clinic. Semi-supervised deep learning, which leverages both labelled and unlabelled data, can therefore be used in such conditions. In this work, we evaluate the semi-supervised deep learning approach known as MixMatch, which takes advantage of unlabelled data from the target dataset, for whole mammogram image classification.
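As a rough illustration of how MixMatch uses unlabelled images, the sketch below shows two of its generic building blocks in plain Python: label guessing with sharpening, and the MixUp step. It follows the general MixMatch recipe rather than the authors' code. The function names, the temperature of 0.5, the Beta parameter of 0.75, and the toy probability vectors are all assumptions chosen for illustration; a real pipeline would operate on image tensors and network outputs.

```python
import random


def sharpen(probs, temperature=0.5):
    """Lower the entropy of a class distribution (temperature < 1 sharpens)."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def guess_label(predictions, temperature=0.5):
    """Average class probabilities over several augmented views, then sharpen.

    `predictions` holds one probability vector per augmented view of the
    same unlabelled image (hypothetical model outputs).
    """
    n_views = len(predictions)
    n_classes = len(predictions[0])
    mean = [sum(view[c] for view in predictions) / n_views
            for c in range(n_classes)]
    return sharpen(mean, temperature)


def mixup(x1, y1, x2, y2, alpha=0.75, rng=random):
    """Mix two samples; taking max(lam, 1 - lam) keeps the result closer to (x1, y1)."""
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Toy example: two augmented views of one unlabelled image, two classes.
views = [[0.6, 0.4], [0.8, 0.2]]
pseudo_label = guess_label(views)
print("guessed label:", pseudo_label)

# Mix a labelled sample (one-hot label) with the pseudo-labelled one.
mixed_x, mixed_y, lam = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], pseudo_label)
print("mixing weight:", lam, "mixed label:", mixed_y)
```

In the full method, the mixed labelled and unlabelled batches feed a supervised loss and a consistency loss respectively; those parts are omitted here.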
We compare semi-supervised learning on its own, and combined with transfer learning (from a source mammogram dataset) and data augmentation, against regular supervised learning with transfer learning and data augmentation from the source datasets. The results show that semi-supervised deep learning combined with transfer learning and data augmentation can provide a meaningful advantage when labelled observations are scarce. We also found a strong influence of the source dataset, which suggests that a more data-centric approach is needed to tackle the challenge of scarcely labelled data. Because mammogram datasets are often very imbalanced, we used several metrics suited to very imbalanced test datasets (such as the G-mean and the F2-score) to assess the performance gain of semi-supervised learning.

Graphical abstract: Description of the test-bed implemented in this work. Two different source data distributions were used to fine-tune the different models tested in this work. The target dataset is the in-house CR-Chavarria-2020 dataset.
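To make the imbalance-aware metrics named in the abstract concrete, here is a minimal sketch of the G-mean and the F2-score for binary labels. It assumes the usual definitions: the G-mean as the geometric mean of sensitivity and specificity, and the F-beta score with beta = 2, which weights recall more heavily than precision. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth != positive and pred != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_beta(y_true, y_pred, beta=2.0, positive=1):
    """F-beta score; beta = 2 gives the F2-score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Toy imbalanced test set: 2 positives, 8 negatives (illustrative only).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy)
print("G-mean:", g_mean(y_true, y_pred))
print("F2-score:", f_beta(y_true, y_pred))
```

On this toy set the classifier misses half of the positives yet still scores 80% accuracy, which is why metrics that account for the minority class are preferred on imbalanced data.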


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c255/8892413/c4bd5cb1cae4/11517_2021_2497_Figa_HTML.jpg



