A real use case of semi-supervised learning for mammogram classification in a local clinic of Costa Rica.

Affiliations

Centre for Computational Intelligence (CCI), De Montfort University, Leicester, UK.

PARMA Research Group, Instituto Tecnológico de Costa Rica, Cartago, Costa Rica.

Publication information

Med Biol Eng Comput. 2022 Apr;60(4):1159-1175. doi: 10.1007/s11517-021-02497-6. Epub 2022 Mar 3.

Abstract

Deep learning-based computer-aided diagnosis systems for classifying mammogram images can help improve the accuracy and reliability of patient diagnosis and reduce its cost. However, training a deep learning model requires a considerable number of labelled images, which are expensive to obtain because they demand time and effort from clinical practitioners. To address this, a number of publicly available datasets have been built with data from different hospitals and clinics, and these can be used to pre-train the model. However, using models trained on these datasets for later transfer learning and fine-tuning with images sampled from a different hospital or clinic might result in lower performance. This is due to the distribution mismatch between the datasets, which include different patient populations and image acquisition protocols. In this work, a real-world scenario is evaluated in which a novel target dataset, sampled from a private Costa Rican clinic, has few labels and heavily imbalanced data. The use of two popular, publicly available datasets (INbreast and CBIS-DDSM) as source data, to train and test the models on the novel target dataset, is evaluated. A common approach to further improve a model's performance in such a small labelled target dataset setting is data augmentation. However, cheaper unlabelled data is often available from the target clinic. Semi-supervised deep learning, which leverages both labelled and unlabelled data, can therefore be used in such conditions. In this work, we evaluate the semi-supervised deep learning approach known as MixMatch, which takes advantage of unlabelled data from the target dataset, for whole mammogram image classification.
We compare semi-supervised learning, on its own and combined with transfer learning (from a source mammogram dataset) and data augmentation, against regular supervised learning with transfer learning and data augmentation from source datasets. The results show that semi-supervised deep learning combined with transfer learning and data augmentation can provide a meaningful advantage when labelled observations are scarce. We also found a strong influence of the source dataset, which suggests that a more data-centric approach is needed to tackle the challenge of scarcely labelled data. Because mammogram datasets are often very imbalanced, we used several metrics suited to very imbalanced test datasets (such as the G-mean and the F2-score) to assess the performance gain of using semi-supervised learning.

Graphical Abstract

Description of the test-bed implemented in this work. Two different source data distributions were used to fine-tune the different models tested in this work. The target dataset is the in-house CR-Chavarria-2020 dataset.
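
The abstract names MixMatch as the semi-supervised method but does not describe its mechanics. As a rough illustration only, the sketch below shows two steps that the original MixMatch formulation is generally described as using: label guessing (averaging predictions over several augmentations of an unlabelled image, then sharpening) and MixUp of image/label pairs. The function names, the flat-list image representation, and the default values `temperature=0.5` and `alpha=0.75` are assumptions made for this example; the abstract does not state the hyperparameters or implementation used in the study.

```python
import random


def sharpen(probs, temperature=0.5):
    """Reduce the entropy of a class distribution: p_i^(1/T) / sum_j p_j^(1/T)."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def guess_label(predictions, temperature=0.5):
    """Average class predictions over K augmentations of one unlabelled
    image, then sharpen the average to obtain a soft pseudo-label."""
    k = len(predictions)
    n_classes = len(predictions[0])
    mean = [sum(pred[c] for pred in predictions) / k for c in range(n_classes)]
    return sharpen(mean, temperature)


def mixup(x1, y1, x2, y2, alpha=0.75, rng=random):
    """Convexly combine two (image, label) pairs with a Beta-distributed weight.

    The weight is clipped to max(lam, 1 - lam) so that the mixed sample
    stays closer to the first pair. Images are flat lists of floats here
    (an assumption made purely to keep the sketch dependency-free).
    """
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Hypothetical model outputs for two augmentations of one unlabelled mammogram.
pseudo_label = guess_label([[0.6, 0.4], [0.7, 0.3]])
mixed_x, mixed_y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], pseudo_label)
```

In a full training loop, the mixed labelled batch would typically feed a cross-entropy term and the mixed unlabelled batch a consistency term; that part is omitted here because the abstract gives no details about it.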

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c255/8892413/c4bd5cb1cae4/11517_2021_2497_Figa_HTML.jpg
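
The abstract cites the G-mean and the F2-score as metrics suited to very imbalanced test sets. The following is a minimal sketch of how these two metrics are commonly computed for a binary classifier from confusion-matrix counts, assuming the usual definitions (G-mean as the geometric mean of sensitivity and specificity, F2 as the F-beta score with beta = 2). The function name and the example counts are hypothetical, and the study itself may use a different variant, for example a multi-class one.

```python
import math


def imbalance_metrics(tp, fp, tn, fn, beta=2.0):
    """Compute accuracy, G-mean and F-beta from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the negative class
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # G-mean drops to zero if either class is entirely misclassified.
    g_mean = math.sqrt(sensitivity * specificity)

    # F-beta with beta = 2 weights recall more heavily than precision.
    b2 = beta * beta
    denom = b2 * precision + sensitivity
    f_beta = (1.0 + b2) * precision * sensitivity / denom if denom else 0.0

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "g_mean": g_mean, "f_beta": f_beta}


# Hypothetical counts: 10 positive cases and 90 negative cases.
scores = imbalance_metrics(tp=8, fp=10, tn=80, fn=2)
print(scores)
```

With counts like these, accuracy is dominated by the majority class, while the G-mean and the F2-score respond more directly to errors on the minority (positive) class, which is the motivation the abstract gives for reporting them.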
