A Semisupervised Learning Scheme with Self-Paced Learning for Classifying Breast Cancer Histopathological Images.

Authors

Asare Sarpong Kwadwo, You Fei, Nartey Obed Tettey

Affiliations

School of Electronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China.

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China.

Publication information

Comput Intell Neurosci. 2020 Dec 3;2020:8826568. doi: 10.1155/2020/8826568. eCollection 2020.

Abstract

The scarcity of large, well-labeled datasets poses a significant challenge in many medical imaging tasks. Even when sufficient data are available, labeling them accurately is arduous and time-consuming and requires expert skills. Data imbalance further compounds these problems and presents a considerable challenge for many machine learning algorithms. In light of this, algorithms that can exploit large amounts of unlabeled data together with a small amount of labeled data, while remaining robust to data imbalance, offer promising prospects for building highly efficient classifiers. This work proposes a semisupervised learning method that integrates self-training and self-paced learning to generate and select pseudolabeled samples for classifying breast cancer histopathological images. A novel pseudolabel generation and selection algorithm is introduced in the learning scheme to generate and select highly confident pseudolabeled samples from both well-represented and less-represented classes. This approach improves performance by jointly learning a model and optimizing pseudolabel generation on unlabeled target data, which augments the training data, and then retraining the model with the generated labels. A class balancing framework that normalizes the class-wise confidence scores is also proposed to prevent the model from ignoring samples from less-represented classes (hard-to-learn samples), and thus to handle data imbalance effectively. Extensive experimental evaluation on the BreakHis dataset demonstrates the effectiveness of the proposed method.
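
The class-balanced selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name select_pseudolabels, the keep_fraction parameter, and the rule of keeping the top fraction of predictions within each predicted class are all assumptions made for illustration. The per-class cutoff plays the role of a class-wise normalizer, so a class with generally low confidence still contributes samples, and increasing keep_fraction over successive rounds would mimic the easy-to-hard schedule of self-paced learning.

```python
import math


def select_pseudolabels(probs, keep_fraction):
    """Pick confident pseudolabels per class (illustrative sketch only).

    probs: list of per-sample class-probability lists (assumed to sum to 1).
    keep_fraction: hypothetical pace parameter in (0, 1]; the share of each
        predicted class that is accepted in the current round.
    Returns a dict mapping sample index -> pseudolabel.
    """
    # Group samples by predicted class (argmax of the probability vector).
    by_class = {}
    for index, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)
        by_class.setdefault(label, []).append((p[label], index))

    selected = {}
    for label, scored in by_class.items():
        # Most confident predictions first.
        scored.sort(reverse=True)
        n_keep = max(1, math.ceil(keep_fraction * len(scored)))
        # Class-wise cutoff: comparing against it is the same as dividing
        # each confidence by the cutoff and requiring the result to be >= 1.
        cutoff = scored[n_keep - 1][0]
        for confidence, index in scored:
            if confidence >= cutoff:
                selected[index] = label
    return selected


if __name__ == "__main__":
    demo = [[0.95, 0.05], [0.90, 0.10], [0.80, 0.20],
            [0.70, 0.30], [0.45, 0.55], [0.40, 0.60]]
    print(select_pseudolabels(demo, keep_fraction=0.5))
```

In the demo data, a single global threshold of 0.9 would accept no sample from class 1, whereas the per-class cutoff still accepts its most confident prediction.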

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/055a/7738795/cbca63a0bb9b/CIN2020-8826568.001.jpg
