A Cluster-then-label Semi-supervised Learning Approach for Pathology Image Classification.

Affiliations

Medical Biophysics, University of Toronto, Toronto, Canada.

Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada.

Publication information

Sci Rep. 2018 May 8;8(1):7193. doi: 10.1038/s41598-018-24876-0.

Abstract

Completely labeled pathology datasets are often challenging and time-consuming to obtain. Semi-supervised learning (SSL) methods are able to learn from fewer labeled data points with the help of a large number of unlabeled data points. In this paper, we investigated the possibility of using clustering analysis to identify the underlying structure of the data space for SSL. A cluster-then-label method was proposed to identify high-density regions in the data space which were then used to help a supervised SVM in finding the decision boundary. We have compared our method with other supervised and semi-supervised state-of-the-art techniques using two different classification tasks applied to breast pathology datasets. We found that compared with other state-of-the-art supervised and semi-supervised methods, our SSL method is able to improve classification performance when a limited number of labeled data instances are made available. We also showed that it is important to examine the underlying distribution of the data space before applying SSL techniques to ensure semi-supervised learning assumptions are not violated by the data.
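
The cluster-then-label idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain k-means stands in for the paper's clustering analysis, a majority vote of the labeled points decides each cluster's label, and the final supervised SVM step is left out. The function names (`kmeans`, `cluster_then_label`), the toy 2-D points, and the 0/1 labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means (assumed stand-in for the paper's clustering step).

    Uses deterministic farthest-first initialization so the sketch is
    reproducible. Returns one cluster index per point.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    assign = []
    for _ in range(iters):
        # Assign every point to its nearest center.
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def cluster_then_label(points, labels, k):
    """Give each unlabeled point (label None) the majority label of its cluster.

    Clusters that contain no labeled point are left unlabeled.
    """
    assign = kmeans(points, k)
    out = list(labels)
    for c in range(k):
        votes = Counter(l for l, a in zip(labels, assign) if a == c and l is not None)
        if not votes:
            continue
        majority = votes.most_common(1)[0][0]
        for i, a in enumerate(assign):
            if a == c and out[i] is None:
                out[i] = majority
    return out


# Hypothetical toy data: two well-separated groups, one labeled point in each.
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (0.4, 0.5),
          (5.0, 5.0), (5.3, 4.8), (4.9, 5.4), (5.2, 5.1)]
labels = [0, None, None, None, 1, None, None, None]

pseudo_labels = cluster_then_label(points, labels, k=2)
print(pseudo_labels)
```

In a fuller pipeline, the points with propagated labels would then be passed to a supervised classifier such as an SVM. As the abstract cautions, this only helps when the clusters actually align with the classes, so the cluster structure should be inspected before trusting the propagated labels.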

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0499/5940864/2a1dbf84f369/41598_2018_24876_Fig1_HTML.jpg
