Division of Anatomical Pathology, School of Pathology, University of the Witwatersrand, Johannesburg, South Africa.
National Health Laboratory Service, Johannesburg, South Africa.
Am J Clin Pathol. 2022 Jan 6;157(1):5-14. doi: 10.1093/ajcp/aqab085.
OBJECTIVES: Developing accurate supervised machine learning algorithms is hampered by the lack of representative annotated datasets. Most data in anatomic pathology are unlabeled, and creating large, annotated datasets is a time-consuming and laborious process. Unsupervised learning, which does not require annotated data, has the potential to help with this challenge. This review aims to introduce the concept of unsupervised learning and illustrate how clustering, generative adversarial networks (GANs), and autoencoders have the potential to address the lack of annotated data in anatomic pathology.
METHODS: A review of unsupervised learning with examples from the literature was carried out.
RESULTS: Clustering can be used as part of semisupervised learning, where labels are propagated from a subset of annotated data points to the remaining unlabeled data points in a dataset. GANs may assist by generating large amounts of synthetic data and performing color normalization. Autoencoders allow a network to be trained on a large, unlabeled dataset and the learned representations to be transferred to a classifier trained on a smaller, labeled subset (unsupervised pretraining).
CONCLUSIONS: Unsupervised machine learning techniques such as clustering, GANs, and autoencoders, used individually or in combination, may help address the lack of annotated data in pathology and improve the process of developing supervised learning models.
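The label propagation idea described under RESULTS can be illustrated with a small sketch. It is not taken from the review: it assumes each data point has already been reduced to a numeric feature vector, uses a plain k-means with a deterministic farthest-first initialization chosen only for reproducibility, and gives every point in a cluster the majority label of that cluster's annotated members. The function names, the toy coordinates, and the labels "tumor" and "stroma" are all hypothetical.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=50):
    """Plain k-means; returns one cluster index per point.

    Assumption: centroids start from a deterministic farthest-first pick so
    that the toy example is reproducible. Real pipelines often use random
    restarts or k-means++ instead.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Add the point that is farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    assignments = None
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing, so the loop has converged
        assignments = new_assignments
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments


def propagate_labels(assignments, known_labels):
    """Spread labels from annotated points to the rest of their cluster.

    known_labels maps a point index to its label. Points in a cluster with
    no annotated member are left as None.
    """
    votes = {}
    for idx, label in known_labels.items():
        votes.setdefault(assignments[idx], Counter())[label] += 1
    cluster_label = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    return [cluster_label.get(a) for a in assignments]


# Hypothetical 2-D features for eight image patches; only two are annotated.
features = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3),
    (5.0, 5.0), (5.1, 4.9), (4.8, 5.2), (5.2, 5.1),
]
annotated = {0: "tumor", 4: "stroma"}

clusters = kmeans(features, k=2)
print(propagate_labels(clusters, annotated))
```

In this toy setting two annotations are enough to label all eight points because the clusters are cleanly separated. With real histology features, clusters may mix classes, so propagated labels would normally be checked against a held-out annotated subset before being used to train a supervised model.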