Chen Richard J, Ding Tong, Lu Ming Y, Williamson Drew F K, Jaume Guillaume, Chen Bowen, Zhang Andrew, Shao Daniel, Song Andrew H, Shaban Muhammad, Williams Mane, Vaidya Anurag, Sahai Sharifa, Oldenburg Lukas, Weishaupt Luca L, Wang Judy J, Williams Walt, Le Long Phi, Gerber Georg, Mahmood Faisal
ArXiv. 2023 Aug 29:arXiv:2308.15474v1.
Tissue phenotyping is a fundamental computational pathology (CPath) task in learning objective characterizations of histopathologic biomarkers in anatomic pathology. However, whole-slide imaging (WSI) poses a complex computer vision problem in which the large-scale image resolutions of WSIs and the enormous diversity of morphological phenotypes preclude large-scale data annotation. Current efforts have proposed using pretrained image encoders with either transfer learning from natural image datasets or self-supervised pretraining on publicly available histopathology datasets, but these encoders have not been extensively developed and evaluated across diverse tissue types at scale. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using over 100 million tissue patches from over 100,000 diagnostic haematoxylin and eosin-stained WSIs across 20 major tissue types, and evaluated on 33 representative clinical tasks in CPath of varying diagnostic difficulty. In addition to outperforming previous state-of-the-art models, we demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types in the OncoTree code classification system. UNI advances unsupervised representation learning at scale in CPath in terms of both pretraining data and downstream evaluation, enabling data-efficient AI models that can generalize and transfer to a gamut of diagnostically challenging tasks and clinical workflows in anatomic pathology.