Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
Nat Med. 2024 Mar;30(3):850-862. doi: 10.1038/s41591-024-02857-3. Epub 2024 Mar 19.
Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks, requiring the objective characterization of histopathological entities from whole-slide images (WSIs). The high resolution of WSIs and the variability of morphological features present significant challenges, complicating the large-scale annotation of data for high-performance applications. To address this challenge, current efforts have proposed the use of pretrained image encoders through transfer learning from natural image datasets or self-supervised learning on publicly available histopathology datasets, but have not been extensively developed and evaluated across diverse tissue types at scale. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using more than 100 million images from over 100,000 diagnostic H&E-stained WSIs (>77 TB of data) across 20 major tissue types. The model was evaluated on 34 representative CPath tasks of varying diagnostic difficulty. In addition to outperforming previous state-of-the-art models, we demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types in the OncoTree classification system. UNI advances unsupervised representation learning at scale in CPath in terms of both pretraining data and downstream evaluation, enabling data-efficient artificial intelligence models that can generalize and transfer to a wide range of diagnostically challenging tasks and clinical workflows in anatomic pathology.
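The abstract mentions slide classification using few-shot class prototypes. The general idea behind prototype-based few-shot classification — not the paper's exact method — is to average the encoder's embeddings for the few labeled examples of each class into a "prototype," then assign new samples to the nearest prototype by cosine similarity. A minimal sketch, with hypothetical embedding arrays standing in for UNI features:

```python
import numpy as np

def build_prototypes(embeddings: np.ndarray, labels: np.ndarray):
    """Average the embeddings of each class into one prototype per class.

    embeddings: (n_samples, dim) feature vectors from a pretrained encoder
    labels:     (n_samples,) integer class labels
    """
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    # L2-normalize so prototype matching reduces to cosine similarity
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    return classes, protos

def classify(queries: np.ndarray, classes: np.ndarray, protos: np.ndarray):
    """Assign each query embedding to the class of its nearest prototype."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    return classes[np.argmax(q @ protos.T, axis=1)]
```

Because no gradient updates are involved, this kind of classifier can be built from only a handful of labeled slides per class, which is what makes strong pretrained representations attractive for data-scarce pathology tasks.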