Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
Nat Biomed Eng. 2022 Dec;6(12):1420-1434. doi: 10.1038/s41551-022-00929-8. Epub 2022 Oct 10.
The adoption of digital pathology has enabled the curation of large repositories of gigapixel whole-slide images (WSIs). Computationally identifying WSIs with similar morphologic features within large repositories without requiring supervised training can have significant applications. However, the retrieval speeds of algorithms for searching similar WSIs often scale with the repository size, which limits their clinical and research potential. Here we show that self-supervised deep learning can be leveraged to search for and retrieve WSIs at speeds that are independent of repository size. The algorithm, which we named SISH (for self-supervised image search for histology) and provide as an open-source package, requires only slide-level annotations for training, encodes WSIs into meaningful discrete latent representations and leverages a tree data structure for fast searching followed by an uncertainty-based ranking algorithm for WSI retrieval. We evaluated SISH on multiple tasks (including retrieval tasks based on tissue-patch queries) and on datasets spanning over 22,000 patient cases and 56 disease subtypes. SISH can also be used to aid the diagnosis of rare cancer types for which the number of available WSIs is often insufficient to train supervised deep-learning models.
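The pipeline described above (discrete codes, an ordered index, then uncertainty-based ranking) can be illustrated with a toy sketch. This is not the SISH implementation: the names `encode_patch`, `SortedIndex`, `label_entropy` and `search_slide` are hypothetical, the packing of codebook indices into one integer key is an assumed scheme, and a sorted list searched with `bisect` stands in for the tree data structure. A `bisect` lookup costs O(log n), so the sketch does not reproduce the repository-size-independent speed the abstract reports. Ranking by the Shannon entropy of the retrieved labels is likewise one assumed reading of "uncertainty-based ranking".

```python
from bisect import bisect_left
from collections import Counter
from math import log2


def encode_patch(code_indices, codebook_size=8):
    """Pack discrete codebook indices into one integer key (assumed scheme)."""
    key = 0
    for idx in code_indices:
        if not 0 <= idx < codebook_size:
            raise ValueError("code index outside the codebook")
        key = key * codebook_size + idx
    return key


class SortedIndex:
    """Ordered index over (key, slide_id, label) entries.

    A sorted list is used here as a simple stand-in for a tree.
    """

    def __init__(self, entries):
        self._entries = sorted(entries)
        self._keys = [entry[0] for entry in self._entries]

    def nearest(self, key, k=3):
        """Return up to k entries whose keys are closest to `key`."""
        pos = bisect_left(self._keys, key)
        # The k nearest keys must lie within k positions of the insertion point.
        window = self._entries[max(0, pos - k):pos + k]
        window.sort(key=lambda entry: abs(entry[0] - key))
        return window[:k]


def label_entropy(hits):
    """Shannon entropy (bits) of the slide-level labels among the hits."""
    counts = Counter(label for _, _, label in hits)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def search_slide(index, query_keys, k=3):
    """Look up every query patch key, then list slides from the least
    uncertain (lowest label entropy) patch results first."""
    per_patch = [index.nearest(key, k) for key in query_keys]
    per_patch = [hits for hits in per_patch if hits]
    per_patch.sort(key=label_entropy)
    ranked, seen = [], set()
    for hits in per_patch:
        for _, slide_id, label in hits:
            if slide_id not in seen:
                seen.add(slide_id)
                ranked.append((slide_id, label))
    return ranked
```

In this sketch a query slide is a list of patch keys. Patches whose neighbours agree on a single label are trusted first, so their slides appear at the top of the returned list.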