Tavolara Thomas E, Gurcan Metin N, Niazi M Khalid Khan
Center for Biomedical Informatics, Wake Forest School of Medicine, Winston-Salem, NC 27101, USA.
Cancers (Basel). 2022 Nov 24;14(23):5778. doi: 10.3390/cancers14235778.
Recent methods in computational pathology have trended toward semi- and weakly supervised approaches that require only slide-level labels. Yet even slide-level labels may be absent or irrelevant to the application of interest, as in clinical trials. We therefore present a fully unsupervised method for learning meaningful, compact representations of whole-slide images (WSIs). The method first trains a tile-wise encoder using SimCLR. Subsets of tile-wise embeddings are then extracted and fused via an attention-based multiple-instance learning framework to yield slide-level representations. A contrastive loss attracts embeddings derived from the same slide (intra-slide) and repels embeddings derived from different slides (inter-slide), so that the slide-level representations are learned with self-supervision. We applied the method to two tasks: (1) non-small cell lung cancer (NSCLC) subtyping as a classification prototype and (2) breast cancer proliferation scoring (TUPAC16) as a regression prototype. It achieved an AUC of 0.8641 ± 0.0115 and a correlation (R) of 0.5740 ± 0.0970, respectively. Ablation experiments demonstrate that the resulting unsupervised slide-level feature space can be fine-tuned with small datasets for both tasks. Overall, our method approaches computational pathology in a novel manner: meaningful features can be learned from whole-slide images without the need for slide-level label annotations. The proposed method stands to benefit computational pathology, as it in principle enables researchers to make use of completely unlabeled whole-slide images.
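The slide-level stage described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a tanh-based attention pooling layer and an NT-Xent (SimCLR-style) contrastive loss, and it substitutes random vectors for the tile embeddings that a trained SimCLR encoder would produce. All names (`attention_pool`, `nt_xent`, `slide_view`), the dimensions, the subset size, and the temperature are hypothetical choices made for illustration.

```python
import numpy as np

def attention_pool(tiles, V, w):
    """Fuse tile embeddings (n_tiles, d) into one slide embedding (d,).

    Assumed form: weights = softmax(w^T tanh(V^T h_i)) over tiles.
    """
    scores = np.tanh(tiles @ V) @ w               # one score per tile
    scores = scores - scores.max()                # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ tiles                        # attention-weighted average

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss; z1[i] and z2[i] are two embeddings of slide i."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                # exclude self-similarity
    # Row i's positive is the other embedding of the same slide.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

# Stand-in data: random "tile embeddings" for a few slides (assumption).
rng = np.random.default_rng(0)
n_slides, n_tiles, d, hidden = 4, 32, 16, 8
V = rng.normal(size=(d, hidden))                  # attention parameters
w = rng.normal(size=hidden)
slides = [rng.normal(size=(n_tiles, d)) for _ in range(n_slides)]

def slide_view(tiles, k=12):
    """Pool a random subset of k tiles into one slide-level embedding."""
    idx = rng.choice(len(tiles), size=k, replace=False)
    return attention_pool(tiles[idx], V, w)

# Two tile subsets per slide: same-slide pairs attract, others repel.
z1 = np.stack([slide_view(s) for s in slides])
z2 = np.stack([slide_view(s) for s in slides])
loss = nt_xent(z1, z2)
```

In an actual training loop the attention parameters would be updated by gradient descent on this loss using an automatic-differentiation framework; the sketch only shows the forward computation.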