Mukashyaka Patience, Sheridan Todd B, Foroughi Pour Ali, Chuang Jeffrey H
The Jackson Laboratory for Genomic Medicine, Farmington, CT.
University of Connecticut Health Center, Department of Genetics and Genome Sciences, Farmington, CT.
bioRxiv. 2023 Aug 3:2023.08.01.551468. doi: 10.1101/2023.08.01.551468.
Deep learning has revolutionized digital pathology, allowing for automatic analysis of hematoxylin and eosin (H&E) stained whole slide images (WSIs) for diverse tasks. In such analyses, WSIs are typically broken into smaller images called tiles, and a neural network backbone encodes each tile in a feature space. Many recent works have applied attention-based deep learning models to aggregate tile-level features into a slide-level representation, which is then used for slide-level prediction tasks. However, training attention models is computationally intensive, necessitating hyperparameter optimization and specialized training procedures. Here, we propose SAMPLER, a fully statistical approach to generate efficient and informative WSI representations by encoding the empirical cumulative distribution functions (CDFs) of multiscale tile features. We demonstrate that SAMPLER-based classifiers are as accurate as or better than state-of-the-art fully deep learning attention models for classification tasks including distinction of: subtypes of breast carcinoma (BRCA: AUC=0.911±0.029); subtypes of non-small cell lung carcinoma (NSCLC: AUC=0.940±0.018); and subtypes of renal cell carcinoma (RCC: AUC=0.987±0.006). A major advantage of the SAMPLER representation is that predictive models are >100X faster than attention models. Histopathological review confirms that SAMPLER-identified high attention tiles contain tumor morphological features specific to the tumor type, while low attention tiles contain fibrous stroma, blood, or tissue folding artifacts. We further apply SAMPLER concepts to improve the design of attention-based neural networks, yielding a context-aware multi-head attention model with increased accuracy for subtype classification within BRCA and RCC (BRCA: AUC=0.921±0.027, and RCC: AUC=0.988±0.010). Finally, we provide theoretical results identifying sufficient conditions under which SAMPLER is optimal.
SAMPLER is a fast and effective approach for analyzing WSIs, with greatly improved scalability over attention methods to benefit digital pathology analysis.
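The core idea described in the abstract, summarizing a slide by the empirical CDFs of its tile features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cdf_slide_representation`, the choice to encode each CDF by evenly spaced quantiles, the default of 10 quantile levels, and the synthetic feature matrices are all assumptions made for illustration. The abstract does not specify how the CDFs are encoded or how multiscale features are combined.

```python
import numpy as np


def cdf_slide_representation(tile_features, n_quantiles=10):
    """Summarize a slide by per-feature quantiles of its tile features.

    Hypothetical helper, not taken from the paper.
    tile_features: array of shape (n_tiles, n_features), one row per tile.
    Returns a 1-D vector of length n_features * n_quantiles.
    """
    tile_features = np.asarray(tile_features, dtype=float)
    if tile_features.ndim != 2 or tile_features.shape[0] == 0:
        raise ValueError("expected a non-empty (n_tiles, n_features) array")

    # Evenly spaced probability levels strictly inside (0, 1) (an assumption).
    levels = (np.arange(n_quantiles) + 0.5) / n_quantiles

    # Quantiles are the inverse of the empirical CDF, so a fixed grid of them
    # gives a fixed-length encoding of each feature's distribution.
    quantiles = np.quantile(tile_features, levels, axis=0)  # (n_quantiles, n_features)

    # Group the quantiles of each feature together, then flatten.
    return quantiles.T.reshape(-1)


# Usage with synthetic data standing in for backbone tile embeddings.
rng = np.random.default_rng(0)
slide_a = rng.normal(size=(500, 4))   # 500 tiles, 4 features
slide_b = rng.normal(size=(120, 4))   # a slide with fewer tiles

rep_a = cdf_slide_representation(slide_a)
rep_b = cdf_slide_representation(slide_b)
print(rep_a.shape, rep_b.shape)
```

Because the representation has the same length regardless of how many tiles a slide contains, it can be passed directly to a conventional classifier such as logistic regression, with no attention model to train. That is one plausible reading of where the reported speed advantage comes from.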