Mondol Raktim Kumar, Millar Ewan K A, Graham Peter H, Browne Lois, Sowmya Arcot, Meijering Erik
School of Computer Science and Engineering, UNSW Sydney, Kensington, NSW 2052, Australia.
Department of Anatomical Pathology, NSW Health Pathology, St. George Hospital, Kogarah, NSW 2217, Australia.
Cancers (Basel). 2023 Apr 30;15(9):2569. doi: 10.3390/cancers15092569.
Gene expression can be used to subtype breast cancer, with improved prediction of recurrence risk and treatment responsiveness over that obtained using routine immunohistochemistry (IHC). However, in the clinic, molecular profiling is primarily used for ER+ breast cancer, and it is costly, tissue destructive, requires specialised platforms, and takes several weeks to return a result. Deep learning algorithms can effectively extract morphological patterns from digital histopathology images to predict molecular phenotypes quickly and cost-effectively. We propose a new, computationally efficient approach called hist2RNA, inspired by bulk RNA sequencing techniques, to predict the expression of 138 genes (incorporated from 6 commercially available molecular profiling tests), including luminal PAM50 subtype, from hematoxylin and eosin (H&E)-stained whole slide images (WSIs). During training, features extracted by a pretrained model are aggregated for each patient to predict gene expression at the patient level, using annotated H&E images from The Cancer Genome Atlas (TCGA, n = 335). We demonstrate successful gene prediction on a held-out test set (n = 160, corr = 0.82 across patients, corr = 0.29 across genes) and perform exploratory analysis on an external tissue microarray (TMA) dataset (n = 498) with known IHC and survival information. Our model is able to predict gene expression and luminal PAM50 subtype (Luminal A versus Luminal B) on the TMA dataset with prognostic significance for overall survival in univariate analysis (c-index = 0.56, hazard ratio = 2.16 (95% CI 1.12-3.06), P < 5 × 10⁻²), and independent significance in multivariate analysis incorporating standard clinicopathological variables (c-index = 0.65, hazard ratio = 1.87 (95% CI 1.30-2.68), P < 5 × 10⁻²). The proposed strategy achieves superior performance while requiring less training time, resulting in lower energy consumption and computational cost than patch-based models. Additionally, hist2RNA predicts gene expression that has the potential to determine luminal molecular subtypes, which correlate with overall survival, without the need for expensive molecular testing.
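The two test-set correlations quoted above (across patients and across genes) summarise the same prediction matrix along different axes, which may explain why they differ so much. The sketch below is illustrative only and is not the authors' code. It assumes mean pooling as the per-patient aggregation, Pearson correlation, and a median summary, none of which the abstract specifies. The function names and toy numbers are hypothetical.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def aggregate_patient_features(patch_features):
    """Mean-pool patch-level feature vectors into one patient-level vector.

    Assumption: the abstract only says features are "aggregated";
    mean pooling is used here purely for illustration.
    """
    n = len(patch_features)
    return [sum(column) / n for column in zip(*patch_features)]


def corr_across_patients(true, pred):
    """One correlation per patient (row), taken over that patient's genes."""
    return median(pearson(t, p) for t, p in zip(true, pred))


def corr_across_genes(true, pred):
    """One correlation per gene (column), taken over all patients."""
    return median(pearson(t, p) for t, p in zip(zip(*true), zip(*pred)))


# Toy data (hypothetical): rows are patients, columns are genes.
true_expr = [[1.0, 5.0, 9.0], [2.0, 4.0, 10.0], [3.0, 6.0, 8.0]]
pred_expr = [[1.5, 5.5, 8.5], [1.0, 5.0, 9.5], [2.0, 5.2, 9.0]]

per_patient = corr_across_patients(true_expr, pred_expr)
per_gene = corr_across_genes(true_expr, pred_expr)
print(f"across patients: {per_patient:.2f}, across genes: {per_gene:.2f}")
```

In this toy example the per-patient value is high because genes differ strongly in their typical expression level, which a model can capture easily. The per-gene value is lower because it depends on ranking patients correctly within each gene. Under these assumptions, a similar effect could contribute to the gap between 0.82 and 0.29.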