Winkelmaier Garrett, Koch Brandon, Bogardus Skylar, Borowsky Alexander D, Parvin Bahram
Department of Electrical and Biomedical Engineering, College of Engineering, University of Nevada Reno, 1664 N. Virginia St., Reno, NV 89509, USA.
Department of Biostatistics, College of Public Health, Ohio State University, 281 W. Lane Ave., Columbus, OH 43210, USA.
Cancers (Basel). 2023 Apr 20;15(8):2387. doi: 10.3390/cancers15082387.
Tumor whole slide images (WSIs) are often heterogeneous, which hinders the discovery of biomarkers in the presence of confounding clinical factors. In this study, we present a pipeline for identifying biomarkers from the Glioblastoma Multiforme (GBM) cohort of WSIs in the TCGA archive. The GBM cohort contains many technical artifacts, and the discovery of GBM biomarkers is further challenged because age is the single most important confounding factor for predicting outcomes. The proposed approach relies on interpretable features (e.g., nuclear morphometric indices), effective similarity metrics for heterogeneity analysis, and robust statistics for identifying biomarkers. The pipeline first removes artifacts (e.g., pen marks) and partitions each WSI into patches, in which nuclei are segmented with an extended U-Net for subsequent quantitative representation. Because variations in fixation and staining can artificially modulate hematoxylin optical density (HOD), we extended Navab's Lab method to normalize images and reduce the impact of batch effects. The heterogeneity of each WSI is then represented either as probability density functions (PDFs) per patient or as the composition of a dictionary predicted from the entire cohort of WSIs. For the PDF-based method, morphometric subtypes are constructed from distances computed by optimal transport, followed by linkage analysis; for the dictionary-based method, they are constructed by consensus clustering with Euclidean distances. For each inferred subtype, Kaplan-Meier analysis and/or the Cox regression model are used to model survival time. Since age is the single most important confounder for predicting survival in GBM and the proportionality assumption of the Cox model is observed to be violated, we use both age and age-squared, coupled with the likelihood ratio test and forest plots, to evaluate competing statistics. Next, the PDF- and dictionary-based methods are combined to identify biomarkers that are predictive of survival. The combined model has the advantage of integrating global (e.g., cohort-scale) and local (e.g., patient-scale) attributes of morphometric heterogeneity, coupled with robust statistics, to reveal stable biomarkers. The results indicate that, after normalization of the GBM cohort, mean HOD, eccentricity, and cellularity are predictive of survival. Finally, we also stratified the GBM cohort as a function of EGFR expression and published genomic subtypes to reveal genomic-dependent morphometric biomarkers.
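The PDF-based subtyping step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single one-dimensional morphometric feature per nucleus (for example, HOD), uses SciPy's 1-D Wasserstein distance as the optimal transport distance, and uses average linkage, which the abstract does not specify. The function name `morphometric_subtypes` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import wasserstein_distance


def morphometric_subtypes(samples_per_patient, n_subtypes=2):
    """Cluster patients by the optimal transport distance between their
    empirical feature distributions (hypothetical helper).

    samples_per_patient: list of 1-D arrays, one array of per-nucleus
    feature values per patient.
    """
    n = len(samples_per_patient)
    dist = np.zeros((n, n))
    # Pairwise 1-D Wasserstein (earth mover's) distance between patients.
    for i in range(n):
        for j in range(i + 1, n):
            d = wasserstein_distance(samples_per_patient[i],
                                     samples_per_patient[j])
            dist[i, j] = dist[j, i] = d
    # Hierarchical linkage on the condensed distance matrix.
    # Average linkage is an assumption made for this sketch.
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_subtypes, criterion="maxclust")
    return dist, labels


# Synthetic example: two groups of patients with different mean feature values.
rng = np.random.default_rng(0)
patients = ([rng.normal(0.3, 0.05, 500) for _ in range(5)]
            + [rng.normal(0.6, 0.05, 500) for _ in range(5)])
dist, labels = morphometric_subtypes(patients, n_subtypes=2)
```

In this sketch each patient is summarized by the raw sample of per-nucleus values rather than a fitted density. The resulting subtype labels could then be passed to a survival model, with age and age-squared as covariates as described in the abstract.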