Oner Mustafa Umit, Chen Jianbin, Revkov Egor, James Anne, Heng Seow Ye, Kaya Arife Neslihan, Alvarez Jacob Josiah Santiago, Takano Angela, Cheng Xin Min, Lim Tony Kiat Hon, Tan Daniel Shao Weng, Zhai Weiwei, Skanderup Anders Jacobsen, Sung Wing-Kin, Lee Hwee Kuan
Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore 138671, Singapore.
School of Computing, National University of Singapore, Singapore 117417, Singapore.
Patterns (N Y). 2021 Dec 9;3(2):100399. doi: 10.1016/j.patter.2021.100399. eCollection 2022 Feb 11.
Tumor purity is the percentage of cancer cells within a tissue section. Pathologists estimate tumor purity by manually reading hematoxylin-eosin (H&E)-stained slides to select samples for genomic analysis, a process that is tedious, time-consuming, and prone to inter-observer variability. Moreover, pathologists' estimates correlate poorly with genomic tumor purity values, which are inferred from genomic data and accepted as accurate for downstream analysis. We developed a deep multiple instance learning model that predicts tumor purity from H&E-stained digital histopathology slides. The model successfully predicted tumor purity in eight cohorts from The Cancer Genome Atlas (TCGA) and in a local Singapore cohort, and its predictions were highly consistent with genomic tumor purity values. The model can therefore be used to select samples for genomic analysis, which would reduce pathologists' workload and decrease inter-observer variability. The model also produced tumor purity maps showing the spatial variation within sections, which can help to better understand the tumor microenvironment.
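As a rough illustration of the multiple instance learning setup named in the abstract, a slide can be treated as a bag of patches that carries a single bag-level label, with patch-level outputs pooled into one estimate. The sketch below is not the authors' model: the linear patch scorer, the mean pooling, the function name `predict_bag_purity`, and the synthetic features are all assumptions chosen only to keep the example small.

```python
import numpy as np


def predict_bag_purity(patch_features, weights, bias):
    """Mean-pooling MIL regression: one purity estimate per bag of patches.

    Hypothetical stand-in for a deep model; not the published architecture.
    """
    # Per-patch logits from a linear scorer (a deep network would go here).
    logits = patch_features @ weights + bias
    # The sigmoid maps each patch score into the interval [0, 1].
    patch_scores = 1.0 / (1.0 + np.exp(-logits))
    # Permutation-invariant pooling: the bag estimate is the mean patch score.
    return float(patch_scores.mean()), patch_scores


# Synthetic bag: 200 patches with 16 features each (made-up data).
rng = np.random.default_rng(0)
bag = rng.normal(size=(200, 16))
w = rng.normal(size=16)

purity, patch_scores = predict_bag_purity(bag, w, 0.0)

# Shuffling the patches leaves the bag-level estimate unchanged.
shuffled_purity, _ = predict_bag_purity(rng.permutation(bag), w, 0.0)
```

In a sketch like this, the per-patch scores could be placed back at their patch coordinates to give a coarse spatial map, which is one simple way to think about the purity maps mentioned above.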