Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177 Stockholm, Sweden.
Faculty of Medicine and Health Technology, Tampere University, 33100 Tampere, Finland.
Bioinformatics. 2022 Jun 27;38(13):3462-3469. doi: 10.1093/bioinformatics/btac343.
Molecular phenotyping by gene expression profiling is central to contemporary cancer research and molecular diagnostics but remains resource-intensive to implement. Changes in gene expression occurring in tumours cause morphological changes in tissue, which can be observed at the microscopic level. The relationship between morphological patterns and some molecular phenotypes can be exploited to predict those phenotypes from routine haematoxylin and eosin-stained whole slide images (WSIs) using convolutional neural networks (CNNs). In this study, we propose a new, computationally efficient approach to model relationships between morphology and gene expression.
We conducted the first transcriptome-wide analysis in prostate cancer, using CNNs to predict bulk RNA-sequencing estimates from WSIs for 370 patients from the TCGA PRAD study. Out of 15 586 protein-coding transcripts, 6618 had predicted expression significantly associated with RNA-seq estimates (FDR-adjusted P-value < 1 × 10⁻⁴) in cross-validation, and 5419 (81.9%) of these associations were subsequently validated in a held-out test set. We furthermore predicted the prognostic cell-cycle progression score directly from WSIs. These findings suggest that contemporary computer vision models offer an inexpensive and scalable solution for prediction of gene expression phenotypes directly from WSIs, providing opportunities for cost-effective large-scale research studies and molecular diagnostics.
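The two-stage screen described above (FDR-adjusted discovery in cross-validation, then validation in a held-out test set) can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: that the FDR adjustment is the Benjamini-Hochberg procedure, that the test-set adjustment is applied only to the transcripts discovered in cross-validation, and that the same threshold is used in both stages. The function names `benjamini_hochberg` and `validated_fraction` and the toy P-values are hypothetical, not the authors' code or data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    Assumption: the abstract says only "FDR-adjusted"; BH is one common choice.
    """
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def validated_fraction(cv_p, test_p, alpha=1e-4):
    """Screen transcripts in cross-validation, then re-check them in the test set.

    cv_p and test_p are per-transcript P-values in the same order.
    Assumption: the test-set adjustment covers only the discovered transcripts
    and reuses the same threshold alpha.
    """
    cv_adj = benjamini_hochberg(cv_p)
    discovered = [i for i, p in enumerate(cv_adj) if p < alpha]
    if not discovered:
        return discovered, [], 0.0
    test_adj = benjamini_hochberg([test_p[i] for i in discovered])
    validated = [i for i, p in zip(discovered, test_adj) if p < alpha]
    return discovered, validated, len(validated) / len(discovered)


# Toy P-values for four hypothetical transcripts (illustrative only).
cv_p = [1e-9, 1e-7, 0.2, 1e-6]
test_p = [1e-8, 0.5, 1e-9, 1e-6]
discovered, validated, fraction = validated_fraction(cv_p, test_p)

# Arithmetic check of the percentage reported in the abstract.
reported_pct = round(100 * 5419 / 6618, 1)
```

With the toy inputs, three of the four transcripts pass the cross-validation screen and two of those three pass again in the test set. The last line recomputes the reported 81.9% from the counts 5419 and 6618.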
A self-contained example is available from http://github.com/phiwei/prostate_coexpression. Model predictions and metrics are available from doi.org/10.5281/zenodo.4739097.
Supplementary data are available at Bioinformatics online.