Institute of Biomedical Engineering (IBME), Department of Engineering Science, University of Oxford, Oxford, UK.
Big Data Institute, University of Oxford, Li Ka Shing Centre for Health Information and Discovery, Oxford, UK.
Gut. 2021 Mar;70(3):544-554. doi: 10.1136/gutjnl-2019-319866. Epub 2020 Jul 20.
Complex phenotypes captured on histological slides represent the biological processes at play in individual cancers, but the link to underlying molecular classification has not been clarified or systematised. In colorectal cancer (CRC), histological grading is a poor predictor of disease progression, and consensus molecular subtypes (CMSs) cannot be distinguished without gene expression profiling. We hypothesise that image analysis is a cost-effective tool to associate complex features of tissue organisation with molecular and outcome data and to resolve unclassifiable or heterogeneous cases. In this study, we present an image-based approach to predict CRC CMS from standard H&E sections using deep learning.
Training and evaluation of a neural network were performed using a total of n=1206 tissue sections with comprehensive multi-omic data from three independent datasets (training on the FOCUS trial, n=278 patients; testing on rectal cancer biopsies from the GRAMPIAN cohort, n=144 patients, and on The Cancer Genome Atlas (TCGA), n=430 patients). Ground truth CMS calls were ascertained by matching random forest and single sample predictions from the CMS classifier.
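The matching step for ground truth labels can be illustrated with a minimal sketch. It assumes that "matching" means a sample keeps a label only when the random forest and single sample predictions agree; the function name `consensus_cms`, the sample IDs and the calls below are hypothetical and are not taken from the study.

```python
from typing import Dict, Optional, Tuple


def consensus_cms(rf_call: Optional[str], ssp_call: Optional[str]) -> Optional[str]:
    """Return a label only when both calls exist and agree (assumed rule)."""
    if rf_call is None or ssp_call is None:
        return None  # one classifier made no call
    return rf_call if rf_call == ssp_call else None  # discordant calls are dropped


# Hypothetical (random forest call, single sample predictor call) per sample.
samples: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "S1": ("CMS2", "CMS2"),  # concordant: kept as ground truth
    "S2": ("CMS4", "CMS1"),  # discordant: left unlabelled
    "S3": (None, "CMS3"),    # missing call: left unlabelled
}

ground_truth = {sid: consensus_cms(rf, ssp) for sid, (rf, ssp) in samples.items()}
```

Under this assumed rule, discordant or missing calls yield no ground truth label, which would be one source of samples that are unclassifiable by expression profiling.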
Image-based CMS (imCMS) accurately classified slides in unseen datasets from TCGA (n=431 slides, AUC=0.84) and rectal cancer biopsies (n=265 slides, AUC=0.85). imCMS spatially resolved intratumoural heterogeneity and provided secondary calls correlating with bioinformatic prediction from molecular data. imCMS classified samples previously unclassifiable by RNA expression profiling, reproduced the expected correlations with genomic and epigenetic alterations and showed prognostic associations similar to those of transcriptomic CMS.
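For readers less familiar with how a single AUC can be reported for a four-class problem, the following is a minimal sketch of a macro-averaged one-vs-rest AUC. The abstract does not state which averaging scheme was used, so this scheme is an assumption; the function names and the toy scores are hypothetical.

```python
from typing import Dict, List, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(probs: List[Dict[str, float]], truth: List[str],
                  classes: Sequence[str]) -> float:
    """Average the one-vs-rest AUC over all classes (assumed averaging scheme)."""
    aucs = []
    for c in classes:
        scores = [p[c] for p in probs]    # predicted score for class c
        labels = [t == c for t in truth]  # class c versus all other classes
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / len(aucs)


# Hypothetical slide-level scores, reduced to two classes for brevity.
toy_probs = [
    {"CMS1": 0.9, "CMS2": 0.1},
    {"CMS1": 0.6, "CMS2": 0.4},
    {"CMS1": 0.7, "CMS2": 0.3},
    {"CMS1": 0.2, "CMS2": 0.8},
]
toy_truth = ["CMS1", "CMS1", "CMS2", "CMS2"]
toy_auc = macro_ovr_auc(toy_probs, toy_truth, ["CMS1", "CMS2"])
```

The pairwise formulation is quadratic in the number of slides, which is acceptable at the scale of a few hundred slides; a rank-based implementation would be preferable for larger cohorts.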
This study shows that RNA expression-based classification can be predicted from H&E images, opening the door to simple, cheap and reliable biological stratification within routine workflows.