Development and validation of a weakly supervised deep learning framework to predict the status of molecular pathways and key mutations in colorectal cancer from routine histology images: a retrospective study.
Affiliations
Tissue Image Analytics Centre, Department of Computer Science, University of Warwick, Coventry, UK.
Tissue Image Analytics Centre, Department of Computer Science, University of Warwick, Coventry, UK; Department of Pathology, University Hospitals Coventry and Warwickshire NHS Trust, Coventry, UK.
Publication information
Lancet Digit Health. 2021 Dec;3(12):e763-e772. doi: 10.1016/S2589-7500(21)00180-1. Epub 2021 Oct 19.
BACKGROUND
Determining the status of molecular pathways and key mutations in colorectal cancer is crucial for optimal therapeutic decision making. We therefore aimed to develop a novel deep learning pipeline to predict the status of key molecular pathways and mutations from whole-slide images of haematoxylin and eosin-stained colorectal cancer slides as an alternative to current tests.
METHODS
In this retrospective study, we used 502 diagnostic slides of primary colorectal tumours from 499 patients in The Cancer Genome Atlas colon and rectal cancer (TCGA-CRC-DX) cohort and developed a weakly supervised deep learning framework involving three separate convolutional neural network models. Whole-slide images were divided into equally sized tiles, and model 1 (ResNet18) separated tumour tiles from non-tumour tiles. The tumour tiles were passed to model 2 (adapted ResNet34), trained by iterative draw and rank sampling, which calculated a prediction score for each tile representing the likelihood that the tile belonged to one of the following molecular labels: high mutation density (vs low mutation density), microsatellite instability (vs microsatellite stability), chromosomal instability (vs genomic stability), CpG island methylator phenotype (CIMP)-high (vs CIMP-low), and, for each of BRAF, TP53, and KRAS, mutant versus wild-type status. These scores were used to identify the top-ranked tiles from each slide, and model 3 (HoVer-Net) segmented and classified the different types of cell nuclei in these tiles. We calculated the area under the convex hull of the receiver operating characteristic curve (AUROC) as a model performance measure and compared our results with those of previously published methods.
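The two generic steps named above, ranking tiles by prediction score and measuring the area under the convex hull of the ROC curve, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, and averaging the top-k tile scores into a slide-level score is an assumption, since the abstract does not state how tile scores are aggregated.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def top_ranked_tiles(tile_scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring tiles (hypothetical helper)."""
    order = sorted(range(len(tile_scores)), key=lambda i: tile_scores[i], reverse=True)
    return order[:k]


def slide_score(tile_scores: Sequence[float], k: int) -> float:
    """Mean of the top-k tile scores. ASSUMPTION: the abstract does not specify the aggregation."""
    top = top_ranked_tiles(tile_scores, k)
    return sum(tile_scores[i] for i in top) / len(top)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """(FPR, TPR) points obtained by sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def upper_hull(points: Sequence[Point]) -> List[Point]:
    """Upper convex hull of points already sorted by increasing x (monotone chain)."""
    hull: List[Point] = []
    for p in points:
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross >= 0:  # last point lies on or below the chord: drop it
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def trapezoid_area(points: Sequence[Point]) -> float:
    """Area under a piecewise-linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auroc_convex_hull(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the convex hull of the ROC curve."""
    return trapezoid_area(upper_hull(roc_points(labels, scores)))
```

Because the hull lies on or above the empirical ROC curve, this measure is never smaller than the ordinary trapezoidal AUROC computed from the same scores.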
FINDINGS
Our iterative draw and rank sampling method yielded mean AUROCs for the prediction of hypermutation (0·81 [SD 0·03] vs 0·71), microsatellite instability (0·86 [0·04] vs 0·74), chromosomal instability (0·83 [0·02] vs 0·73), BRAF (0·79 [0·01] vs 0·66), and TP53 (0·73 [0·02] vs 0·64) in the TCGA-CRC-DX cohort that were higher than those from previously published methods, and an AUROC for KRAS that was similar to previously reported methods (0·60 [SD 0·04] vs 0·60). Mean AUROC for predicting CIMP-high status was 0·79 (SD 0·05). We found high proportions of tumour-infiltrating lymphocytes and necrotic tumour cells to be associated with microsatellite instability, and high proportions of tumour-infiltrating lymphocytes and a low proportion of necrotic tumour cells to be associated with hypermutation.
INTERPRETATION
After large-scale validation, our proposed algorithm for predicting clinically important mutations and molecular pathways, such as microsatellite instability, in colorectal cancer could be used to stratify patients for targeted therapies with potentially lower costs and quicker turnaround times than sequencing-based or immunohistochemistry-based approaches.
FUNDING
The UK Medical Research Council.