The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA.
The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA; Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA.
EBioMedicine. 2020 Nov;61:103030. doi: 10.1016/j.ebiom.2020.103030. Epub 2020 Oct 9.
Cancer of unknown primary (CUP), representing approximately 3-5% of all malignancies, is defined as metastatic cancer where a primary site of origin cannot be found despite a standard diagnostic workup. Because knowledge of a patient's primary cancer remains fundamental to their treatment, CUP patients are significantly disadvantaged and most have a poor survival outcome. Developing robust and accessible diagnostic methods for resolving cancer tissue of origin, therefore, has significant value for CUP patients.
We developed an RNA-based classifier called CUP-AI-Dx that uses a 1D Inception convolutional neural network (1D-Inception) model to infer a tumour's primary tissue of origin. CUP-AI-Dx was trained on the transcriptional profiles of 18,217 primary tumours representing 32 cancer types from The Cancer Genome Atlas (TCGA) project and the International Cancer Genome Consortium (ICGC). Gene expression data were ordered by the genes' chromosomal coordinates to form the input to the 1D-CNN model, which applies multiple convolutional kernels with different configurations in parallel to improve generality. The model was optimised through extensive hyperparameter tuning, including different max-pooling layers and dropout settings. For 11 tumour types, we also developed a random forest model that classifies a tumour's molecular subtype according to prior TCGA studies. The optimised CUP-AI-Dx tissue-of-origin classifier was tested on 394 metastatic samples from 11 tumour types from TCGA and on 92 formalin-fixed paraffin-embedded (FFPE) samples representing 18 cancer types from two clinical laboratories. The CUP-AI-Dx molecular subtype classifier was also tested on independent ovarian and breast cancer microarray datasets.
CUP-AI-Dx identifies the primary site with an overall top-1 accuracy of 98.54% in cross-validation and 96.70% on a test dataset. When applied to two independent clinical-grade RNA-seq datasets generated at two different institutions in the US and Australia, the model predicted the primary site with top-1 accuracies of 86.96% and 72.46%, respectively.
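The input construction and the multi-kernel idea described above can be sketched in plain Python. This is a toy sketch under stated assumptions, not the CUP-AI-Dx implementation: the gene-annotation format, the helper names (`order_by_chromosome`, `inception_block`), the kernel widths (1, 3, 5) with fixed weights, and the expression values are all hypothetical. A real model would learn many filters per branch in a deep-learning framework; this only shows how chromosome-ordered expression becomes a 1D signal that several kernel widths scan in parallel before ReLU and max-pooling.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation format: gene symbol -> (chromosome, start coordinate).
Annotation = Dict[str, Tuple[str, int]]

# Autosomes 1-22, then X and Y; genes on any other contig are dropped in this sketch.
CHROM_RANK = {c: i for i, c in enumerate([str(n) for n in range(1, 23)] + ["X", "Y"])}


def order_by_chromosome(expr: Dict[str, float], annot: Annotation) -> List[float]:
    """Flatten a gene -> expression map into a 1D vector sorted by genomic position."""
    genes = [g for g in expr if g in annot and annot[g][0] in CHROM_RANK]
    genes.sort(key=lambda g: (CHROM_RANK[annot[g][0]], annot[g][1]))
    return [expr[g] for g in genes]


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """1D cross-correlation with zero padding so the output length equals len(x).

    Assumes an odd kernel length, as in the toy branches below.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + x + [0.0] * pad
    return [
        sum(padded[i + j] * w for j, w in enumerate(kernel))
        for i in range(len(x))
    ]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def max_pool1d(v: List[float], size: int) -> List[float]:
    """Non-overlapping max-pooling; a trailing partial window is discarded."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]


def inception_block(x: List[float], branches: List[List[float]]) -> List[List[float]]:
    """Run every kernel over the same input in parallel and stack the results as channels."""
    return [relu(conv1d_same(x, k)) for k in branches]


if __name__ == "__main__":
    # Toy expression values; coordinates are illustrative GRCh38-style start positions.
    expr = {"TP53": 5.0, "BRCA1": 2.0, "GAPDH": 9.0, "MYC": 7.0}
    annot = {
        "TP53": ("17", 7_661_779),
        "BRCA1": ("17", 43_044_295),
        "GAPDH": ("12", 6_534_512),
        "MYC": ("8", 127_735_434),
    }
    x = order_by_chromosome(expr, annot)  # order: MYC, GAPDH, TP53, BRCA1
    # Fixed toy kernels of width 1, 3 and 5 stand in for learned filters.
    branches = [[1.0], [0.25, 0.5, 0.25], [0.1, 0.2, 0.4, 0.2, 0.1]]
    for channel in inception_block(x, branches):
        print(max_pool1d(channel, 2))
```

Ordering by genomic position means neighbouring positions in the input vector are neighbouring genes, so a convolution window of width k summarises k adjacent genes; running several widths side by side is what lets one layer capture both very local and broader regional patterns.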
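Top-1 accuracy, the metric reported above, is the fraction of samples whose highest-scoring predicted tissue matches the true tissue; top-k relaxes this to "the true tissue is among the k highest scores". A minimal sketch, assuming per-sample class scores are available as a dictionary; the function name, tissue labels, and scores are hypothetical and are not data from the study.

```python
from typing import Dict, List


def top_k_accuracy(scores: List[Dict[str, float]], truth: List[str], k: int = 1) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(truth) or not truth:
        raise ValueError("scores and truth must be non-empty and the same length")
    hits = 0
    for sample_scores, label in zip(scores, truth):
        # Rank class names by score, highest first.
        ranked = sorted(sample_scores, key=sample_scores.get, reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(truth)


if __name__ == "__main__":
    # Hypothetical per-sample class probabilities for three samples.
    scores = [
        {"BRCA": 0.80, "OV": 0.15, "LUAD": 0.05},
        {"BRCA": 0.10, "OV": 0.30, "LUAD": 0.60},
        {"BRCA": 0.20, "OV": 0.70, "LUAD": 0.10},
    ]
    truth = ["BRCA", "OV", "OV"]
    print(top_k_accuracy(scores, truth, k=1))  # 2 of 3 samples correct at top-1
    print(top_k_accuracy(scores, truth, k=2))  # all 3 within the top 2
```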
CUP-AI-Dx predicts tumour primary site and molecular subtype with high accuracy and can therefore be used to assist the diagnostic work-up of cancers of unknown primary or uncertain origin using a common and accessible genomics platform.
NIH R35 GM133562, NCI P30 CA034196, Victorian Cancer Agency Australia.