Computational Biology Program, Ontario Institute for Cancer Research, Toronto, ON, Canada.
Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
Nat Commun. 2020 Feb 5;11(1):728. doi: 10.1038/s41467-019-13825-8.
In cancer, the primary tumour's organ of origin and histopathology are the strongest determinants of its clinical behaviour, but in 3% of cases a patient presents with a metastatic tumour and no obvious primary. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we train a deep learning classifier to predict cancer type based on patterns of somatic passenger mutations detected in whole genome sequencing (WGS) of 2606 tumours representing 24 common cancer types produced by the PCAWG Consortium. Our classifier achieves an accuracy of 91% on held-out tumour samples, and 88% and 83%, respectively, on independent primary and metastatic samples, roughly double the accuracy of trained pathologists when presented with a metastatic tumour without knowledge of the primary. Surprisingly, adding information on driver mutations reduced accuracy. Our results have clinical applicability, underscore how patterns of somatic passenger mutations encode the state of the cell of origin, and can inform future strategies to detect the source of circulating tumour DNA.