School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China.
Biochim Biophys Acta Mol Basis Dis. 2018 Jun;1864(6 Pt B):2218-2227. doi: 10.1016/j.bbadis.2017.12.026. Epub 2017 Dec 19.
Cancers are malignant proliferations of tumor cells that can arise in many tissues and organs and can severely curtail the quality of human life. The potential of plasma DNA for cancer detection has been widely recognized, which creates a need to map the tissue of origin through the identification of somatic mutations. With cutting-edge technologies such as next-generation sequencing, numerous somatic mutations have been identified, and mutation signatures have been uncovered across different cancer types. However, somatic mutations are not independent events in carcinogenesis; they exert functional effects. In this study, we applied a pan-cancer analysis to five types of cancer: (I) breast cancer (BRCA), (II) colorectal adenocarcinoma (COADREAD), (III) head and neck squamous cell carcinoma (HNSC), (IV) kidney renal clear cell carcinoma (KIRC), and (V) ovarian cancer (OV). Based on their mutated genes, patients with one of these cancer types were encoded as a large number of numerical features using the enrichment theory of gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. We analyzed these features with the Monte-Carlo Feature Selection (MCFS) method, followed by the incremental feature selection (IFS) method, to identify functional alteration features that could be used to build a support vector machine (SVM)-based classifier for distinguishing the five cancer types. The optimal classifier, built on the 344 selected features, achieved the highest Matthews correlation coefficient, 0.523. Sixteen decision rules produced by the MCFS method yielded an overall accuracy of 0.498 for the classification of the five cancer types. Further analysis indicated that some of these features and rules were supported by previous experiments.
This study not only presents a new approach to mapping the tissue-of-origin for cancer detection but also unveils the specific functional alterations of each cancer type, providing insight into cancer-specific functional aberrations as potential therapeutic targets. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.
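The incremental feature selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature ranking is assumed to be given (in the study it comes from MCFS), a nearest-centroid classifier stands in for the SVM so that the example needs only the standard library, leave-one-out evaluation is an assumption, and the toy data and all function names are hypothetical. The multiclass Matthews correlation coefficient is computed with Gorodkin's generalization, which is assumed here to match the measure reported in the abstract.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (Gorodkin's generalization)."""
    s = len(y_true)                                    # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))    # correctly predicted
    t_counts = Counter(y_true)                         # true occurrences per class
    p_counts = Counter(y_pred)                         # predictions per class
    labels = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_tt = s * s - sum(v * v for v in t_counts.values())
    cov_pp = s * s - sum(v * v for v in p_counts.values())
    denom = math.sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


def nearest_centroid_predict(train_X, train_y, x, feats):
    """Stand-in classifier (NOT an SVM): assign x to the closest class centroid."""
    sums, counts = {}, Counter(train_y)
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
    best_label, best_dist = None, float("inf")
    for label in sorted(sums):
        dist = sum((x[f] - sums[label][j] / counts[label]) ** 2
                   for j, f in enumerate(feats))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def incremental_feature_selection(X, y, ranked_features):
    """Grow the feature set one ranked feature at a time; keep the best-MCC size."""
    best_k, best_mcc = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        feats = ranked_features[:k]
        preds = []
        for i in range(len(X)):                        # leave-one-out evaluation
            train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
            preds.append(nearest_centroid_predict(train_X, train_y, X[i], feats))
        mcc = multiclass_mcc(y, preds)
        if mcc > best_mcc:                             # strict: prefer fewer features
            best_k, best_mcc = k, mcc
    return best_k, best_mcc


# Hypothetical toy data: three cancer labels, three enrichment-style features.
# Feature 0 separates BRCA, feature 1 separates KIRC from OV, feature 2 is noise.
X = [
    [5.0, 0.0, 1.3], [5.2, 0.1, 0.2], [4.8, 0.2, 2.1],   # BRCA
    [0.0, 5.0, 0.7], [0.2, 5.1, 1.9], [0.1, 4.9, 0.4],   # KIRC
    [0.1, 0.0, 1.1], [0.0, 0.2, 0.3], [0.2, 0.1, 1.8],   # OV
]
y = ["BRCA"] * 3 + ["KIRC"] * 3 + ["OV"] * 3
ranked = [0, 1, 2]   # assumed ranking, most informative feature first

best_k, best_mcc = incremental_feature_selection(X, y, ranked)
print(best_k, best_mcc)
```

In the study itself the same loop would run over the MCFS-ranked GO and KEGG enrichment features with an SVM as the classifier; the sketch only shows how the feature count giving the highest Matthews correlation coefficient is chosen.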