Mercatelli Daniele, Ray Forest, Giorgi Federico M
Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy.
Department of Systems Biology, Columbia University Medical Center, New York, NY, United States.
Front Genet. 2019 Jul 18;10:671. doi: 10.3389/fgene.2019.00671. eCollection 2019.
Cancer is often characterized by multiple genomic alterations that trigger altered transcriptional patterns and gene expression, which in turn sustain tumorigenesis, tumor progression, and tumor maintenance. The links between genomic alterations and gene expression profiles can serve as the basis for building specific molecular tumorigenic relationships. In this study, we perform pan-cancer predictions of the presence of single somatic mutations and copy number variations by applying machine learning approaches to gene expression profiles. We show that gene expression can be used to predict genomic alterations in every tumor type, although some alterations are more predictable than others. We propose gene aggregation as a tool to improve the accuracy of models that predict alterations from gene expression profiles. Finally, we show how this principle can be beneficial in intrinsically noisy datasets, such as those based on single-cell sequencing.
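As a rough illustration of why gene aggregation might help in noisy data, the following sketch compares how well a single gene and an aggregate of several genes separate altered from unaltered samples. It is not the authors' pipeline: the simulated data, the effect size of 0.5, the use of a plain mean as the aggregation step, and the `auc` helper are all assumptions made for this example.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    Assumes continuous scores with no ties and binary labels (0/1).
    """
    ranks = np.argsort(np.argsort(scores)) + 1  # 1-based ranks
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    rank_sum_pos = ranks[labels == 1].sum()
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


rng = np.random.default_rng(0)
n_samples, n_genes = 200, 20

# Hypothetical alteration status: first half unaltered (0), second half altered (1).
labels = np.repeat([0, 1], n_samples // 2)

# Simulated expression (assumption): every gene shifts by 0.5 in altered
# samples and carries independent unit-variance noise.
expr = 0.5 * labels[:, None] + rng.normal(size=(n_samples, n_genes))

# Average discriminative power of individual genes.
single_gene_auc = np.mean([auc(expr[:, j], labels) for j in range(n_genes)])

# Aggregation step (assumption): average the genes into one score per sample,
# which reduces independent noise while keeping the shared signal.
aggregate_auc = auc(expr.mean(axis=1), labels)

print(f"mean single-gene AUC: {single_gene_auc:.2f}")
print(f"aggregated-score AUC: {aggregate_auc:.2f}")
```

Under these assumptions the aggregated score separates the two groups better than a typical single gene, because averaging cancels independent noise. Real expression data have correlated noise and heterogeneous effects, so the gain in practice would depend on how the genes are grouped.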