Fan Shicai, Tang Jianxiong, Li Nan, Zhao Ying, Ai Rizi, Zhang Kai, Wang Mengchi, Du Wei, Wang Wei
1 School of Automation Engineering, University of Electronic Science and Technology of China, 611731 Chengdu, Sichuan, China.
2 Center for Informational Biology, University of Electronic Science and Technology of China, 611731 Chengdu, Sichuan, China.
NPJ Genom Med. 2019 Feb 1;4:2. doi: 10.1038/s41525-019-0077-8. eCollection 2019.
Integrating genomic and DNA methylation data has proven to be a powerful strategy for understanding cancer mechanisms and identifying therapeutic targets. The TCGA consortium has mapped DNA methylation in thousands of cancer samples using the Illumina Infinium Human Methylation 450K BeadChip (Illumina 450K array), which covers only about 1.5% of the CpGs in the human genome. Increasing the coverage of the DNA methylome would therefore make the TCGA data substantially more useful. Here, we present a new model called EAGLING that can expand the Illumina 450K array data 18-fold to cover about 30% of the CpGs in the human genome. We applied it to analyze 13 cancers in TCGA. By integrating the expanded methylation, gene expression, and somatic mutation data, we identified the genes showing differential patterns in each of the 13 cancers. Many of the triple-evidenced genes identified in the majority of the cancers are biomarkers or potential biomarkers. Pan-cancer analysis also revealed the pathways in which the triple-evidenced genes are enriched. These include well-known pathways as well as new ones, such as the axonal guidance signaling pathway and pathways related to inflammatory processing or inflammation response. Triple-evidenced genes, particularly TNXB, RRM2, CELSR3, SLC16A3, FANCI, MMP9, MMP11, SIK1, and TRIM59, showed superior predictive power in both tumor diagnosis and prognosis. These results demonstrate that integrative analysis using the expanded methylation data is powerful in identifying critical genes and pathways that may serve as new therapeutic targets.