Center for Computational Natural Sciences and Bioinformatics, IIIT Hyderabad, Hyderabad, 500032, India.
Mol Genet Genomics. 2020 May;295(3):807-824. doi: 10.1007/s00438-020-01664-y. Epub 2020 Mar 17.
Patterns of DNA methylation are significantly altered in cancers. Interpreting the functional consequences of DNA methylation requires the integration of multiple forms of data. Recent advances in next-generation sequencing can help decode this relationship and support biomarker discovery. In this study, we investigated the methylation patterns of papillary renal cell carcinoma (PRCC) and their relationship with gene expression using The Cancer Genome Atlas (TCGA) multi-omics data. We found that the promoters and gene bodies of tumor suppressor genes, microRNAs, and gene clusters and families, including cadherins, protocadherins, claudins and collagens, are hypermethylated in PRCC. Hypomethylated genes in PRCC are associated with immune function. The expression of several novel candidate genes, including the interleukin receptor IL17RE and the immune checkpoint genes HHLA2, SIRPA and HAVCR2, correlates significantly with DNA methylation. We also developed machine learning models that use features extracted from single-omics and multi-omics data to distinguish early from late stages of PRCC. We compared different feature selection algorithms, predictive models, data integration techniques and representations of methylation data. Integrating gene expression and DNA methylation features improved the performance of the models in distinguishing tumor stages. In summary, our study identifies PRCC driver genes and proposes predictive models based on both DNA methylation and gene expression. These results will aid targeted experiments on PRCC and provide a strategy for improving the classification accuracy of tumor stages.
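The two analysis steps summarized above, correlating methylation with expression and integrating both feature types, can be illustrated with a minimal sketch. It is not the study's pipeline: the abstract does not say which correlation statistic or integration technique was used, so Pearson correlation and simple feature concatenation (often called early integration) are assumptions made here for illustration. The function names `pearson` and `concatenate_features` and all numeric values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def concatenate_features(expression, methylation):
    """Join per-sample feature vectors from both omics layers (assumed technique)."""
    if len(expression) != len(methylation):
        raise ValueError("both layers must describe the same samples")
    return [list(e) + list(m) for e, m in zip(expression, methylation)]


# Made-up values for one gene across five samples (not TCGA data).
methylation_beta = [0.10, 0.25, 0.40, 0.60, 0.85]  # promoter methylation level
expression_level = [9.0, 8.0, 7.0, 6.0, 5.0]       # log-scale expression

# A negative coefficient means higher methylation goes with lower expression.
r = pearson(methylation_beta, expression_level)

# Made-up per-sample features: two expression values and one methylation value.
combined = concatenate_features([[9.0, 3.1], [5.0, 2.2]], [[0.10], [0.85]])
```

Each row of `combined` holds the expression features followed by the methylation features for one sample, which is the form a stage classifier would take as input under this assumption.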