Kim Minseon, Oh Ilhwan, Ahn Jaegyoon
Department of Computer Science and Engineering, Incheon National University, Incheon 22012, Korea.
Genes (Basel). 2018 Oct 2;9(10):478. doi: 10.3390/genes9100478.
Accurate identification of prognostic biomarkers is an important yet challenging goal in bioinformatics. Many bioinformatics approaches have been proposed for this purpose, but there is still room for improvement. In this paper, we propose a novel machine learning-based method that identifies prognostic biomarker genes more accurately and uses them to predict cancer prognosis. The proposed method specifies the candidate prognostic gene module by graph learning with a generative adversarial network (GAN) model and scores genes using the PageRank algorithm. We applied the method to multi-omics data comprising copy number, gene expression, DNA methylation, and somatic mutation data for five cancer types. The method showed better prediction accuracy than existing methods. We identified many prognostic genes and their roles in biological pathways. We also showed that the genes identified from different omics data were complementary, which improved prediction accuracy when multi-omics data were used.
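The gene-scoring step can be illustrated with a minimal PageRank sketch in Python. It is not the authors' implementation: the abstract does not say how edges are directed or weighted, so the sketch assumes a plain unweighted directed graph, a damping factor of 0.85, and a toy edge list with placeholder gene names (`geneA` to `geneD`). The GAN-based graph learning step is not shown. The function name `pagerank` and its parameters are hypothetical.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Score the nodes of a directed graph by PageRank (power iteration)."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Every node passes an equal share of its rank to each successor.
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy network; the gene names are placeholders, not results
# from the paper.
toy_edges = [
    ("geneA", "geneB"),
    ("geneC", "geneB"),
    ("geneD", "geneB"),
    ("geneB", "geneA"),
]
scores = pagerank(toy_edges)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph, `geneB` receives links from every other node and therefore obtains the highest score. In a setting like the one the abstract describes, the edge list would come from the learned gene network rather than being written by hand.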