Department of Computer Science and Engineering, Incheon National University, Incheon, Republic of Korea.
Department of Computer Science, Yonsei University, Seoul, Republic of Korea.
Sci Rep. 2021 Jan 11;11(1):439. doi: 10.1038/s41598-020-79889-5.
Machine learning may be a powerful approach for more accurately identifying genes that could serve as prognosticators of cancer outcomes from various types of omics data. However, to date, machine learning approaches have shown limited accuracy in predicting cancer outcomes, primarily owing to small sample sizes and a relatively large number of features. In this paper, we describe GVES (Gene Vector for Each Sample), a proposed machine learning model that can be used effectively even with a small sample size to identify genes with prognostic value more accurately. GVES, an adaptation of the continuous bag of words (CBOW) model, generates vector representations of all genes for all samples by leveraging gene expression and biological network data. GVES clusters samples using their gene vectors and identifies genes that divide samples into good- and poor-outcome groups, which are then used to predict cancer outcomes. Because GVES generates gene vectors for each sample, the effect of small sample size is reduced. We applied GVES to six cancer types and demonstrated that it outperformed existing machine learning methods, particularly on cancer datasets with few samples. Moreover, the genes identified as prognosticators were shown to reside within several significant prognostic genetic pathways associated with pancreatic cancer.
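The abstract describes GVES only at a high level. To illustrate the underlying CBOW idea applied to genes, here is a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: a gene's CBOW "context" is its set of neighbours in a biological network (given as an adjacency matrix), each neighbour's input vector is scaled by that gene's expression value in one sample before averaging, and a full softmax replaces the negative sampling or hierarchical softmax typical of word2vec. All function names (`context_vector`, `cbow_loss`, `train_cbow_for_sample`) and hyperparameters are hypothetical.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()


def context_vector(w_in, adj, expr, target):
    # Hypothetical choice: the context of a gene is its network neighbours,
    # and each neighbour's vector is weighted by its expression in ONE sample.
    ctx = np.flatnonzero(adj[target])
    weights = expr[ctx] / max(ctx.size, 1)
    return ctx, weights, weights @ w_in[ctx]


def cbow_loss(w_in, w_out, adj, expr):
    # Mean cross-entropy of predicting each gene from its weighted context.
    losses = []
    for target in range(adj.shape[0]):
        ctx, _, h = context_vector(w_in, adj, expr, target)
        if ctx.size:
            losses.append(-np.log(softmax(h @ w_out)[target]))
    return float(np.mean(losses))


def train_cbow_for_sample(adj, expr, dim=8, epochs=200, lr=0.1, seed=0):
    """Train CBOW-style gene vectors for a single sample (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_genes = adj.shape[0]
    w_in = rng.normal(scale=0.1, size=(n_genes, dim))   # gene (input) vectors
    w_out = rng.normal(scale=0.1, size=(dim, n_genes))  # output projection
    for _ in range(epochs):
        for target in range(n_genes):
            ctx, weights, h = context_vector(w_in, adj, expr, target)
            if ctx.size == 0:
                continue  # isolated gene: no context to predict from
            grad = softmax(h @ w_out)
            grad[target] -= 1.0      # dLoss/dScores for softmax + cross-entropy
            grad_h = w_out @ grad    # backpropagate into the context vector
            w_out -= lr * np.outer(h, grad)
            w_in[ctx] -= lr * np.outer(weights, grad_h)
    return w_in, w_out


# Toy example: a 4-gene network and two samples with different expression.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])
expr_by_sample = {"sample_A": np.array([1.0, 2.0, 0.5, 1.5]),
                  "sample_B": np.array([0.2, 0.4, 3.0, 0.1])}
gene_vectors = {s: train_cbow_for_sample(adj, e)[0] for s, e in expr_by_sample.items()}
```

Training one model per sample yields a separate vector for every gene in every sample, which is the property the abstract credits with reducing the effect of small sample size. How GVES then clusters samples and selects prognostic genes is not detailed in the abstract, so the sketch stops at the per-sample vectors.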