Department of Computer Science and Engineering, Incheon National University, Incheon, The Republic of Korea.
Department of Computer Science, Yonsei University, Seoul, The Republic of Korea.
Bioinformatics. 2017 Nov 15;33(22):3619-3626. doi: 10.1093/bioinformatics/btx487.
MOTIVATION: Identifying genes that predict prognosis in patients with cancer is important because it can lead to improved therapy and can advance our understanding of tumor progression at the molecular level. A common and fundamental obstacle to identifying prognostic genes and predicting cancer outcomes is the heterogeneity of patient samples.
RESULTS: To reduce the effect of sample heterogeneity, we clustered data samples using the K-means algorithm and applied a modified PageRank to functional interaction (FI) networks weighted by the gene expression values of the samples in each cluster. Hub genes among the resulting prioritized genes were selected as biomarkers to predict the prognosis of samples. This process outperformed traditional feature selection methods, as well as several network-based prognostic gene selection methods, when applied to Random Forest. We were able to find many cluster-specific prognostic genes for each dataset. Functional study showed that distinct biological processes were enriched in each cluster, which seems to reflect different aspects of tumor progression or oncogenesis among distinct patient groups. Taken together, these results support the hypothesis that our approach can effectively identify heterogeneous prognostic genes, and that these genes are complementary to each other, improving prediction accuracy.
AVAILABILITY AND IMPLEMENTATION: https://github.com/mathcom/CPR.
CONTACT: jgahn@inu.ac.kr.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
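The per-cluster gene prioritization described in RESULTS can be illustrated with a minimal sketch. It is not the authors' implementation (that is at the repository listed under AVAILABILITY AND IMPLEMENTATION), and the abstract does not say how PageRank was modified or how the FI network was weighted. The sketch therefore makes several assumptions: cluster labels are taken as given (for example, from a prior K-means run), each gene is weighted by its mean absolute expression within a cluster, and each edge is weighted by the product of its endpoint weights. The function names `weighted_pagerank` and `rank_genes_per_cluster` and the toy data are hypothetical.

```python
from collections import defaultdict


def weighted_pagerank(edges, node_weight, damping=0.85, iters=200, tol=1e-12):
    """Power-iteration PageRank on an undirected, weighted gene network."""
    nodes = sorted(node_weight)
    n = len(nodes)
    adj = {g: {} for g in nodes}
    for u, v in edges:
        # Assumption: edge weight = product of the two endpoint gene weights.
        w = node_weight[u] * node_weight[v]
        adj[u][v] = w
        adj[v][u] = w
    strength = {g: sum(adj[g].values()) for g in nodes}
    rank = {g: 1.0 / n for g in nodes}
    for _ in range(iters):
        # Genes with no weighted neighbours spread their rank uniformly.
        dangling = sum(rank[g] for g in nodes if strength[g] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {g: base for g in nodes}
        for u in nodes:
            if strength[u] == 0:
                continue
            for v, w in adj[u].items():
                new_rank[v] += damping * rank[u] * w / strength[u]
        delta = sum(abs(new_rank[g] - rank[g]) for g in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def rank_genes_per_cluster(expression, labels, edges):
    """Run the weighted PageRank once per sample cluster."""
    by_cluster = defaultdict(list)
    for sample, cluster in labels.items():
        by_cluster[cluster].append(expression[sample])
    result = {}
    for cluster, profiles in by_cluster.items():
        genes = profiles[0].keys()
        # Assumption: gene weight = mean absolute expression in the cluster.
        mean_abs = {g: sum(abs(p[g]) for p in profiles) / len(profiles)
                    for g in genes}
        result[cluster] = weighted_pagerank(edges, mean_abs)
    return result


# Toy FI network (a star around gene "A") and toy expression values.
edges = [("A", "B"), ("A", "C"), ("A", "D")]
expression = {
    "s1": {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0},
    "s2": {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0},
    "s3": {"A": 1.0, "B": 5.0, "C": 0.1, "D": 0.1},
    "s4": {"A": 1.0, "B": 5.0, "C": 0.1, "D": 0.1},
}
labels = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}  # assumed K-means output

per_cluster = rank_genes_per_cluster(expression, labels, edges)
top_gene = {c: max(r, key=r.get) for c, r in per_cluster.items()}
```

In this toy example the same network yields different gene scores in each cluster because the edge weights depend on that cluster's expression values. That is the mechanism by which cluster-specific prognostic genes could emerge. A downstream classifier such as Random Forest would then be trained on the top-ranked genes, a step not shown here.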