Cai Lihua, Wu Honglong, Zhou Ke
Wuhan National Laboratory for Optoelectronics, School of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan, Hubei, China.
School of Mathematics and Computer Science, Guangdong Ocean University, Zhanjiang, Guangdong, China.
PLoS One. 2021 Feb 11;16(2):e0246668. doi: 10.1371/journal.pone.0246668. eCollection 2021.
Identifying biomarkers that are associated with different types of cancer is an important goal in the field of bioinformatics. Different research groups have analyzed the expression profiles of many genes and found certain genetic patterns that can help improve targeted therapies, but the significance of some genes is still ambiguous. More reliable and effective biomarker identification methods are therefore needed to detect candidate cancer-related genes. In this paper, we propose a novel method that combines the infinite latent feature selection (ILFS) method with the functional interaction (FI) network to rank biomarkers. We applied the proposed method to the expression data of five cancer types. The experiments indicated that our network-constrained ILFS (NCILFS) improves diagnostic prediction for the samples and identifies many more known oncogenes than the original ILFS and some other existing methods. We also performed functional enrichment analysis by inspecting the over-represented gene ontology (GO) biological process (BP) terms and applying the gene set enrichment analysis (GSEA) method to the biomarkers selected by each feature selection method. The enrichment analysis reports show that our network-constrained ILFS can produce more biologically significant gene sets than the other methods. The results suggest that network-constrained ILFS can identify cancer-related genes with higher discriminative power and biological significance.