Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China; College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China.
College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China.
Comput Biol Chem. 2019 Jun;80:159-167. doi: 10.1016/j.compbiolchem.2019.03.019. Epub 2019 Apr 2.
Because driver pathways play a crucial role in the formation and progression of cancer, identifying them is essential and offers important information for precision or personalized medicine. In this paper, an improved maximum weight submatrix problem model is proposed that integrates three kinds of omics data: somatic mutations, copy number variations, and gene expression. The model adjusts the balance between coverage and mutual exclusivity using the average weight of genes in a pathway, and simultaneously considers the correlation among genes, so that pathways with high coverage but moderate mutual exclusivity can be identified. By introducing a short chromosome code and a greedy-based recombination operator, a parthenogenetic algorithm, PGA-MWS, is presented to solve the model. The algorithms GA, MOGA, iMCMC, and PGA-MWS were compared experimentally on biological and simulated data sets. The results show that, compared with the other three algorithms, PGA-MWS based on the improved model can identify gene sets with high coverage but moderate mutual exclusivity, and that it scales well. Many of the identified gene sets are involved in known signaling pathways, and most of the implicated genes are oncogenes or tumor suppressors previously reported in the literature. These results indicate that the proposed approach may become a useful complementary tool for detecting cancer pathways.
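To make the terms "coverage" and "mutual exclusivity" in the abstract concrete, the sketch below scores a candidate gene set with the classical maximum weight submatrix weight, W(M) = 2|Γ(M)| − Σ|Γ(g)|, where Γ(g) is the set of samples in which gene g is altered and Γ(M) is their union over the gene set M. This is the standard baseline formulation, not the improved model of the paper: the abstract does not give the adjusted weight, the gene correlation term, or the PGA-MWS operators, so none of them are reproduced here. The function names and the toy alteration matrix are hypothetical and serve only as an illustration.

```python
import numpy as np

def coverage(A, genes):
    """Number of samples (rows) altered in at least one of the given genes."""
    return int(A[:, genes].any(axis=1).sum())

def overlap(A, genes):
    """Coverage overlap: total alterations in the gene set minus covered samples.
    It is 0 when the genes are perfectly mutually exclusive."""
    return int(A[:, genes].sum()) - coverage(A, genes)

def weight(A, genes):
    """Classical maximum weight submatrix score: coverage minus coverage overlap.
    Assumption: this is the baseline weight, not the paper's improved model."""
    return coverage(A, genes) - overlap(A, genes)

# Hypothetical toy data: 6 samples (rows) x 4 genes (columns), True = altered.
A = np.array([
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=bool)

for genes in ([0, 1], [0, 1, 2], [0, 1, 2, 3]):
    print(genes, coverage(A, genes), overlap(A, genes), weight(A, genes))
```

In this toy matrix, adding gene 2 to the set {0, 1} raises coverage from 4 to 5 but introduces one overlapping sample, so the classical weight stays at 4. A weight of this form penalizes every overlap one-for-one, which is the behavior the abstract's model relaxes in order to accept pathways with high coverage but only moderate mutual exclusivity.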