Zhang Wei, Xiang Xiaowen, Zhao Bihai, Huang Jianlin, Yang Lan, Zeng Yifu
College of Computer Science and Engineering, Changsha University, Changsha 410022, China.
Hunan Province Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha 410022, China.
Entropy (Basel). 2023 May 24;25(6):841. doi: 10.3390/e25060841.
Identifying the driver genes of cancer progression is important for understanding the causes of cancer and for developing personalized treatment. In this paper, we identify driver genes at the pathway level using an existing intelligent optimization algorithm, the Mouth Brooding Fish (MBF) algorithm. Many methods that identify driver pathways with the maximum weight submatrix model give coverage and exclusivity equal weight, and so ignore the impact of mutational heterogeneity. Here, we use principal component analysis (PCA) to incorporate covariate data, which reduces the complexity of the algorithm, and construct a maximum weight submatrix model that assigns different weights to coverage and exclusivity. This strategy overcomes the unfavorable effect of mutational heterogeneity to some extent. We tested the method on lung adenocarcinoma and glioblastoma multiforme data and compared the results with those of the MDPFinder, Dendrix, and Mutex methods. When the driver pathway size was 10, the recognition accuracy of the MBF method reached 80% on both datasets, and the weight values of the submatrix were 1.7 and 1.89, respectively, which are better than those of the compared methods. In addition, signaling pathway enrichment analysis revealed the important role of the driver genes identified by the MBF method in cancer signaling pathways, demonstrating the validity of these driver genes from the perspective of their biological effects.
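The abstract does not give the exact form of the weighted model, so the following is only a minimal sketch of the general idea, not the authors' formulation. It starts from the Dendrix-style submatrix weight, coverage minus coverage overlap, and adds a hypothetical trade-off parameter `alpha` (an assumption introduced here for illustration) so that coverage and exclusivity need not count equally. The function name `submatrix_weight` and the toy mutation matrix are likewise illustrative.

```python
import numpy as np


def submatrix_weight(mutations, genes, alpha=0.5):
    """Score a candidate gene set on a binary mutation matrix.

    mutations: 2-D 0/1 array, rows are patients and columns are genes.
    genes:     column indices of the candidate gene set.
    alpha:     hypothetical trade-off in [0, 1]; alpha = 0.5 reduces to the
               equal-weight score (coverage minus coverage overlap).
    """
    sub = np.asarray(mutations)[:, list(genes)]
    # Coverage: number of patients with a mutation in at least one gene.
    coverage = int(np.count_nonzero(sub.any(axis=1)))
    # Overlap: mutations beyond the first in each covered patient, which
    # measures how far the set is from mutual exclusivity.
    overlap = int(sub.sum()) - coverage
    return 2 * alpha * coverage - 2 * (1 - alpha) * overlap


# Toy data (assumed for illustration): 4 patients, 3 genes.
toy = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
])

equal_weight = submatrix_weight(toy, [0, 1])          # coverage 3, overlap 1
coverage_heavy = submatrix_weight(toy, [0, 1], 0.75)  # favors coverage
exclusive_set = submatrix_weight(toy, [0, 2])         # coverage 3, overlap 0
```

In this sketch, raising `alpha` above 0.5 rewards coverage more than it penalizes overlap, and lowering it does the opposite. A search procedure such as the MBF algorithm would then look for the gene set of a given size that maximizes the chosen score.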