Li Gaoshi, Hu Zhipeng, Luo Xinlong, Liu Jiafei, Wu Jingli, Peng Wei, Zhu Xiaoshu
Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China.
Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China.
Health Inf Sci Syst. 2024 Mar 6;12(1):21. doi: 10.1007/s13755-024-00279-6. eCollection 2024 Dec.
Cancer is a complex disease caused by gene mutations that accumulate during somatic cell evolution. High-throughput technologies have generated large volumes of omics data, and identifying cancer-related driver genes from these data remains a challenge. Early frequency-based driver gene identification methods perform poorly on driver genes with low mutation rates. Later network-based methods fuse multi-omics data but rarely consider the connections among features. After analyzing a large number of methods for integrating multi-omics data, this paper proposes a hierarchical weak consensus model that fuses multiple features according to the connections among them. First, by analyzing the connection between the protein-protein interaction (PPI) network and the co-mutation hypergraph network, it proposes a new topological feature, the co-mutation clustering coefficient (CMCC). Then, the hierarchical weak consensus model integrates CMCC with mRNA and miRNA differential expression scores, yielding a new driver gene identification method, HWC. HWC is compared with seven current state-of-the-art methods on three types of cancer. The results show that HWC achieves the best identification performance in terms of statistical evaluation metrics, functional consistency, and partial area under the ROC curve.
The online version contains supplementary material available at 10.1007/s13755-024-00279-6.
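For readers unfamiliar with the last evaluation criterion named in the abstract, the following is a minimal sketch of a partial area under the ROC curve, restricted to a low false-positive-rate range. It is not the authors' evaluation code. The function name `partial_auc`, the `max_fpr` cutoff and the toy scores are assumptions made for illustration only; the abstract does not state which cutoff was used.

```python
def partial_auc(scores, labels, max_fpr=0.1):
    """Area under the ROC curve for FPR in [0, max_fpr], by the trapezoid rule.

    scores: predicted driver-gene scores (higher means more likely a driver).
    labels: 1 for a known driver gene, 0 otherwise.
    max_fpr: hypothetical cutoff; not taken from the paper.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Rank genes from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    # Build ROC points, treating genes with tied scores as one step.
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j

    # Integrate segment by segment, clipping the last one at max_fpr.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linear interpolation of the TPR at the cutoff.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    # Toy data: four genes, two of them labelled as drivers.
    toy_scores = [0.9, 0.8, 0.7, 0.6]
    toy_labels = [1, 0, 1, 0]
    print(partial_auc(toy_scores, toy_labels, max_fpr=0.25))
```

The value returned is the raw area, so its maximum is `max_fpr` rather than 1; dividing by `max_fpr` would give a normalized score if methods need to be compared across different cutoffs.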