School of Mathematics and Statistics, Shandong University, Weihai, 264209, China.
Department of Mathematics, Weifang University, Weifang, 261061, Shandong, China.
BMC Bioinformatics. 2022 Jul 13;23(1):277. doi: 10.1186/s12859-022-04788-7.
Recent advances in next-generation sequencing technologies have helped investigators generate massive amounts of cancer genomic data. A critical challenge in cancer genomics is identifying the few cancer driver genes whose mutations cause tumor growth. However, most existing computational approaches underuse information on co-occurring mutations within individual patients, which is considered important in tumorigenesis and tumor progression; this leads to high false-positive rates.
To make full use of co-mutation information, we present DriverRWH, a random walk algorithm on a weighted gene mutation hypergraph model that uses somatic mutation data and molecular interaction network data to prioritize candidate driver genes. Applied to tumor samples of different cancer types from The Cancer Genome Atlas (TCGA), DriverRWH performs significantly better than state-of-the-art prioritization methods in terms of area under the curve (AUC) scores and the cumulative number of known driver genes recovered among the top-ranked candidate genes. In addition, DriverRWH discovers several potential drivers that are enriched in cancer-related pathways. For more than half of the cancer types, DriverRWH recovers approximately 50% of known driver genes among the top 30 ranked candidate genes. DriverRWH is also highly robust to perturbations in both the mutation data and the gene functional network data.
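The abstract does not give DriverRWH's exact hyperedge weights, restart vector, or how the interaction network is folded in, so the following is only a minimal sketch of the general technique it names: a random walk with restart on a weighted hypergraph. It assumes, hypothetically, that each hyperedge is the set of genes mutated in one tumor sample, and it uses the common hypergraph walk in which a walker at gene u picks a hyperedge containing u with probability proportional to its weight, then moves to a gene of that hyperedge chosen uniformly. The names hypergraph_rwr and known_drivers_in_top_k, the uniform edge weights, the uniform restart vector, and the restart probability alpha are all illustrative assumptions, and the interaction network is not modeled here. The second helper mirrors the "known driver genes recovered in the top k" count reported above.

```python
from collections import defaultdict


def hypergraph_rwr(hyperedges, edge_weights, restart, alpha=0.3, tol=1e-10, max_iter=1000):
    """Random walk with restart on a weighted hypergraph (illustrative sketch only).

    hyperedges   -- list of gene sets (assumed here: one set of mutated genes per sample)
    edge_weights -- one positive weight per hyperedge (hypothetical weighting)
    restart      -- dict gene -> restart probability, summing to 1
    alpha        -- probability of jumping back to the restart distribution
    """
    if len(hyperedges) != len(edge_weights) or any(w <= 0 for w in edge_weights):
        raise ValueError("need one positive weight per hyperedge")
    genes = sorted(set().union(*hyperedges))
    # Vertex degree d(u): total weight of the hyperedges containing u.
    deg = defaultdict(float)
    for edge, w in zip(hyperedges, edge_weights):
        for g in edge:
            deg[g] += w
    p0 = {g: restart.get(g, 0.0) for g in genes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {g: alpha * p0[g] for g in genes}
        for edge, w in zip(hyperedges, edge_weights):
            # Mass entering this hyperedge: each member u sends p(u) * w(e) / d(u).
            inflow = sum(p[u] * w / deg[u] for u in edge)
            # The walker then lands on a member gene chosen uniformly at random.
            share = (1.0 - alpha) * inflow / len(edge)
            for v in edge:
                nxt[v] += share
        diff = sum(abs(nxt[g] - p[g]) for g in genes)
        p = nxt
        if diff < tol:
            break
    return p


def known_drivers_in_top_k(scores, known_drivers, k):
    """Count how many known driver genes appear among the k highest-scoring genes."""
    ranked = sorted(scores, key=lambda g: (-scores[g], g))  # ties broken by name
    return sum(1 for g in ranked[:k] if g in known_drivers)


# Toy data: four hypothetical samples, each a hyperedge of co-mutated genes.
samples = [{"TP53", "KRAS"}, {"TP53", "PIK3CA"}, {"TP53", "GENE_X"}, {"KRAS", "GENE_Y"}]
weights = [1.0] * len(samples)
all_genes = set().union(*samples)
uniform_restart = {g: 1.0 / len(all_genes) for g in all_genes}
scores = hypergraph_rwr(samples, weights, uniform_restart)
print(known_drivers_in_top_k(scores, {"TP53", "KRAS"}, k=2))
```

Because each step redistributes all of a gene's probability mass over its hyperedges and the restart term adds back exactly alpha, the scores remain a probability distribution; genes that co-occur in many heavily weighted samples, such as TP53 in the toy data, accumulate the most mass.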
DriverRWH is effective in prioritizing cancer driver genes across various cancer types and provides considerable improvement over other tools, with a better balance of precision and sensitivity. It can be a useful tool for detecting potential driver genes and may facilitate targeted cancer therapies.