Xu Xiaolu, Qi Zitong, Zhang Dawei, Zhang Meiwei, Ren Yonggong, Geng Zhaohong
School of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China.
Department of Statistics, University of Washington, Seattle, WA 98195, USA.
Comput Struct Biotechnol J. 2023 May 26;21:3124-3135. doi: 10.1016/j.csbj.2023.05.019. eCollection 2023.
Although computational methods for driver gene identification have progressed rapidly, the field is still far from the goal of obtaining widely recognized driver genes for all cancer types. The driver gene lists predicted by these methods often lack consistency and stability across different studies or datasets. In addition to analytical performance, some tools may require further improvement in operability and system compatibility. Here, we developed a user-friendly R package (DriverGenePathway) that integrates MutSigCV and statistical methods to identify cancer driver genes and pathways. The theoretical basis of the MutSigCV program, such as the discovery of mutation categories based on information entropy, is elaborated and integrated into DriverGenePathway. Five hypothesis testing methods (the beta-binomial test, Fisher combined P-value test, likelihood ratio test, convolution test, and projection test) are used to identify the minimal core driver genes. Moreover, de novo methods, which can effectively overcome mutational heterogeneity, are introduced to identify driver pathways. Herein, we describe the computational structure and statistical fundamentals of the DriverGenePathway pipeline and demonstrate its performance using eight types of cancer from TCGA. DriverGenePathway confirms many expected driver genes, which overlap strongly with the Cancer Gene Census list, and identifies driver pathways associated with cancer development. The DriverGenePathway R package is freely available on GitHub: https://github.com/bioinformatics-xu/DriverGenePathway.
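To make one of the five tests concrete, here is a minimal sketch of the textbook Fisher combined P-value method, written in Python for illustration. It is not the DriverGenePathway implementation (the package itself is in R), and the function name `fisher_combined_pvalue` is hypothetical. The abstract does not say which per-gene P-values the package combines, so the inputs are simply assumed to be independent P-values for the same gene.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (illustrative only)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(ln p_i); under the null hypothesis
    # it follows a chi-square distribution with 2k degrees of freedom.
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2

    # For an even number of degrees of freedom (2k), the chi-square
    # survival function has a closed form:
    #   P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total
```

With a single input the method returns that P-value unchanged; with two inputs p1 and p2 it reduces to p1·p2·(1 − ln(p1·p2)), which is a handy sanity check.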