Wang Jing, Zhang Yang, Shen Xiaopei, Zhu Jing, Zhang Lin, Zou Jinfeng, Guo Zheng
Bioinformatics Centre and Key Laboratory for NeuroInformation of the Education Ministry of China, School of Life Science, University of Electronic Science and Technology of China, Chengdu 610054, China.
Mol Biosyst. 2011 Apr;7(4):1158-66. doi: 10.1039/c0mb00211a. Epub 2011 Jan 28.
Finding candidate cancer genes that play causal roles in carcinogenesis is an important task in cancer research. Non-random co-mutation of genes in cancer samples can provide statistical evidence for these genes' involvement in carcinogenesis, as well as important information on the functional cooperation of gene mutations in cancer. However, because current high-throughput somatic mutation screening studies use relatively small sample sizes and require an extraordinarily large number of hypothesis tests, the statistical power to find co-mutated gene pairs from high-throughput somatic mutation data of cancer genomes is very low. We therefore proposed a stratified false discovery rate (FDR) control approach that identifies significantly co-mutated gene pairs according to the mutation frequency of genes. We then compared the co-mutated gene pairs identified by pre-selecting genes with higher mutation frequencies with those identified by the stratified FDR control approach. Finally, to evaluate the functional roles of the identified co-mutated gene pairs, we searched for pairs of pathways annotated with significantly more between-pathway co-mutated gene pairs than expected. Based on two datasets of somatic mutations in cancer genomes, we demonstrated that, at a given FDR level, the power to find co-mutated gene pairs could be increased by pre-selecting genes with higher mutation frequencies. However, many true co-mutations between genes with lower mutation rates would still be missed. With the stratified FDR control approach, many more co-mutated gene pairs could be found. Finally, the identified pathway pairs, which were significantly overrepresented with between-pathway co-mutated gene pairs, suggested that co-dysregulation of these pathways may play causal roles in carcinogenesis.
The stratified FDR control strategy is efficient in identifying co-mutated gene pairs and the genes in the identified co-mutated gene pairs can be considered as candidate cancer genes because their non-random co-mutations in cancer genomes are highly unlikely to be attributable to chance.
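The idea of stratified FDR control described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the one-sided hypergeometric test for co-mutation, the Benjamini-Hochberg procedure within each stratum, the function names, and the stratification rule passed in as `stratum_of` are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import defaultdict
from math import comb


def comutation_pvalue(n_samples, n_a, n_b, n_both):
    """One-sided hypergeometric tail probability (assumed test, not from the paper).

    Probability that genes A and B are mutated together in at least n_both
    of n_samples samples, given that A is mutated in n_a samples and B in
    n_b samples, if their mutations occur independently.
    """
    total = comb(n_samples, n_b)
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_samples - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues, fdr):
    """Return the set of indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff = rank  # keep the largest rank that satisfies the bound
    return set(order[:cutoff])


def stratified_fdr(pairs, fdr, stratum_of):
    """Apply FDR control separately within each mutation-frequency stratum.

    pairs: list of (pair_id, pvalue, frequency) tuples, where frequency is a
    hypothetical per-pair summary such as the lower mutation count of the
    two genes. stratum_of maps that frequency to a stratum label.
    """
    strata = defaultdict(list)
    for pair_id, pvalue, frequency in pairs:
        strata[stratum_of(frequency)].append((pair_id, pvalue))

    significant = []
    for members in strata.values():
        rejected = benjamini_hochberg([p for _, p in members], fdr)
        significant.extend(members[i][0] for i in sorted(rejected))
    return significant
```

In a toy case with one frequently mutated pair (p = 0.03) and nine rarely mutated pairs (p = 0.5 each), pooling all ten tests rejects nothing at an FDR of 0.05, whereas testing the two strata separately recovers the frequently mutated pair. This mirrors, in a simplified way, why stratification can raise power when the tests are dominated by low-frequency pairs.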