Jia Peilin, Zhao Zhongming
Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America; Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America.
Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America; Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America; Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America; Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America.
PLoS Comput Biol. 2014 Feb 6;10(2):e1003460. doi: 10.1371/journal.pcbi.1003460. eCollection 2014 Feb.
A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations, which in turn facilitates the identification of targetable genes and new drugs. Current approaches are primarily based on the mutation frequencies of single genes; they lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large-scale cancer mutation data. VarWalker fits a generalized additive model for each sample based on its sample-specific mutation profile and builds on the joint frequency of both mutated genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method to >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork that was significantly enriched for consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes that are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Using VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data.
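To illustrate the Random Walk with Restart step mentioned in the abstract, the following is a minimal sketch of the generic algorithm on a toy interaction network. It is not VarWalker's implementation: the function name `random_walk_with_restart`, the restart probability of 0.5, the column normalization, and the five-node path network are all assumptions chosen for illustration, since the abstract does not specify these details.

```python
import numpy as np

def random_walk_with_restart(adjacency, seeds, restart_prob=0.5,
                             tol=1e-10, max_iter=1000):
    """Generic RWR: iterate p <- (1 - r) * W @ p + r * p0 until convergence.

    adjacency    : symmetric 0/1 matrix of a (hypothetical) interaction network
    seeds        : indices of seed nodes, e.g. mutated genes in one sample
    restart_prob : r, the chance of jumping back to a seed (assumed value)
    """
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0        # avoid division by zero for isolated nodes
    W = A / col_sums                     # column-normalized transition matrix

    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)   # start with equal mass on each seed
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart_prob) * (W @ p) + restart_prob * p0
        if np.abs(p_next - p).sum() < tol:   # L1 change below tolerance
            return p_next
        p = p_next
    return p

# Toy network: a path 0 - 1 - 2 - 3 - 4, with node 0 as the only seed.
adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
scores = random_walk_with_restart(adjacency, seeds=[0])
ranking = np.argsort(-scores)            # nodes ordered by proximity to the seed
```

In this sketch the stationary scores fall off with network distance from the seed, so nodes can be ranked as candidate "close interactors" of a mutated gene. How VarWalker selects and optimizes interactors from such scores is not described in the abstract and is not reproduced here.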