Joshua S. Kaminker, Yan Zhang, Colin Watanabe, Zemin Zhang
Department of Bioinformatics, Genentech, Inc., South San Francisco, CA 94080, USA.
Nucleic Acids Res. 2007 Jul;35(Web Server issue):W595-8. doi: 10.1093/nar/gkm405. Epub 2007 May 30.
Various cancer genome projects are underway to identify novel mutations that drive tumorigenesis. While these screens will generate large data sets, the majority of identified missense changes are likely to be innocuous passenger mutations or polymorphisms. As a result, it has become increasingly important to develop computational methods for distinguishing functionally relevant mutations from other variations. We previously developed an algorithm and now present CanPredict, a web application (http://www.canpredict.org/ or http://www.cgl.ucsf.edu/Research/genentech/canpredict/) that allows users to determine whether particular changes are likely to be cancer-associated. The impact of each change is measured using two established methods: Sorting Intolerant From Tolerant (SIFT) and the Pfam-based LogR.E-value metric. A third method, the Gene Ontology Similarity Score (GOSS), indicates how closely the gene in which the variant resides resembles known cancer-causing genes. Scores from these three algorithms are analyzed by a random forest classifier, which then predicts whether a change is likely to be cancer-associated. CanPredict fills an important need in cancer biology and will enable a large audience of biologists to determine which mutations are the most relevant for further study.
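The abstract states only that three per-variant scores (SIFT, LogR.E-value, GOSS) are fed to a random forest. The following is a minimal, self-contained sketch of that general idea, not CanPredict's implementation. Everything in it is an assumption made for illustration: the feature order `(sift, logr_e, goss)`, the label convention (1 = cancer-associated, 0 = neutral), the score directions in the toy data (low SIFT, high LogR.E-value and high GOSS treated as "cancer-like"), the hyperparameters, the helper names (`TinyRandomForest`, `build_tree`, and so on), and the made-up training rows. The forest is a plain bagging-plus-random-feature-subset ensemble of small Gini-split decision trees that predicts by majority vote.

```python
import random
from collections import Counter

# Hypothetical feature order per variant: (sift_score, logr_e_value, goss).
# Labels (assumed): 1 = cancer-associated, 0 = neutral/passenger.


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) that lowers weighted Gini the most, or None."""
    best, best_score = None, gini(labels)
    n = len(labels)
    for f in feature_ids:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):  # midpoints between distinct values
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth, max_features, rng):
    """Grow a small tree; internal nodes are (feature, threshold, left, right), leaves are labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    # Random feature subset at each split: the "random" part of a random forest.
    feature_ids = rng.sample(range(len(rows[0])), max_features)
    split = best_split(rows, labels, feature_ids)
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth - 1, max_features, rng),
            build_tree([r for r, _ in right], [y for _, y in right], depth - 1, max_features, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class TinyRandomForest:
    """Hypothetical toy forest: bootstrap-sampled trees, majority vote."""

    def __init__(self, n_trees=25, max_depth=3, max_features=2, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.max_features = max_features
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, rows, labels):
        n = len(rows)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                         self.max_depth, self.max_features, self.rng))
        return self

    def predict_proba(self, row):
        """Fraction of trees voting label 1 (cancer-associated)."""
        votes = [predict_tree(t, row) for t in self.trees]
        return sum(votes) / len(votes)

    def predict(self, row):
        return 1 if self.predict_proba(row) >= 0.5 else 0


# Toy, made-up training rows (NOT data from the paper).
X = [(0.01, 6.0, 0.90), (0.02, 8.5, 0.85), (0.00, 7.2, 0.92), (0.03, 5.5, 0.80),
     (0.04, 9.0, 0.88), (0.01, 6.8, 0.95),
     (0.60, 0.3, 0.20), (0.85, 0.0, 0.15), (0.50, 1.0, 0.25), (0.75, 0.5, 0.10),
     (0.90, 0.2, 0.30), (0.65, 0.8, 0.12)]
y = [1] * 6 + [0] * 6

forest = TinyRandomForest(seed=42).fit(X, y)
print(forest.predict((0.01, 8.0, 0.90)), forest.predict_proba((0.01, 8.0, 0.90)))
print(forest.predict((0.80, 0.2, 0.15)), forest.predict_proba((0.80, 0.2, 0.15)))
```

The vote fraction from `predict_proba` is one simple way to expose a confidence alongside the binary call. In practice a maintained implementation such as scikit-learn's `RandomForestClassifier` would be the usual choice; the hand-rolled version here only makes the bagging, random-feature-subset, and voting steps explicit.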