Karchin Rachel, Monteiro Alvaro N A, Tavtigian Sean V, Carvalho Marcelo A, Sali Andrej
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America.
PLoS Comput Biol. 2007 Feb 16;3(2):e26. doi: 10.1371/journal.pcbi.0030026. Epub 2006 Dec 28.
Many individuals tested for inherited cancer susceptibility at the BRCA1 gene locus are found to carry variants of unknown clinical significance (UCVs). Most UCVs cause a single amino acid residue (missense) change in the BRCA1 protein. These variants can be assayed biochemically, but such evaluations are time-consuming and labor-intensive. Computational methods that classify UCVs and suggest explanations for their impact on protein function can complement functional tests. Here we describe a supervised learning approach to the classification of BRCA1 UCVs. Using a novel combination of 16 predictive features, we applied the algorithms retrospectively to classify the impact of 36 BRCA1 C-terminal (BRCT) domain UCVs that had been biochemically assayed for transactivation function, and blindly to classify 54 documented UCVs. The majority vote of three supervised learning algorithms agrees with the assay for more than 94% of the assayed UCVs. Two UCVs found deleterious by both the assay and the classifiers reveal a previously uncharacterized putative binding site. Clinicians may soon be able to use computational classifiers such as those described here to better inform patients. These classifiers can be adapted to other cancer susceptibility genes and systematically applied to prioritize the growing number of potential causative loci and variants found by large-scale disease association studies.
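The majority-vote step and its comparison against assay results can be illustrated with a minimal sketch. The abstract does not name the three algorithms or give per-variant predictions, so the function names, the label strings, and the four toy variants below are all hypothetical and are not data from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most classifiers for a single variant."""
    label, _count = Counter(predictions).most_common(1)[0]
    return label


def agreement_rate(votes_per_variant, assay_labels):
    """Return the consensus calls and the fraction that match the assay."""
    consensus = [majority_vote(votes) for votes in votes_per_variant]
    matches = sum(c == a for c, a in zip(consensus, assay_labels))
    return consensus, matches / len(assay_labels)


# Hypothetical toy input: one tuple of three classifier calls per variant.
# These values are invented for illustration only.
votes = [
    ("deleterious", "deleterious", "neutral"),
    ("neutral", "neutral", "neutral"),
    ("deleterious", "neutral", "neutral"),
    ("deleterious", "deleterious", "deleterious"),
]
# Hypothetical assay outcomes for the same four variants.
assay = ["deleterious", "neutral", "deleterious", "deleterious"]

consensus, rate = agreement_rate(votes, assay)
print(consensus)
print(f"agreement with assay: {rate:.2f}")
```

With three voters and two possible labels a tie cannot occur, which is one practical reason to combine an odd number of classifiers.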