Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey.
Hum Mutat. 2019 Sep;40(9):1486-1494. doi: 10.1002/humu.23832. Epub 2019 Jul 3.
Recent years have seen a drastic increase in the number of available genomic sequences. Alongside this explosion, hundreds of computational tools have been developed to assess the impact of observed genetic variation. The Critical Assessment of Genome Interpretation (CAGI) provides a platform to evaluate the performance of these tools in experimentally relevant contexts. In the CAGI-5 challenge assessing 38 missense variants affecting the human pericentriolar material 1 protein (PCM1), our SNAP-based submission was the top performer, although it performed worse than expected from other evaluations. Here, we compare the CAGI-5 submissions, and 24 additional commonly used variant effect predictors, to analyze the reasons for this observation. We identified per-residue conservation, structural, and functional characteristics of PCM1 that may be responsible. As expected, predictors had difficulty identifying effect variants at nonconserved positions. They were also better able to call effect variants in a structurally rich region than in a less-structured one; in the latter, they more often correctly identified benign than effect variants. Curiously, most of the protein was predicted to be functionally robust to mutation, a feature that likely makes it a harder problem for generalized variant effect predictors.