Yin Yizhou, Kundu Kunal, Pal Lipika R, Moult John
Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland.
Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, Maryland.
Hum Mutat. 2017 Sep;38(9):1109-1122. doi: 10.1002/humu.23267. Epub 2017 Jun 27.
CAGI (Critical Assessment of Genome Interpretation) conducts community experiments to determine the state of the art in relating genotype to phenotype. Here, we report results obtained using newly developed ensemble methods to address two CAGI4 challenges: enzyme activity for population missense variants found in NAGLU (human N-acetyl-glucosaminidase), and random missense mutations in UBE2I (human SUMO E2 ligase) assayed in a high-throughput competitive yeast complementation procedure. The ensemble methods are effective, ranking second for SUMO-ligase and third for NAGLU according to the CAGI independent assessors. However, in common with other methods used in CAGI, there are large discrepancies between predicted and experimental activities for a subset of variants. Analysis of the structural context provides some insight into these discrepancies. Post-challenge analysis shows that the ensemble methods are also effective at assigning pathogenicity for the NAGLU variants. In the clinic, providing an estimate of the reliability of pathogenicity assignments is key. We have also used the NAGLU dataset to show that ensemble methods have considerable potential for this task, and are already reliable enough for use with a subset of mutations.