Pejaver Vikas, Mooney Sean D, Radivojac Predrag
Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana.
Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington.
Hum Mutat. 2017 Sep;38(9):1092-1108. doi: 10.1002/humu.23258. Epub 2017 Jun 12.
The steady advances in machine learning and the accumulation of biomedical data have contributed to the development of numerous computational models that assess the impact of missense variants. Different methods, however, operationalize impact differently. Two common tasks in this context are predicting the pathogenicity of variants and predicting their effects on a protein's function. These are related but distinct problems, and it is unclear whether methods developed for one are optimized for the other. The Critical Assessment of Genome Interpretation (CAGI) experiment provides a means to address this question empirically. To this end, we participated in various protein-specific challenges in CAGI with two objectives: first, to compare the performance of methods in the MutPred family with the state of the art; second, and more importantly, to investigate the applicability of general-purpose pathogenicity predictors to the classification of specific function-altering variants without additional training or calibration. We found that our pathogenicity predictors performed competitively with other methods, outputting score distributions in agreement with experimental outcomes. Overall, we conclude that binary classifiers learned from disease-causing mutations are capable of modeling important aspects of the underlying biology and the alteration of protein function resulting from mutations.