Center for Human Genetics, Inc., Cambridge, Massachusetts.
Department of Pathology and Laboratory Medicine, Nationwide Children's Hospital, Columbus, Ohio.
Mol Genet Genomic Med. 2015 Mar;3(2):99-110. doi: 10.1002/mgg3.116. Epub 2014 Dec 3.
Clinical diagnostic laboratories currently use online prediction programs to help determine the significance of novel variants in a given gene sequence. However, these programs vary widely in their methods and in their ability to correctly predict the pathogenicity of a given sequence change. The performance of 17 publicly available pathogenicity prediction programs was assessed using a dataset of 122 credibly pathogenic and benign variants in genes associated with the RASopathy family of disorders and limb-girdle muscular dystrophy. Performance metrics were compared between the programs to determine the most accurate program for loss-of-function and gain-of-function mechanisms. No single program correctly predicted the pathogenicity of all variants analyzed. A major hindrance to the analysis was the lack of output from a significant portion of the programs. The best performer was MutPred, which had a weighted accuracy of 82.6% in the full dataset. Surprisingly, combining the results of the top three programs did not improve pathogenicity prediction over the top performer alone. As larger datasets yield a growing number of sequence changes that require interpretation, the current study demonstrates that extreme caution must be taken when reporting pathogenicity based on statistical online protein prediction programs in the absence of functional studies.
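As an illustration of the kind of comparison described above, the following minimal Python sketch scores one program against known variant classifications and combines three programs by majority vote. It rests on several assumptions that are not taken from the study: the abstract does not define "weighted accuracy", so the sketch uses balanced accuracy (the mean of sensitivity and specificity) as a stand-in; variants for which a program returned no output are recorded as None and excluded from that program's score; and all function and variable names are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence

Call = Optional[bool]  # True = pathogenic, False = benign, None = no output


class Counts(NamedTuple):
    tp: int       # pathogenic variants called pathogenic
    tn: int       # benign variants called benign
    fp: int       # benign variants called pathogenic
    fn: int       # pathogenic variants called benign
    missing: int  # variants for which the program gave no output


def confusion_counts(truth: Sequence[bool], calls: Sequence[Call]) -> Counts:
    """Tally calls against the known classification, tracking missing output."""
    tp = tn = fp = fn = missing = 0
    for is_pathogenic, call in zip(truth, calls):
        if call is None:
            missing += 1
        elif is_pathogenic and call:
            tp += 1
        elif not is_pathogenic and not call:
            tn += 1
        elif not is_pathogenic and call:
            fp += 1
        else:
            fn += 1
    return Counts(tp, tn, fp, fn, missing)


def balanced_accuracy(truth: Sequence[bool], calls: Sequence[Call]) -> float:
    """Mean of sensitivity and specificity over variants that received a call.

    Assumption: a stand-in for the study's "weighted accuracy", whose exact
    definition is not given in the abstract.
    """
    c = confusion_counts(truth, calls)
    if c.tp + c.fn == 0 or c.tn + c.fp == 0:
        raise ValueError("need at least one scored variant in each class")
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return (sensitivity + specificity) / 2


def majority_vote(*programs: Sequence[Call]) -> list:
    """Combine programs per variant; ties and all-missing rows yield None."""
    combined = []
    for votes in zip(*programs):
        present = [v for v in votes if v is not None]
        n_pathogenic = sum(present)
        n_benign = len(present) - n_pathogenic
        if n_pathogenic == n_benign:  # also covers the case of no votes
            combined.append(None)
        else:
            combined.append(n_pathogenic > n_benign)
    return combined


if __name__ == "__main__":
    # Made-up toy data for illustration only, not values from the study.
    truth = [True, True, True, False, False, False]
    program_a = [True, True, False, False, False, None]
    program_b = [True, False, True, False, True, False]
    program_c = [True, None, True, True, False, False]

    print(confusion_counts(truth, program_a))
    print(balanced_accuracy(truth, program_a))
    print(majority_vote(program_a, program_b, program_c))
```

Excluding missing outputs is only one possible design choice. Counting them as errors instead would penalize programs that fail to return a result, which may matter given that missing output was reported as a major hindrance.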