The Complementarity Between Protein-Specific and General Pathogenicity Predictors for Amino Acid Substitutions.

Author information

Riera Casandra, Padilla Natàlia, de la Cruz Xavier

Affiliations

Research Unit in Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain.

ICREA, Barcelona, Spain.

Publication information

Hum Mutat. 2016 Oct;37(10):1013-24. doi: 10.1002/humu.23048. Epub 2016 Aug 8.

DOI:10.1002/humu.23048

PMID:27397615

Abstract

The use of next-generation sequencing for biomedical/clinical purposes has fuelled the demand for tools that assess the functional impact of sequence variants. For single amino acid variants, general methods (GM), based on biophysical/evolutionary principles and trained by pooling variants from many proteins, are already available. Until now, their accuracy range (∼80%) has limited their use in clinical applications. In parallel, a series of studies indicates that protein-specific predictors (PSP), which use only information from the protein of interest, could frequently surpass the performance of GM. However, two reasons suggest that this may not always be the case: the existence of a performance threshold affecting both GM and PSP, and the effect of training data scarcity. Here, we characterize the relationship between the two approaches by deriving 82 PSP and comparing them with several GM (PolyPhen-2, SIFT, PON-P2, MutationTaster2, CADD). We find a complementary relationship between PSP and GM, with neither approach always outperforming the other. However, the relationship varies between two limiting situations: PSP are frequently outperformed by PON-P2, the best GM, whereas the opposite happens when we compare PSP and SIFT. Finally, we explore how the observed complementarity could lead to increased success rates in pathogenicity prediction.
