Azevedo Luisa, Mort Matthew, Costa Antonio C, Silva Raquel M, Quelhas Dulce, Amorim Antonio, Cooper David N
Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Population Genetics and Evolution Group, Porto, Portugal.
IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Porto, Portugal.
Eur J Hum Genet. 2017 Jan;25(1):2-7. doi: 10.1038/ejhg.2016.129. Epub 2016 Oct 5.
Understanding the functional sequelae of amino-acid replacements is of fundamental importance in medical genetics. Perhaps the most intuitive way to assess the potential pathogenicity of a given human missense variant is to measure the degree of evolutionary conservation of the substituted amino-acid residue, a feature that generally serves as a good proxy for the functional or structural importance of that residue. However, the presence of putatively compensated variants as the wild-type alleles in orthologous proteins of other mammalian species not only challenges this classical view of amino-acid essentiality but also precludes accurate evaluation of the functional impact of this type of missense variant with currently available bioinformatic prediction tools. Compensated variants constitute at least 4% of all known missense variants causing inherited human disease and hence represent an important potential source of error, since they are likely to be disproportionately misclassified as benign. The consequent under-reporting of compensated variants is exacerbated in next-generation sequencing, where their inappropriate exclusion is an unfortunate but natural consequence of filtering and prioritizing the very large number of variants generated. Here we demonstrate the reduced performance of currently available pathogenicity prediction tools when applied to compensated variants and propose an alternative machine-learning approach to assess the likely pathogenicity of this particular type of variant.
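The pitfall described above can be illustrated with a toy sketch. It assumes a single, entirely hypothetical alignment column (species names and residues are invented for illustration) and a deliberately simplified rule, `naive_conservation_call`, that treats any residue observed in an ortholog as tolerated. This rule is not any specific published tool and is not the machine-learning approach proposed in the paper; it only shows why a pathogenic human variant whose mutant residue is the wild-type residue in another mammal tends to be called benign.

```python
from typing import Iterable, Mapping

# Hypothetical alignment column: species name -> residue at the position
# of the human missense variant ("-" denotes a gap).
AlignmentColumn = Mapping[str, str]


def residue_frequency(column: AlignmentColumn, residue: str) -> float:
    """Fraction of non-gap sequences carrying `residue` (a crude conservation proxy)."""
    residues = [aa for aa in column.values() if aa != "-"]
    if not residues:
        return 0.0
    return sum(aa == residue for aa in residues) / len(residues)


def is_putatively_compensated(column: AlignmentColumn, mutant_aa: str,
                              mammals: Iterable[str], human: str = "human") -> bool:
    """True if the human mutant residue is the wild-type residue in any non-human mammal."""
    return any(column.get(sp) == mutant_aa for sp in mammals if sp != human)


def naive_conservation_call(column: AlignmentColumn, mutant_aa: str,
                            human: str = "human") -> str:
    """Hypothetical toy rule: a residue seen in any ortholog is assumed tolerated.

    This is the reasoning that breaks down for compensated variants.
    """
    seen = any(aa == mutant_aa for sp, aa in column.items() if sp != human)
    return "likely benign" if seen else "possibly damaging"


# Invented example data: human wild-type R; mouse and rat carry Q.
column = {"human": "R", "chimp": "R", "dog": "R", "mouse": "Q", "rat": "Q",
          "chicken": "R", "zebrafish": "R"}
mammals = {"human", "chimp", "dog", "mouse", "rat"}

print(round(residue_frequency(column, "R"), 2))          # 0.71
print(is_putatively_compensated(column, "Q", mammals))  # True
print(naive_conservation_call(column, "Q"))             # likely benign
print(naive_conservation_call(column, "W"))             # possibly damaging
```

In this invented case a hypothetical disease-causing R>Q substitution in humans would be called "likely benign" simply because Q is the wild-type residue in mouse and rat, where it is presumably tolerated owing to other differences in those orthologs. The flag from `is_putatively_compensated` marks exactly the variants for which such conservation-based calls deserve extra scrutiny.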