Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies.

Author information

Dong Chengliang, Wei Peng, Jian Xueqiu, Gibbs Richard, Boerwinkle Eric, Wang Kai, Liu Xiaoming

Affiliations

Zilkha Neurogenetic Institute, Biostatistics Division, Department of Preventive Medicine.

Human Genetics Center, Division of Biostatistics, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.

Publication information

Hum Mol Genet. 2015 Apr 15;24(8):2125-37. doi: 10.1093/hmg/ddu733. Epub 2014 Dec 30.

DOI:10.1093/hmg/ddu733

PMID:25552646

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC4375422/

Abstract

Accurate deleteriousness prediction for nonsynonymous variants is crucial for distinguishing pathogenic mutations from background polymorphisms in whole exome sequencing (WES) studies. Although many deleteriousness prediction methods have been developed, their predictions are sometimes inconsistent with each other, and their relative merits in practical applications remain unclear. To address these issues, we comprehensively evaluated the predictive performance of 18 current deleteriousness-scoring methods, including 11 function prediction scores (PolyPhen-2, SIFT, MutationTaster, Mutation Assessor, FATHMM, LRT, PANTHER, PhD-SNP, SNAP, SNPs&GO and MutPred), 3 conservation scores (GERP++, SiPhy and PhyloP) and 4 ensemble scores (CADD, PON-P, KGGSeq and CONDEL). We found that FATHMM and KGGSeq had the highest discriminative power among independent scores and ensemble scores, respectively. Moreover, to ensure unbiased performance evaluation of these prediction scores, we manually collected three distinct testing datasets, on which no current prediction scores were tuned. In addition, we developed two new ensemble scores that integrate nine independent scores and allele frequency. Our scores achieved the highest discriminative power of all the deleteriousness prediction scores tested and showed a low false-positive prediction rate for benign yet rare nonsynonymous variants, which demonstrates the value of combining information from multiple orthogonal approaches. Finally, to facilitate variant prioritization in WES studies, we have pre-computed our ensemble scores for 87,347,044 possible variants in the whole exome and made them publicly available through the ANNOVAR software and the dbNSFP database.
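To make "discriminative power" concrete, the following is a minimal sketch of how two prediction scores and a simple combination of them could be compared by the area under the ROC curve (AUC). It is an illustration only: the abstract does not state which metric or model the authors used, so the choice of AUC and the plain averaging ensemble are assumptions, the function names `auc` and `mean_ensemble` are hypothetical, and the toy scores and labels are invented rather than taken from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: higher means "more likely deleterious".
    labels: 1 for deleterious, 0 for benign.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    # Count pairs where the deleterious variant outranks the benign one;
    # ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ensemble(*score_lists):
    """Average several per-variant scores (assumed to share a 0-1 scale).

    This is a stand-in for illustration, not the authors' ensemble method.
    """
    return [sum(vals) / len(vals) for vals in zip(*score_lists)]


# Toy data, invented for illustration only (not from the study).
labels = [1, 1, 1, 0, 0, 0]
score_a = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]  # hypothetical method A
score_b = [0.6, 0.9, 0.3, 0.2, 0.4, 0.1]  # hypothetical method B
combined = mean_ensemble(score_a, score_b)

for name, s in [("A", score_a), ("B", score_b), ("A+B", combined)]:
    print(name, round(auc(s, labels), 3))
```

In this toy case each method misranks one deleterious/benign pair, but they misrank different pairs, so the averaged score separates the two classes completely. That is the intuition behind combining complementary predictors; real ensembles are usually fitted on training data and evaluated on independent test sets, as the abstract describes.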
