Wei Qiang, Zhan Xiaowei, Zhong Xue, Liu Yongzhuang, Han Yujun, Chen Wei, Li Bingshan
Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA; Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA; Center for Quantitative Sciences, Vanderbilt University, Nashville, TN, USA; Center for Human Genetic Variation, Duke University, Durham, NC, USA; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; and Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA.
Bioinformatics. 2015 May 1;31(9):1375-81. doi: 10.1093/bioinformatics/btu839. Epub 2014 Dec 21.
Spontaneous (de novo) mutations play an important role in the etiology of a range of complex diseases. Identifying de novo mutations (DNMs) in sporadic cases is an effective strategy for finding genes or genomic regions implicated in the genetics of disease. High-throughput next-generation sequencing enables genome- or exome-wide detection of DNMs by sequencing parent-proband trios. However, it is challenging to separate true mutations from the massive amount of noise caused by sequencing errors and alignment artifacts. A critical limitation of existing methods is that they assume the same pre-specified mutation rate for all genomic regions, which has a significant impact on DNM calling accuracy.
In this study, we developed and implemented a novel Bayesian framework for DNM calling in trios (TrioDeNovo). It overcomes this limitation by disentangling the prior mutation rate from the evaluation of the likelihood of the data, so that flexible priors can be adjusted post hoc at different genomic sites. Through extensive simulations and application to real data, we showed that the new method has improved sensitivity and specificity over existing methods, and that it provides a flexible framework for further improving efficiency by incorporating proper priors. Accuracy is further improved by effective filtering based on sequence alignment characteristics.
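The separation of prior and likelihood described above can be illustrated with a minimal sketch. It assumes that the evidence at a site is summarized as a Bayes factor (the ratio of the data likelihood under a de novo model to that under a no-mutation model), which is then combined with a site-specific prior mutation rate through the standard posterior-odds identity. The function names, likelihood values and mutation rates below are hypothetical and chosen only for illustration; this is not TrioDeNovo's actual code or likelihood model.

```python
def bayes_factor(lik_de_novo: float, lik_no_mutation: float) -> float:
    """Ratio of the data likelihood under the de novo model to that under
    the no-mutation model. It depends on the data only, not on any prior."""
    if lik_de_novo < 0.0 or lik_no_mutation <= 0.0:
        raise ValueError("likelihoods must be non-negative, with a positive denominator")
    return lik_de_novo / lik_no_mutation


def posterior_probability(bf: float, mutation_rate: float) -> float:
    """Combine a precomputed Bayes factor with a site-specific prior:
    posterior odds = Bayes factor * prior odds."""
    if not 0.0 < mutation_rate < 1.0:
        raise ValueError("mutation_rate must lie strictly between 0 and 1")
    prior_odds = mutation_rate / (1.0 - mutation_rate)
    posterior_odds = bf * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical likelihoods for one candidate site (illustrative values only).
bf = bayes_factor(lik_de_novo=1e-3, lik_no_mutation=1e-9)

# The same Bayes factor is re-scored under two assumed priors without
# re-evaluating the sequencing data.
p_low_rate_site = posterior_probability(bf, mutation_rate=1e-8)
p_high_rate_site = posterior_probability(bf, mutation_rate=1e-6)
```

Because the Bayes factor is computed once per site, changing the assumed mutation rate only requires repeating the inexpensive second step, which is what makes post hoc adjustment of priors practical in this kind of design.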
The C++ source code implementing TrioDeNovo is freely available at https://medschool.vanderbilt.edu/cgg.
Supplementary data are available at Bioinformatics online.