Fragoso Christopher A, Heffelfinger Christopher, Zhao Hongyu, Dellaporta Stephen L
Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520; Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520.
Genetics. 2016 Feb;202(2):487-95. doi: 10.1534/genetics.115.182071. Epub 2015 Dec 29.
Low-coverage next-generation sequencing methodologies are routinely employed to genotype large populations. Missing data in these populations manifest both as missing markers and as markers with incomplete allele recovery. False homozygous calls at heterozygous sites, which result from incomplete allele recovery, confound many existing imputation algorithms. Such systematic errors can be minimized by incorporating sequencing read depth (per-marker coverage) into the imputation algorithm. Accordingly, we developed Low-Coverage Biallelic Impute (LB-Impute) to resolve these missing-data issues. LB-Impute uses a hidden Markov model whose emission probabilities vary with per-marker read coverage. Robust, highly accurate imputation results were reliably obtained with LB-Impute, even at extremely low (<1×) average per-marker coverage. This finding has implications for the design of future genotype imputation algorithms. LB-Impute is publicly available on GitHub at https://github.com/dellaporta-laboratory/LB-Impute.
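Why coverage-dependent emissions help can be illustrated with a minimal sketch. This is not LB-Impute's implementation. The three hidden genotype states (`AA`, `AB`, `BB`), the binomial read-sampling emission model, the per-read `error` rate, the `switch` transition probability, and the function names `emission_prob` and `viterbi` are all assumptions chosen for illustration. The key point is that a heterozygous site sequenced with n reads shows only one allele with probability 2 × 0.5^n, so at low coverage a single marker often looks homozygous. A marker with zero reads contributes an emission of 1 for every state and is therefore uninformative. An HMM can still recover the heterozygous call from neighboring markers.

```python
import math

# Hypothetical hidden states: homozygous allele A, heterozygous, homozygous allele B.
STATES = ("AA", "AB", "BB")


def emission_prob(state, ref_reads, alt_reads, error=0.01):
    """P(observed read counts | hidden genotype), assuming each read samples
    one allele independently (binomial model). With zero reads this is 1.0
    for every state, so a missing marker carries no information."""
    n = ref_reads + alt_reads
    if state == "AA":
        p_ref = 1.0 - error  # alt reads arise only from sequencing error
    elif state == "BB":
        p_ref = error
    else:
        p_ref = 0.5  # heterozygote: each read is A or B with equal chance
    return math.comb(n, ref_reads) * p_ref**ref_reads * (1 - p_ref) ** alt_reads


def log_trans(prev_state, next_state, switch):
    """Assumed transition model: stay in the same state with prob 1 - switch,
    otherwise move to either other state with equal probability."""
    return math.log(1 - switch) if prev_state == next_state else math.log(switch / 2)


def viterbi(observations, switch=0.01, error=0.01):
    """Most likely genotype path for (ref_reads, alt_reads) pairs given in
    marker order along one chromosome."""
    assert 0 < error < 0.5 and 0 < switch < 1
    if not observations:
        return []

    def log_emit(state, obs):
        return math.log(emission_prob(state, obs[0], obs[1], error))

    # Uniform prior over the starting state.
    scores = {s: math.log(1 / len(STATES)) + log_emit(s, observations[0]) for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for t in STATES:
            best_prev = max(STATES, key=lambda s: scores[s] + log_trans(s, t, switch))
            new_scores[t] = scores[best_prev] + log_trans(best_prev, t, switch) + log_emit(t, obs)
            pointers[t] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

For example, `viterbi([(1, 1), (2, 0), (0, 0), (0, 1), (1, 1)])` calls all five markers `AB`. The second marker, seen as two `A` reads, would look homozygous in isolation (`emission_prob("AA", 2, 0)` ≈ 0.98 versus `emission_prob("AB", 2, 0)` = 0.25). The third marker has no reads at all. Both are resolved from their heterozygous neighbors.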