Imputing Genotypes in Biallelic Populations from Low-Coverage Sequence Data

Authors

Fragoso Christopher A, Heffelfinger Christopher, Zhao Hongyu, Dellaporta Stephen L

Affiliations

Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520; Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520.

Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520.

Publication information

Genetics. 2016 Feb;202(2):487-95. doi: 10.1534/genetics.115.182071. Epub 2015 Dec 29.

DOI:10.1534/genetics.115.182071

PMID:26715670

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4788230/

Abstract

Low-coverage next-generation sequencing methodologies are routinely employed to genotype large populations. Missing data in these populations manifest both as missing markers and markers with incomplete allele recovery. False homozygous calls at heterozygous sites resulting from incomplete allele recovery confound many existing imputation algorithms. These types of systematic errors can be minimized by incorporating depth-of-sequencing read coverage into the imputation algorithm. Accordingly, we developed Low-Coverage Biallelic Impute (LB-Impute) to resolve missing data issues. LB-Impute uses a hidden Markov model that incorporates marker read coverage to determine variable emission probabilities. Robust, highly accurate imputation results were reliably obtained with LB-Impute, even at extremely low (<1×) average per-marker coverage. This finding will have implications for the design of genotype imputation algorithms in the future. LB-Impute is publicly available on GitHub at https://github.com/dellaporta-laboratory/LB-Impute.
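
The idea of coverage-dependent emission probabilities can be illustrated with a minimal sketch. This is not the LB-Impute implementation: the binomial read model, the 1% read error rate, the fixed per-marker switch probability, and all names (`log_emission`, `viterbi`, `P_READ_A`) are assumptions made for illustration only. Each marker is observed as a pair of read counts (reads supporting parent A's allele, reads supporting parent B's allele), so a single read of one allele only weakly disfavors the heterozygous state, and a marker with zero reads is uninformative.

```python
from math import comb, log

# Hidden genotype states in a biparental population (assumed labels).
STATES = ("AA", "AB", "BB")

# Assumed probability that a single read shows the A allele, given the state.
# The 1% error rate is an illustrative assumption, not a value from the paper.
P_READ_A = {"AA": 0.99, "AB": 0.5, "BB": 0.01}


def log_emission(a_reads, b_reads, state):
    """Binomial log-probability of the read counts given the genotype state.

    With zero coverage this is log(1) = 0 for every state (missing marker).
    """
    p = P_READ_A[state]
    n = a_reads + b_reads
    return log(comb(n, a_reads)) + a_reads * log(p) + b_reads * log(1 - p)


def viterbi(read_counts, switch_prob=0.01):
    """Most likely genotype path for a list of (a_reads, b_reads) markers.

    switch_prob is an assumed constant chance of changing state between
    adjacent markers, split evenly between the two other states.
    """
    if not read_counts:
        return []

    def log_trans(prev, cur):
        return log(1 - switch_prob) if prev == cur else log(switch_prob / 2)

    a, b = read_counts[0]
    score = {s: log(1 / len(STATES)) + log_emission(a, b, s) for s in STATES}
    backpointers = []

    for a, b in read_counts[1:]:
        new_score, pointer = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda prev: score[prev] + log_trans(prev, cur))
            pointer[cur] = best_prev
            new_score[cur] = (
                score[best_prev] + log_trans(best_prev, cur) + log_emission(a, b, cur)
            )
        score = new_score
        backpointers.append(pointer)

    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # One read per marker, alternating alleles: individually each call looks
    # homozygous, but the path as a whole is best explained as heterozygous.
    print(viterbi([(1, 0), (0, 1), (1, 0), (0, 1)]))
```

In this sketch, a naive per-marker caller would report the alternating single reads as a run of switching homozygous calls, whereas weighting each marker by its read depth lets the model treat them as incomplete allele recovery at heterozygous sites.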
