Scanning and Filling: Ultra-Dense SNP Genotyping Combining Genotyping-By-Sequencing, SNP Array and Whole-Genome Resequencing Data.

Authors

Torkamaneh Davoud, Belzile Francois

Affiliation

Département de Phytologie and Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC, Canada.

Publication information

PLoS One. 2015 Jul 10;10(7):e0131533. doi: 10.1371/journal.pone.0131533. eCollection 2015.

DOI:10.1371/journal.pone.0131533

PMID:26161900

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4498655/

Abstract

Genotyping-by-sequencing (GBS) represents a highly cost-effective high-throughput genotyping approach. By nature, however, GBS is subject to generating sizeable amounts of missing data and these will need to be imputed for many downstream analyses. The extent to which such missing data can be tolerated in calling SNPs has not been explored widely. In this work, we first explore the use of imputation to fill in missing genotypes in GBS datasets. Importantly, we use whole genome resequencing data to assess the accuracy of the imputed data. Using a panel of 301 soybean accessions, we show that over 62,000 SNPs could be called when tolerating up to 80% missing data, a five-fold increase over the number called when tolerating up to 20% missing data. At all levels of missing data examined (between 20% and 80%), the resulting SNP datasets were of uniformly high accuracy (96-98%). We then used imputation to combine complementary SNP datasets derived from GBS and a SNP array (SoySNP50K). We thus produced an enhanced dataset of >100,000 SNPs and the genotypes at the previously untyped loci were again imputed with a high level of accuracy (95%). Of the >4,000,000 SNPs identified through resequencing 23 accessions (among the 301 used in the GBS analysis), 1.4 million tag SNPs were used as a reference to impute this large set of SNPs on the entire panel of 301 accessions. These previously untyped loci could be imputed with around 90% accuracy. Finally, we used the 100K SNP dataset (GBS + SoySNP50K) to perform a GWAS on seed oil content within this collection of soybean accessions. Both the number of significant marker-trait associations and the peak significance levels were improved considerably using this enhanced catalog of SNPs relative to a smaller catalog resulting from GBS alone at ≤20% missing data. 
Our results demonstrate that imputation can be used to fill in both missing genotypes and untyped loci with very high accuracy and that this leads to more powerful genetic analyses.
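The two quantities the abstract reports, the number of SNPs retained at a given missing-data tolerance and the accuracy of imputed genotypes against resequencing calls, can be illustrated with a minimal sketch. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for a missing call. The function names and toy data are hypothetical, and the imputation step itself is not shown, since the abstract does not say which software was used.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # 0, 1 or 2 alternate alleles; None = missing call


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of samples without a genotype call at one SNP."""
    return sum(g is None for g in calls) / len(calls)


def filter_by_missing(
    matrix: Dict[str, List[Genotype]], max_missing: float
) -> Dict[str, List[Genotype]]:
    """Keep only SNPs whose missing rate does not exceed max_missing."""
    return {
        snp: calls
        for snp, calls in matrix.items()
        if missing_rate(calls) <= max_missing
    }


def imputation_accuracy(
    observed: List[Genotype],
    imputed: List[Genotype],
    truth: List[Genotype],
) -> float:
    """Share of filled-in genotypes (missing in `observed`) that match `truth`.

    Samples without a truth call are skipped, mirroring the situation where
    only a subset of accessions has been resequenced.
    """
    checked = 0
    correct = 0
    for obs, imp, ref in zip(observed, imputed, truth):
        if obs is None and ref is not None:
            checked += 1
            correct += int(imp == ref)
    return correct / checked if checked else float("nan")


# Toy data (hypothetical): three SNPs typed on five samples.
gbs = {
    "snp1": [0, 0, None, 2, 2],           # 20% missing
    "snp2": [None, None, None, 2, 0],     # 60% missing
    "snp3": [None, None, None, None, 1],  # 80% missing
}

strict = filter_by_missing(gbs, 0.20)   # keeps snp1 only
relaxed = filter_by_missing(gbs, 0.80)  # keeps all three SNPs

# Hypothetical imputed and resequencing genotypes for snp2.
snp2_imputed = [2, 2, 0, 2, 0]
snp2_truth = [2, 2, 2, 2, 0]
accuracy = imputation_accuracy(gbs["snp2"], snp2_imputed, snp2_truth)
```

Relaxing the threshold from 0.20 to 0.80 is what enlarges the SNP catalog in this toy case, and the accuracy function only scores genotypes that were actually filled in, so observed calls do not inflate the estimate.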

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e541/4498655/e27b5afe187c/pone.0131533.g001.jpg

Similar articles

Scanning and Filling: Ultra-Dense SNP Genotyping Combining Genotyping-By-Sequencing, SNP Array and Whole-Genome Resequencing Data.

PLoS One. 2015 Jul 10;10(7):e0131533. doi: 10.1371/journal.pone.0131533. eCollection 2015.

Genome-Wide SNP Calling from Genotyping by Sequencing (GBS) Data: A Comparison of Seven Pipelines and Two Sequencing Technologies.

PLoS One. 2016 Aug 22;11(8):e0161333. doi: 10.1371/journal.pone.0161333. eCollection 2016.

Low-depth genotyping-by-sequencing (GBS) in a bovine population: strategies to maximize the selection of high quality genotypes and the accuracy of imputation.

BMC Genet. 2017 Apr 5;18(1):32. doi: 10.1186/s12863-017-0501-y.

Fast-GBS: a new pipeline for the efficient and highly accurate calling of SNPs from genotyping-by-sequencing data.

BMC Bioinformatics. 2017 Jan 3;18(1):5. doi: 10.1186/s12859-016-1431-9.

Comprehensive description of genomewide nucleotide and structural variation in short-season soya bean.

Plant Biotechnol J. 2018 Mar;16(3):749-759. doi: 10.1111/pbi.12825. Epub 2017 Nov 3.

Whole-genome characterization in pedigreed non-human primates using genotyping-by-sequencing (GBS) and imputation.

BMC Genomics. 2016 Aug 24;17(1):676. doi: 10.1186/s12864-016-2966-x.

Genotyping by sequencing for genomic prediction in a soybean breeding population.

BMC Genomics. 2014 Aug 29;15(1):740. doi: 10.1186/1471-2164-15-740.

Imputation accuracy of wheat genotyping-by-sequencing (GBS) data using barley and wheat genome references.

PLoS One. 2019 Jan 7;14(1):e0208614. doi: 10.1371/journal.pone.0208614. eCollection 2019.

An improved genotyping by sequencing (GBS) approach offering increased versatility and efficiency of SNP discovery and genotyping.

PLoS One. 2013;8(1):e54603. doi: 10.1371/journal.pone.0054603. Epub 2013 Jan 23.

A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy.

BMC Bioinformatics. 2017 Dec 28;18(1):586. doi: 10.1186/s12859-017-2000-6.

Cited by

Genome-wide association analysis of winter survival in a diverse Canadian winter wheat population.

Plant Genome. 2025 Sep;18(3):e70091. doi: 10.1002/tpg2.70091.

Breaking down data silos across companies to train genome-wide predictions: A feasibility study in wheat.

Plant Biotechnol J. 2025 Jul;23(7):2704-2719. doi: 10.1111/pbi.70095. Epub 2025 Apr 20.

Using genotype imputation to integrate Canola populations for genome-wide association and genomic prediction of blackleg resistance.

BMC Genomics. 2025 Mar 4;26(1):215. doi: 10.1186/s12864-025-11250-4.

Integrating targeted genetic markers to genotyping-by-sequencing for an ultimate genotyping tool.

Theor Appl Genet. 2024 Oct 4;137(10):247. doi: 10.1007/s00122-024-04750-6.

Genetic insights into agronomic and morphological traits of drug-type cannabis revealed by genome-wide association studies.

Sci Rep. 2024 Apr 22;14(1):9162. doi: 10.1038/s41598-024-58931-w.

Identification of fusarium head blight resistance markers in a genome-wide association study of CIMMYT spring synthetic hexaploid derived wheat lines.

BMC Plant Biol. 2023 May 31;23(1):290. doi: 10.1186/s12870-023-04306-8.

Restriction site-associated DNA sequencing technologies as an alternative to low-density SNP chips for genomic selection: a simulation study in layer chickens.

BMC Genomics. 2023 May 19;24(1):271. doi: 10.1186/s12864-023-09321-5.

Identification of hub genes regulating isoflavone accumulation in soybean seeds via GWAS and WGCNA approaches.

Front Plant Sci. 2023 Feb 14;14:1120498. doi: 10.3389/fpls.2023.1120498. eCollection 2023.

3D-GBS: a universal genotyping-by-sequencing approach for genomic selection and other high-throughput low-cost applications in species with small to medium-sized genomes.

Plant Methods. 2023 Feb 5;19(1):13. doi: 10.1186/s13007-023-00990-7.

Genotyping by Sequencing Advancements in Barley.

Front Plant Sci. 2022 Aug 8;13:931423. doi: 10.3389/fpls.2022.931423. eCollection 2022.
