GBS-SNP-CROP: a reference-optional pipeline for SNP discovery and plant germplasm characterization using variable length, paired-end genotyping-by-sequencing data.

Author information

Melo Arthur T O, Bartaula Radhika, Hale Iago

Affiliations

College of Life Sciences and Agriculture, Department of Biological Sciences, University of New Hampshire, Durham, NH, USA.

College of Life Sciences and Agriculture, Genetics Graduate Program, University of New Hampshire, Durham, NH, USA.

Publication information

BMC Bioinformatics. 2016 Jan 12;17:29. doi: 10.1186/s12859-016-0879-y.

PMID:26754002

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4709900/

Abstract

BACKGROUND

With its simple library preparation and robust approach to genome reduction, genotyping-by-sequencing (GBS) is a flexible and cost-effective strategy for SNP discovery and genotyping, provided an appropriate reference genome is available. For resource-limited curation, research, and breeding programs of underutilized plant genetic resources, however, even low-depth references may not be within reach, despite declining sequencing costs. Such programs would find value in an open-source bioinformatics pipeline that can maximize GBS data usage and perform high-density SNP genotyping in the absence of a reference.

RESULTS

The GBS SNP-Calling Reference Optional Pipeline (GBS-SNP-CROP) developed and presented here adopts a clustering strategy to build a population-tailored "Mock Reference" from the same GBS data used for downstream SNP calling and genotyping. Designed for libraries of paired-end (PE) reads, GBS-SNP-CROP maximizes data usage by eliminating unnecessary data culling due to imposed read-length uniformity requirements. Using 150 bp PE reads from a GBS library of 48 accessions of tetraploid kiwiberry (Actinidia arguta), GBS-SNP-CROP yielded on average three times as many SNPs as TASSEL-GBS analyses (32 and 64 bp tag lengths) and over 18 times as many as TASSEL-UNEAK, with fewer genotyping errors in all cases, as evidenced by comparing the genotypic characterizations of biological replicates. Using the published reference genome of a related diploid species (A. chinensis), the reference-based version of GBS-SNP-CROP behaved similarly to TASSEL-GBS in terms of the number of SNPs called but had an improved read depth distribution and fewer genotyping errors. Our results also indicate that the sets of SNPs detected by the different pipelines above are largely orthogonal to one another; thus GBS-SNP-CROP may be used to augment the results of alternative analyses, whether or not a reference is available.
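The replicate-based error check mentioned above can be illustrated with a minimal sketch: genotype the same biological sample twice, then count how often the two calls disagree at sites called in both. The sketch assumes VCF-style genotype strings (e.g. `"0/1"`, or `"0/0/1/1"` for a tetraploid) with `"./."` marking a missing call. The function name `replicate_discordance` and this encoding are hypothetical, chosen for illustration, and are not taken from GBS-SNP-CROP itself.

```python
from __future__ import annotations

from typing import Optional, Sequence

# Hypothetical encoding: VCF-style genotype strings; these values mean "no call".
MISSING = {None, "", ".", "./."}


def _normalize(gt: str) -> str:
    """Treat unphased genotypes as allele multisets, so "1/0" equals "0/1"."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def replicate_discordance(
    rep_a: Sequence[Optional[str]], rep_b: Sequence[Optional[str]]
) -> tuple[float, int]:
    """Return (discordance rate, number of comparable sites) for two replicate
    genotype calls of the same sample over the same ordered SNP sites."""
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must cover the same SNP sites")
    compared = mismatched = 0
    for a, b in zip(rep_a, rep_b):
        if a in MISSING or b in MISSING:
            continue  # only sites called in both replicates are informative
        compared += 1
        if _normalize(a) != _normalize(b):
            mismatched += 1
    rate = mismatched / compared if compared else float("nan")
    return rate, compared


if __name__ == "__main__":
    rep1 = ["0/0", "0/1", "1/1", "./.", "0/1"]
    rep2 = ["0/0", "1/0", "0/1", "0/1", "0/1"]
    rate, n = replicate_discordance(rep1, rep2)
    print(f"{rate:.2f} over {n} sites")  # prints: 0.25 over 4 sites
```

Because any disagreement between two genotypings of the same sample must be an error in at least one of them, a lower discordance rate across replicate pairs is a reasonable proxy for fewer genotyping errors. Sites missing in either replicate are excluded rather than counted as errors.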

CONCLUSIONS

By achieving high-density SNP genotyping in populations for which no reference genome is available, GBS-SNP-CROP is worth consideration by curators, researchers, and breeders of under-researched plant genetic resources. In cases where a reference is available, especially if from a related species or when the target population is particularly diverse, GBS-SNP-CROP may complement other reference-based pipelines by extracting more information per sequencing dollar spent. The current version of GBS-SNP-CROP is available at https://github.com/halelab/GBS-SNP-CROP.git.

Fig. 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/226b/4709900/c17b7bd6ed5d/12859_2016_879_Fig1_HTML.jpg
