Benchmarking database systems for Genomic Selection implementation.

Affiliations

Institute of Biotechnology, Cornell University.

Boyce Thompson Institute.

Publication information

Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz096.

DOI:10.1093/database/baz096

PMID:31508797

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6737464/

Abstract

MOTIVATION

With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. Making effective use of this information requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples, with a turnaround time rapid enough to allow selection before crosses need to be made. In reality, by the time breeders have collected all their phenotyping data and received the corresponding genotyping data, they often have only a short window of time in which to make decisions. This makes it challenging to organize the information and use it in downstream analyses that support breeders' decisions. Implementing genomic selection routinely as part of a breeding program therefore requires an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management systems and columnar storage systems.

RESULTS

We found that data extraction times are greatly influenced by the orientation in which genotype data are stored in a system. HDF5 consistently performed best, in part because it can work more efficiently with both orientations of the allele matrix.
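The effect of storage orientation can be illustrated with a toy sketch. It is not the authors' benchmark and does not use HDF5 or any of the benchmarked systems; the matrix values, the one-byte-per-genotype layout, and the helper names (`write_matrix`, `read_row`, `read_col`) are assumptions made for illustration only. A small allele matrix is written row by row into a flat byte buffer in both orientations, and the sketch counts how many separate reads are needed to extract all genotypes for one sample.

```python
import io

def write_matrix(buf, rows):
    """Write equal-length rows of small ints (0-255) row by row, one byte each."""
    for row in rows:
        buf.write(bytes(row))

def read_row(buf, n_cols, i):
    """Extract row i: one seek and one contiguous read."""
    buf.seek(i * n_cols)
    return list(buf.read(n_cols)), 1  # (values, number of read calls)

def read_col(buf, n_rows, n_cols, j):
    """Extract column j: one seek and one single-byte read per row."""
    values = []
    for i in range(n_rows):
        buf.seek(i * n_cols + j)
        values.append(buf.read(1)[0])
    return values, n_rows  # (values, number of read calls)

# Hypothetical allele matrix: 3 markers (rows) x 4 samples (columns), coded 0/1/2.
markers_by_samples = [
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 0, 1],
]
# The same data in the transposed orientation: 4 samples x 3 markers.
samples_by_markers = [list(col) for col in zip(*markers_by_samples)]

marker_major = io.BytesIO()
write_matrix(marker_major, markers_by_samples)
sample_major = io.BytesIO()
write_matrix(sample_major, samples_by_markers)

# Extract every marker genotype for the sample at index 2 from each layout.
strided, strided_reads = read_col(marker_major, n_rows=3, n_cols=4, j=2)
contiguous, contiguous_reads = read_row(sample_major, n_cols=3, i=2)

print(strided, strided_reads)        # same values, one read per marker
print(contiguous, contiguous_reads)  # same values, a single read
```

Both layouts return the same genotypes, but the marker-major layout needs one read per marker while the sample-major layout needs a single contiguous read. With realistic matrix sizes that difference in access pattern is one plausible reason extraction time depends on orientation.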

AVAILABILITY

http://gobiin1.bti.cornell.edu:6083/projects/GBM/repos/benchmarking/browse.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6e1/6737464/3057b7a7c278/baz096f1.jpg
