Breeding and Genetics Symposium: really big data: processing and analysis of very large data sets.

Affiliation

Animal Improvement Programs Laboratory, ARS, USDA, Beltsville, MD 20705-2350, USA.

Publication information

J Anim Sci. 2012 Mar;90(3):723-33. doi: 10.2527/jas.2011-4584. Epub 2011 Nov 18.

DOI: 10.2527/jas.2011-4584
PMID: 22100598

Abstract

Modern animal breeding data sets are large and getting larger, due in part to recent availability of high-density SNP arrays and cheap sequencing technology. High-performance computing methods for efficient data warehousing and analysis are under development. Financial and security considerations are important when using shared clusters. Sound software engineering practices are needed, and it is better to use existing solutions when possible. Storage requirements for genotypes are modest, although full-sequence data will require greater storage capacity. Storage requirements for intermediate and results files for genetic evaluations are much greater, particularly when multiple runs must be stored for research and validation studies. The greatest gains in accuracy from genomic selection have been realized for traits of low heritability, and there is increasing interest in new health and management traits. The collection of sufficient phenotypes to produce accurate evaluations may take many years, and high-reliability proofs for older bulls are needed to estimate marker effects. Data mining algorithms applied to large data sets may help identify unexpected relationships in the data, and improved visualization tools will provide insights. Genomic selection using large data requires a lot of computing power, particularly when large fractions of the population are genotyped. Theoretical improvements have made possible the inversion of large numerator relationship matrices, permitted the solving of large systems of equations, and produced fast algorithms for variance component estimation. Recent work shows that single-step approaches combining BLUP with a genomic relationship (G) matrix have similar computational requirements to traditional BLUP, and the limiting factor is the construction and inversion of G for many genotypes. 
A naïve algorithm for creating G for 14,000 individuals required almost 24 h to run, but custom libraries and parallel computing reduced that to 15 min. Large data sets also create challenges for the delivery of genetic evaluations that must be overcome in a way that does not disrupt the transition from conventional to genomic evaluations. Processing time is important, especially as real-time systems for on-farm decisions are developed. The ultimate value of these systems is to decrease time-to-results in research, increase accuracy in genomic evaluations, and accelerate rates of genetic improvement.
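
The gap between a naïve and a library-backed construction of G can be illustrated with a small sketch. The abstract does not say which formula or libraries the authors used, so this example rests on assumptions: it builds G with the widely used VanRaden-style form G = ZZ' / (2 Σ p(1 − p)), where Z holds 0/1/2 genotypes centered by twice the allele frequency, and it uses NumPy as a stand-in for an optimized linear algebra library. The function names `g_matrix_naive` and `g_matrix_blas` are hypothetical.

```python
import numpy as np


def g_matrix_naive(M):
    """Genomic relationship matrix via explicit triple loop.

    M: (n_animals, n_markers) array of genotypes coded 0/1/2.
    Assumed formula (not stated in the abstract): G = ZZ' / (2 * sum(p * (1 - p))).
    """
    n, m = M.shape
    p = M.mean(axis=0) / 2.0              # allele frequency per marker
    denom = 2.0 * np.sum(p * (1.0 - p))   # scaling constant
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):             # G is symmetric: fill the upper triangle
            s = 0.0
            for k in range(m):
                s += (M[i, k] - 2.0 * p[k]) * (M[j, k] - 2.0 * p[k])
            G[i, j] = G[j, i] = s / denom
    return G


def g_matrix_blas(M):
    """Same matrix, with the cross-product delegated to NumPy's matrix multiply."""
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p                       # center each marker column
    return (Z @ Z.T) / (2.0 * np.sum(p * (1.0 - p)))


# Small simulated example (random genotypes, for illustration only).
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(20, 50)).astype(float)
assert np.allclose(g_matrix_naive(M), g_matrix_blas(M))
```

Both functions compute the same matrix. The second hands the O(n²m) cross-product to a tuned matrix-multiply routine, which is typically multithreaded, and this is the general kind of change the abstract credits for the speed-up. Actual timings depend on hardware and library builds and are not reproduced here.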

Similar articles

1. Breeding and Genetics Symposium: really big data: processing and analysis of very large data sets.
   J Anim Sci. 2012 Mar;90(3):723-33. doi: 10.2527/jas.2011-4584. Epub 2011 Nov 18.
2. Invited review: Genomic selection in dairy cattle: progress and challenges.
   J Dairy Sci. 2009 Feb;92(2):433-43. doi: 10.3168/jds.2008-1646.
3. Genomic prediction for Nordic Red Cattle using one-step and selection index blending.
   J Dairy Sci. 2012 Feb;95(2):909-17. doi: 10.3168/jds.2011-4804.
4. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score.
   J Dairy Sci. 2010 Feb;93(2):743-52. doi: 10.3168/jds.2009-2730.
5. The genomic evaluation system in the United States: past, present, future.
   J Dairy Sci. 2011 Jun;94(6):3202-11. doi: 10.3168/jds.2010-3866.
6. Effect of enlarging the reference population with (un)genotyped animals on the accuracy of genomic selection in dairy cattle.
   J Dairy Sci. 2011 Jan;94(1):431-41. doi: 10.3168/jds.2009-2840.
7. Imputation of genotypes with low-density chips and its effect on reliability of direct genomic values in Dutch Holstein cattle.
   J Dairy Sci. 2012 Feb;95(2):876-89. doi: 10.3168/jds.2011-4490.
8. Genomic breeding value estimation using genetic markers, inferred ancestral haplotypes, and the genomic relationship matrix.
   J Dairy Sci. 2011 Sep;94(9):4708-14. doi: 10.3168/jds.2010-3905.
9. Associations of marker panel scores with feed intake and efficiency traits in beef cattle using preselected single nucleotide polymorphisms.
   J Anim Sci. 2011 Nov;89(11):3362-71. doi: 10.2527/jas.2010-3362. Epub 2011 Jun 3.
10. Symposium review: Single-step genomic evaluations in dairy cattle.
    J Dairy Sci. 2020 Jun;103(6):5314-5326. doi: 10.3168/jds.2019-17754. Epub 2020 Apr 22.

Cited by

1. The future of phenomics in dairy cattle breeding.
   Anim Front. 2020 Apr 1;10(2):37-44. doi: 10.1093/af/vfaa007. eCollection 2020 Apr.
2. A Vision for Development and Utilization of High-Throughput Phenotyping and Big Data Analytics in Livestock.
   Front Genet. 2019 Dec 17;10:1197. doi: 10.3389/fgene.2019.01197. eCollection 2019.
3. Invited review: Big Data in precision dairy farming.
   Animal. 2019 Jul;13(7):1519-1528. doi: 10.1017/S1751731118003439. Epub 2019 Jan 11.
4. Toward a Literature-Driven Definition of Big Data in Healthcare.
   Biomed Res Int. 2015;2015:639021. doi: 10.1155/2015/639021. Epub 2015 Jun 2.
5. Predicting haplotype carriers from SNP genotypes in Bos taurus through linear discriminant analysis.
   Genet Sel Evol. 2015 Feb 5;47(1):4. doi: 10.1186/s12711-015-0094-8.
6. DAIRRy-BLUP: a high-performance computing approach to genomic prediction.
   Genetics. 2014 Jul;197(3):813-22. doi: 10.1534/genetics.114.163683. Epub 2014 Apr 15.