Evaluating Sequence-Based Genomic Prediction with an Efficient New Simulator.

Author information

Pérez-Enciso Miguel, Forneris Natalia, de Los Campos Gustavo, Legarra Andrés

Affiliations

Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB Consortium, 08193 Bellaterra, Barcelona, Spain.

Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain.

Publication information

Genetics. 2017 Feb;205(2):939-953. doi: 10.1534/genetics.116.194878. Epub 2016 Dec 2.

Abstract

The vast amount of sequence data generated to analyze complex traits poses new challenges for the analysis and interpretation of results. Although simulation is a fundamental tool to investigate the reliability of genomic analyses and to optimize experimental design, existing software cannot realistically simulate complete genomes. To remedy this, we have developed a new strategy (Sequence-Based Virtual Breeding, SBVB) that uses real sequence data and simulates new offspring genomes and phenotypes in a very efficient and flexible manner. Using this tool, we studied the efficiency of full sequence in genomic prediction compared to SNP arrays. We used real porcine sequences from three breeds as founder genomes of a 2500-animal pedigree and two genetic architectures: "neutral" and "selective." In the neutral architecture, frequencies and allele effects were sampled independently whereas, in the selective case, SNPs were sites putatively under selection after domestication and a negative correlation between effect and frequency was induced. We compared the effectiveness of different genotyping strategies for genomic selection, including the use of full sequence, commercial arrays, or randomly chosen SNP sets, in both outbred and crossbred experimental designs. We found that accuracy increases when sequence is used instead of commercial chips, but only modestly, perhaps by ≤ 4%. This result was robust to extreme genetic architectures. We conclude that full sequence is unlikely to offset commercial arrays for predicting genetic value when the number of loci is relatively large and the prior given to each SNP is uniform. Using sequence to improve selection thus requires optimized prior information and, likely, increased population sizes. The code and manual for SBVB are available at https://github.com/mperezenciso/sbvb0.
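
The contrast between the "neutral" and "selective" architectures can be illustrated with a minimal sketch. This is not SBVB's code or the authors' sampling scheme: the function names (`sample_architecture`, `pearson`), the uniform frequency range, the gamma parameters, and the noisy rank-matching used to induce the negative correlation are all assumptions chosen for illustration. It only shows what "effects sampled independently of frequency" versus "effects negatively correlated with frequency" looks like numerically.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample_architecture(n_loci, architecture, rng):
    """Return (frequencies, effect magnitudes) for hypothetical causal loci.

    All distributions here are illustrative assumptions, not SBVB defaults.
    """
    # Minor allele frequencies, drawn uniformly for simplicity.
    maf = [rng.uniform(0.01, 0.5) for _ in range(n_loci)]
    # Effect magnitudes from a gamma distribution (shape 0.5, scale 1.0).
    effects = [rng.gammavariate(0.5, 1.0) for _ in range(n_loci)]

    if architecture == "neutral":
        # Frequencies and effects are sampled independently.
        return maf, effects
    if architecture != "selective":
        raise ValueError("architecture must be 'neutral' or 'selective'")

    # "Selective": give larger effects to rarer alleles. Ranking on a noisy
    # copy of the frequency keeps the correlation negative but not perfect.
    noisy = [f + rng.gauss(0.0, 0.1) for f in maf]
    order = sorted(range(n_loci), key=lambda i: noisy[i])  # rarest first
    ranked = sorted(effects, reverse=True)                 # largest first
    assigned = [0.0] * n_loci
    for rank, locus in enumerate(order):
        assigned[locus] = ranked[rank]
    return maf, assigned


rng = random.Random(2017)
for arch in ("neutral", "selective"):
    maf, eff = sample_architecture(5000, arch, rng)
    print(arch, "correlation(frequency, effect) =", round(pearson(maf, eff), 3))
```

Under these assumptions the neutral draw should give a correlation near zero and the selective draw a clearly negative one; the actual strength of the correlation used in the paper is not stated in the abstract.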
