Multi-line ssGBLUP evaluation using preselected markers from whole-genome sequence data in pigs.

Author information

Jang Sungbong, Ros-Freixedes Roger, Hickey John M, Chen Ching-Yi, Herring William O, Holl Justin, Misztal Ignacy, Lourenco Daniela

Affiliations

Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States.

Departament de Ciència Animal, Universitat de Lleida-Agrotecnio-CERCA Center, Lleida, Spain.

Publication information

Front Genet. 2023 May 12;14:1163626. doi: 10.3389/fgene.2023.1163626. eCollection 2023.

DOI:10.3389/fgene.2023.1163626

PMID:37252662

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10213539/

Abstract

Genomic evaluations in pigs could benefit from using multi-line data along with whole-genome sequencing (WGS) if the data are large enough to represent the variability across populations. The objective of this study was to investigate strategies to combine large-scale data from different terminal pig lines in a multi-line genomic evaluation (MLE) through single-step GBLUP (ssGBLUP) models while including variants preselected from WGS data. We investigated single-line and multi-line evaluations for five traits recorded in three terminal lines. The number of sequenced animals in each line ranged from 731 to 1,865, with 60k to 104k animals imputed to WGS. Unknown parent groups (UPG) and metafounders (MF) were explored to account for genetic differences among the lines and to improve the compatibility between pedigree and genomic relationships in the MLE. Sequence variants were preselected based on multi-line genome-wide association studies (GWAS) or linkage disequilibrium (LD) pruning. These preselected variant sets were used for ssGBLUP predictions with and without weights from BayesR, and their performance was compared with that of a commercial porcine single-nucleotide polymorphism (SNP) chip. Using UPG and MF in MLE showed small to no gain in prediction accuracy (up to 0.02), depending on the lines and traits, compared to the single-line genomic evaluation (SLE). Likewise, adding selected variants from the GWAS to the commercial SNP chip resulted in a maximum increase of 0.02 in prediction accuracy, and only for average daily feed intake in the most numerous lines. In addition, no benefits were observed when using preselected sequence variants in multi-line genomic predictions. Weights from BayesR did not help improve the performance of ssGBLUP. This study revealed limited benefits of using preselected whole-genome sequence variants for multi-line genomic predictions, even when tens of thousands of animals had imputed sequence data. Correctly accounting for line differences with UPG or MF in MLE is essential to obtain predictions similar to SLE; however, the only observed benefit of an MLE is to have comparable predictions across lines. Further investigation into the amount of data and novel methods to preselect whole-genome causative variants in combined populations would be of significant interest.
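
To illustrate what "ssGBLUP predictions with and without weights" can mean in practice, the following is a minimal sketch of a VanRaden-style genomic relationship matrix with optional per-marker weights, G = Z D Z' / (2 Σ p(1 − p)). It is not the authors' implementation. The function name `weighted_grm`, the toy genotypes, the uniform random weights standing in for BayesR-derived weights, and the rescaling of weights to average 1 are all assumptions made for illustration.

```python
import numpy as np

def weighted_grm(genotypes, weights=None):
    """Genomic relationship matrix G = Z D Z' / (2 * sum(p * (1 - p))).

    genotypes: (n_animals, n_markers) array of 0/1/2 allele counts.
    weights:   optional per-marker weights (diagonal of D); None means D = I.
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0            # allele frequencies estimated from the data
    Z = M - 2.0 * p                     # center each marker by twice its frequency
    if weights is None:
        d = np.ones(M.shape[1])         # unweighted case: every marker counts equally
    else:
        d = np.asarray(weights, dtype=float)
        d = d * d.size / d.sum()        # assumed convention: rescale weights to average 1
    scale = 2.0 * np.sum(p * (1.0 - p))
    return (Z * d) @ Z.T / scale        # (Z * d) multiplies column j of Z by d[j]

# Toy data (hypothetical): 6 animals, 50 markers.
rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(6, 50))
G_unweighted = weighted_grm(geno)
G_weighted = weighted_grm(geno, weights=rng.uniform(0.5, 2.0, size=50))
```

In ssGBLUP, a matrix of this kind is combined with pedigree relationships for genotyped and non-genotyped animals; that blending step, along with UPG or MF, is omitted here.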
