Genomic Prediction Based on SNP Functional Annotation Using Imputed Whole-Genome Sequence Data in Korean Hanwoo Cattle.

Author information

Lopez Bryan Irvine M, An Narae, Srikanth Krishnamoorthy, Lee Seunghwan, Oh Jae-Don, Shin Dong-Hyun, Park Woncheoul, Chai Han-Ha, Park Jong-Eun, Lim Dajeong

Affiliations

Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju, South Korea.

Department of Animal Science and Biotechnology, Chungnam National University, Daejeon, South Korea.

Publication information

Front Genet. 2021 Jan 21;11:603822. doi: 10.3389/fgene.2020.603822. eCollection 2020.

Abstract

Whole-genome sequence (WGS) data are increasingly being applied to genomic prediction, offering a higher predictive ability by including causal mutations or single-nucleotide polymorphisms (SNPs) putatively in strong linkage disequilibrium with causal mutations affecting the trait. This study aimed to improve the predictive performance of the customized Hanwoo 50 k SNP panel for four carcass traits in a commercial Hanwoo population by adding highly predictive variants from sequence data. A total of 16,892 Hanwoo cattle with phenotypes (i.e., backfat thickness, carcass weight, longissimus muscle area, and marbling score), 50 k genotypes, and imputed WGS genotypes were used. We partitioned the imputed WGS data according to functional annotation [intergenic (IGR), intron (ITR), regulatory (REG), synonymous (SYN), and non-synonymous (NSY)] to characterize the genomic regions that deliver higher predictive power for the traits investigated. Animals were assigned to two groups: the discovery set (7324 animals), used for predictive variant detection, and the cross-validation set, used for genomic prediction. Genome-wide association studies were performed by trait for every genomic region and for the entire WGS data to pre-select variants. Each set of pre-selected SNPs, at a density of 1000, 3000, 5000, or 10,000, was added to the 50 k genotypes separately, and the predictive performance of each set of genotypes was assessed using genomic best linear unbiased prediction (GBLUP). Results showed that the predictive performance of the customized Hanwoo 50 k SNP panel can be improved by adding pre-selected variants from the WGS data; in particular, adding 3000 variants per trait was sufficient to improve the prediction accuracy for all traits. When 12,000 pre-selected variants (3000 variants from each trait) were added to the 50 k genotypes, the prediction accuracies increased by 9.9, 9.2, 6.4, and 4.7% for backfat thickness, carcass weight, longissimus muscle area, and marbling score, respectively, compared to the regular 50 k SNP panel. In terms of prediction bias, regression coefficients for all sets of genotypes in all traits were close to 1, indicating an unbiased prediction. The strategy of selecting variants based on functional annotation did not show a clear advantage over using the whole genome. Nonetheless, pre-selected SNPs from the IGR gave the highest improvement in prediction accuracy among genomic regions, and the values were close to those obtained using the WGS data for all traits. We concluded that the additional gain in prediction accuracy from pre-selected variants appears to be trait-dependent, and that using WGS data remained more accurate than using a specific genomic region.
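
The abstract reports prediction accuracy and a regression coefficient as the measure of bias, but does not give the formulas. The sketch below shows one common way such metrics are computed in a validation set, assuming accuracy is the Pearson correlation between genomic estimated breeding values (GEBV) and phenotypes, and bias is the slope of the regression of phenotype on GEBV. All function names and numbers are hypothetical illustrations, not the authors' code or data, and whether the reported gains are relative or absolute is not stated; the helper assumes relative gains.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance (n - 1 denominator) of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def prediction_accuracy(gebv, phenotype):
    """Assumed definition: Pearson correlation between GEBV and phenotype."""
    return covariance(gebv, phenotype) / sqrt(
        covariance(gebv, gebv) * covariance(phenotype, phenotype)
    )


def prediction_bias(gebv, phenotype):
    """Assumed definition: slope of the regression of phenotype on GEBV.

    A slope near 1 suggests the predictions are neither inflated nor deflated.
    """
    return covariance(gebv, phenotype) / covariance(gebv, gebv)


def relative_gain_percent(acc_new, acc_base):
    """Relative change in accuracy, in percent, versus a baseline panel."""
    return 100.0 * (acc_new - acc_base) / acc_base


# Made-up validation data for five animals (illustration only).
gebv = [1.0, 2.0, 3.0, 4.0, 5.0]
phenotype = [2.0, 4.0, 5.0, 4.0, 5.0]

accuracy = prediction_accuracy(gebv, phenotype)
bias = prediction_bias(gebv, phenotype)
print(f"accuracy = {accuracy:.3f}, regression slope = {bias:.3f}")
```

In practice these metrics would be computed per trait and per genotype set (50 k alone, or 50 k plus pre-selected variants) within each cross-validation fold, then averaged across folds.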


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0070/7859490/3360a55cb326/fgene-11-603822-g001.jpg
