Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans.

Authors

Xavier A, Muir William M, Rainey Katy M

Affiliations

Department of Agronomy, Purdue University, Lilly Hall of Life Sciences, 915 W. State St., West Lafayette, Indiana, 47907, USA.

Department of Animal Science, Purdue University, Lilly Hall of Life Sciences, 915 W. State St., West Lafayette, Indiana, 47907, USA.

Publication information

BMC Bioinformatics. 2016 Feb 2;17:55. doi: 10.1186/s12859-016-0899-7.

DOI:10.1186/s12859-016-0899-7

PMID:26830693

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4736474/

Abstract

BACKGROUND

Success in genome-wide association studies and marker-assisted selection depends on good phenotypic and genotypic data. The more complete these data are, the more powerful the results of the analysis. Nevertheless, some next-generation genotyping technologies provide genotypic information despite large proportions of missing data. The procedures used to impute these missing data therefore greatly affect downstream analyses. This study aims to (1) compare the genetic variance in a single-nucleotide polymorphism panel of soybean with missing data imputed using various methods, (2) evaluate the imputation accuracy and post-imputation quality associated with these methods, and (3) evaluate the impact of imputation method on heritability and the accuracy of genome-wide prediction of soybean traits. The imputation methods we evaluated were as follows: multivariate mixed model, hidden Markov model, logical algorithm, k-nearest neighbor, singular value decomposition, and random forest. We used raw genotypes from the SoyNAM project and the following phenotypes: plant height, days to maturity, grain yield, and seed protein composition.
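The evaluation of imputation accuracy described above can be illustrated with a small masking experiment. The sketch below is not the authors' pipeline and does not use SoyNAM data: it simulates hypothetical inbred lines coded 0/1/2, hides 20% of the calls at random, fills them with a simple k-nearest-neighbor vote, and reports the fraction of hidden calls recovered. All function names, the simulation design, and the parameter values are assumptions made for illustration.

```python
import random


def simulate_genotypes(n_founders=4, lines_per_founder=15, n_markers=200,
                       noise=0.02, seed=1):
    """Toy inbred lines (hypothetical data): noisy copies of a few founders,
    with genotypes coded 0/1/2."""
    rng = random.Random(seed)
    founders = [[rng.choice((0, 2)) for _ in range(n_markers)]
                for _ in range(n_founders)]
    lines = []
    for founder in founders:
        for _ in range(lines_per_founder):
            lines.append([rng.choice((0, 1, 2)) if rng.random() < noise else g
                          for g in founder])
    return lines


def mask_genotypes(genotypes, rate=0.2, seed=2):
    """Hide a random fraction of calls (None = missing). Returns the masked
    matrix and the list of hidden (row, column) positions."""
    rng = random.Random(seed)
    masked = [row[:] for row in genotypes]
    hidden = []
    for i, row in enumerate(masked):
        for j in range(len(row)):
            if rng.random() < rate:
                row[j] = None
                hidden.append((i, j))
    return masked, hidden


def distance(a, b):
    """Mean absolute difference over markers observed in both lines."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum(abs(x - y) for x, y in shared) / len(shared)


def knn_impute(masked, k=5):
    """Fill each missing call with the most common call among the k nearest
    lines that have that marker observed."""
    n = len(masked)
    imputed = [row[:] for row in masked]
    for i, row in enumerate(masked):
        # Rank all other lines by their distance to line i.
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: distance(row, masked[j]))
        for m, call in enumerate(row):
            if call is not None:
                continue
            votes = [masked[j][m] for j in neighbors
                     if masked[j][m] is not None][:k]
            # Ties go to the smallest code; 0 is an arbitrary fallback used
            # only when no other line has this marker observed.
            imputed[i][m] = (max(sorted(set(votes)), key=votes.count)
                             if votes else 0)
    return imputed


def imputation_accuracy(truth, imputed, hidden):
    """Fraction of hidden calls recovered exactly (assumes hidden is non-empty)."""
    return sum(truth[i][j] == imputed[i][j] for i, j in hidden) / len(hidden)


truth = simulate_genotypes()
masked, hidden = mask_genotypes(truth, rate=0.2)
imputed = knn_impute(masked, k=5)
accuracy = imputation_accuracy(truth, imputed, hidden)
print(f"masked calls: {len(hidden)}, accuracy: {accuracy:.3f}")
```

The same masking harness could be reused to compare other imputation methods on equal terms, since only `knn_impute` would need to be swapped out.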

RESULTS

We propose an imputation method based on multivariate mixed models using pedigree information. Our comparison of methods indicates that the heritability of traits can be affected by the imputation method. Genotypes with missing values imputed by methods that use genealogical information can favor genetic analysis of highly polygenic traits, but not genome-wide prediction accuracy. The genotypic matrix captured the highest amount of genetic variance when missing loci were imputed by the method proposed in this paper.
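One common way to make "genetic variance captured by the genotypic matrix" concrete is a genomic mixed model. The notation below is an assumption for illustration and not necessarily the authors' exact formulation: $y$ is the phenotype vector, $K$ is a relationship matrix computed from the imputed marker matrix and scaled to have a mean diagonal of one, and $\sigma^2_g$ and $\sigma^2_e$ are the genetic and residual variances.

```latex
y = \mu\mathbf{1} + g + e, \qquad
g \sim N\left(0,\, K\sigma^2_g\right), \qquad
e \sim N\left(0,\, I\sigma^2_e\right)

h^2 = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_e}
```

Under this reading, each imputation method yields a different marker matrix and therefore a different $K$, so the estimates of $\sigma^2_g$ and $h^2$ can change even though the phenotypes are the same.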

CONCLUSIONS

We conclude that hidden Markov models and random forest imputation are better suited to studies that aim to analyze highly heritable traits, whereas pedigree-based methods are best used to analyze traits with low heritability. Despite the notable effect on heritability, changing the imputation method did not improve genomic prediction. We identified significant differences across imputation methods in a dataset missing 20% of the genotypic values. This means that genotypic data from genotyping technologies that yield a high proportion of missing values, such as genotyping-by-sequencing (GBS), should be handled carefully because the imputation method will affect downstream analysis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dadf/4736474/cca9c0c7b9bd/12859_2016_899_Fig1_HTML.jpg
