Phocas Florence
Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France.
Methods Mol Biol. 2022;2467:113-138. doi: 10.1007/978-1-0716-2205-6_4.
Imputation has become standard practice in modern genetic research to increase genome coverage and to improve the accuracy of genomic selection and genome-wide association studies: a large number of samples can be genotyped at lower density (and lower cost) and then imputed up to denser marker panels or to sequence level, using information from a limited reference population. Most genotype imputation algorithms use information from relatives and population linkage disequilibrium. A number of imputation software packages were developed originally for human genetics and, more recently, for animal and plant genetics, taking into account pedigree information and very sparse SNP arrays or genotyping-by-sequencing data. In comparison to human populations, the population structures of farmed species and their limited effective sizes allow accurate imputation of high-density genotypes or sequences from very low-density SNP panels and a limited set of reference individuals. Whatever the imputation method, imputation accuracy, measured by the correct imputation rate or by the correlation between true and imputed genotypes, increases as the individual to be imputed is more closely related to its more densely genotyped ancestors and as its own genotype density increases. Increasing imputation accuracy raises genomic selection accuracy whatever the genomic evaluation method. For given marker densities, the most important factors affecting imputation accuracy are the size of the reference population and the relationship between individuals in the reference and target populations.
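The two accuracy measures named above can be illustrated with a minimal sketch. It assumes genotypes are coded as allele dosages 0, 1, or 2 for a single individual across markers. The function names and the toy data are hypothetical and are not taken from the chapter or from any imputation software.

```python
from math import sqrt


def correct_imputation_rate(true_genotypes, imputed_genotypes):
    """Proportion of genotypes that were imputed exactly right."""
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("genotype lists must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


def genotype_correlation(true_genotypes, imputed_genotypes):
    """Pearson correlation between true and imputed allele dosages."""
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("genotype lists must be non-empty and of equal length")
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_i = sum(imputed_genotypes) / n
    cov = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(true_genotypes, imputed_genotypes))
    ss_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    ss_i = sum((i - mean_i) ** 2 for i in imputed_genotypes)
    if ss_t == 0 or ss_i == 0:
        # Correlation is undefined when either vector has no variation.
        return float("nan")
    return cov / sqrt(ss_t * ss_i)


# Hypothetical toy data: 8 markers, 2 of them imputed incorrectly.
true_g = [0, 1, 2, 1, 0, 2, 1, 0]
imputed_g = [0, 1, 2, 2, 0, 2, 1, 1]

rate = correct_imputation_rate(true_g, imputed_g)   # 6 of 8 correct -> 0.75
corr = genotype_correlation(true_g, imputed_g)      # 11/13, about 0.846

print(f"correct imputation rate: {rate:.3f}")
print(f"genotype correlation:    {corr:.3f}")
```

In this toy case the two measures differ (0.75 versus about 0.846), which shows that they are not interchangeable. The correlation depends on how genotypes vary across markers, while the correct imputation rate only counts exact matches.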