Imputation of low-density marker chip data in plant breeding: Evaluation of methods based on sugar beet.

Author information

Niehoff Tobias, Pook Torsten, Gholami Mahmood, Beissinger Timothy

Affiliations

Animal Breeding and Genomics, Wageningen Univ. & Research, Postbox 338, 6700AH, Wageningen, The Netherlands.

Dep. of Crop Sciences, Division of Plant Breeding Methodology, Univ. of Göttingen, Göttingen, 37075, Germany.

Publication information

Plant Genome. 2022 Dec;15(4):e20257. doi: 10.1002/tpg2.20257. Epub 2022 Oct 18.

DOI:10.1002/tpg2.20257

PMID:36258672

Abstract

Low-density genotyping followed by imputation reduces genotyping costs while still providing high-density marker information. An increased marker density has the potential to improve the outcome of all applications that are based on genomic data. This study investigates techniques for 1k to 20k genomic marker imputation for plant breeding programs with sugar beet (Beta vulgaris L. ssp. vulgaris) as an example crop, where these are realistic marker numbers for modern breeding applications. The generally accepted 'gold standard' for imputation, Beagle 5.1, was compared with the recently developed software AlphaPlantImpute2 which is designed specifically for plant breeding. For Beagle 5.1 and AlphaPlantImpute2, the imputation strategy as well as the imputation parameters were optimized in this study. We found that the imputation accuracy of Beagle could be tremendously improved (0.22 to 0.67) by tuning parameters, mainly by lowering the values for the parameter for the effective population size and increasing the number of iterations performed. Separating the phasing and imputation steps also improved accuracies when optimized parameters were used (0.67 to 0.82). We also found that the imputation accuracy of Beagle decreased when more low-density lines were included for imputation. AlphaPlantImpute2 produced very high accuracies without optimization (0.89) and was generally less responsive to optimization. Overall, AlphaPlantImpute2 performed relatively better for imputation whereas Beagle was better for phasing. Combining both tools yielded the highest accuracies.
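The two-step Beagle strategy described in the abstract (phasing first, then imputation, with a lowered effective population size and more iterations) can be sketched as a pair of command lines. This is a minimal illustration, not the authors' pipeline: the jar name, file names, and the values `ne=100` and `iterations=50` are hypothetical placeholders, since the abstract does not report the tuned values. Only the Beagle 5.1 arguments `gt`, `ref`, `out`, `ne`, `iterations`, and `impute` are used, and phasing the high-density panel so that it can serve as the reference is an assumption about how the two steps are separated.

```python
from typing import List, Optional


def beagle_command(jar: str, gt: str, out: str, ne: int, iterations: int,
                   ref: Optional[str] = None, impute: bool = True) -> List[str]:
    """Build (but do not run) a Beagle 5.1 command line as an argument list."""
    cmd = [
        "java", "-jar", jar,
        f"gt={gt}",                  # input genotypes (VCF)
        f"out={out}",                # output prefix; Beagle writes <out>.vcf.gz
        f"ne={ne}",                  # effective population size parameter
        f"iterations={iterations}",  # number of phasing iterations
        f"impute={'true' if impute else 'false'}",
    ]
    if ref is not None:
        cmd.append(f"ref={ref}")     # phased reference panel (VCF)
    return cmd


def two_step_commands(jar: str, hd_vcf: str, ld_vcf: str,
                      ne: int, iterations: int) -> List[List[str]]:
    """Step 1: phase the high-density panel. Step 2: impute the low-density lines."""
    phase = beagle_command(jar, gt=hd_vcf, out="hd_phased",
                           ne=ne, iterations=iterations, impute=False)
    impute = beagle_command(jar, gt=ld_vcf, out="ld_imputed",
                            ne=ne, iterations=iterations,
                            ref="hd_phased.vcf.gz")
    return [phase, impute]


if __name__ == "__main__":
    # Hypothetical inputs and parameter values, for illustration only.
    for cmd in two_step_commands("beagle.jar", "hd.vcf.gz", "ld.vcf.gz",
                                 ne=100, iterations=50):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run` once real file paths are substituted. Keeping the two steps as separate commands makes it possible to tune the phasing and imputation runs independently, or to replace the imputation step with another tool such as AlphaPlantImpute2.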
