Fu Yong-Bi
Plant Gene Resources of Canada, Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2, Canada.
Plants (Basel). 2023 Mar 28;12(7):1476. doi: 10.3390/plants12071476.
Assessing genetic distinctness and redundancy is an important part of plant germplasm characterization. Over the last decade, such assessment has become more feasible and informative, thanks to advances in genomic analysis. An attempt was made here to search for genebank germplasm with published genomic data and to assess their genetic distinctness and redundancy based on average pairwise dissimilarity (APD). The effort acquired 12 published genomic data sets from CIMMYT, IPK, USDA-ARS, IRRI, and ICRISAT genebanks. The characterized collections consisted of 661 to 55,879 accessions with up to 2.4 million genome-wide SNPs. The assessment generated an APD estimate for each sample. As a higher or lower APD is indicative of more genetic distinctness or redundancy for an accession, respectively, these APD estimates helped to identify the most genetically distinct and redundant groups of 100 accessions each and a genetic outlier group with APD estimates larger than five standard deviations in each data set. An APD-based grouping of the conserved germplasm in each data set revealed among-group variances ranging from 1.5 to 53.4% across all data sets. Additional analyses showed that these APD estimations were more sensitive to SNP number, minor allele frequency, and missing data. Generally, 5000 to 10,000 genome-wide SNPs were required for an effective APD analysis. These findings together are encouraging and useful for germplasm management, utilization, and conservation, particularly in the genetic categorization of conserved germplasm.
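To make the APD idea concrete, the following is a minimal Python sketch, not the author's implementation. It assumes genotypes coded as 0/1/2 allele counts with NaN for missing calls, and it assumes a simple per-SNP allelic-difference measure as the dissimilarity; the abstract does not specify either. The function names `average_pairwise_dissimilarity` and `summarize_apd` are hypothetical, and the outlier rule (APD more than five standard deviations above the mean) is one possible reading of the abstract's threshold.

```python
import numpy as np


def average_pairwise_dissimilarity(genotypes):
    """Return one APD value per accession.

    genotypes: (n_accessions, n_snps) array of 0/1/2 allele counts,
    with np.nan marking missing calls (an assumed encoding).
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    apd = np.full(n, np.nan)
    for i in range(n):
        # Per-SNP dissimilarity in [0, 1]; NaN where either call is missing.
        diff = np.abs(g - g[i]) / 2.0
        valid = ~np.isnan(diff)
        counts = valid.sum(axis=1)
        sums = np.where(valid, diff, 0.0).sum(axis=1)
        # Dissimilarity between accession i and every accession,
        # averaged over the SNPs that both accessions have called.
        d = np.divide(sums, counts, out=np.full(n, np.nan), where=counts > 0)
        others = np.arange(n) != i  # exclude the self-comparison
        if np.any(~np.isnan(d[others])):
            apd[i] = np.nanmean(d[others])
    return apd


def summarize_apd(apd, group_size=100, n_sd=5.0):
    """Pick the most distinct and most redundant accessions and flag outliers.

    Outliers are taken here as APD > mean + n_sd * SD (an assumption).
    """
    apd = np.asarray(apd, dtype=float)
    order = np.argsort(apd)  # ascending: lowest APD (most redundant) first
    k = min(group_size, len(apd))
    redundant = order[:k]
    distinct = order[::-1][:k]
    threshold = np.nanmean(apd) + n_sd * np.nanstd(apd)
    outliers = np.flatnonzero(apd > threshold)
    return distinct, redundant, outliers


# Toy data: two identical accessions and one that differs at every SNP.
toy = np.array([[0, 0, 0, 0],
                [0, 0, 0, 0],
                [2, 2, 2, 2]])
toy_apd = average_pairwise_dissimilarity(toy)
toy_distinct, toy_redundant, toy_outliers = summarize_apd(toy_apd, group_size=1)
```

In the toy data, the third accession receives an APD of 1.0 and the two identical accessions receive 0.5 each, so the third is ranked as the most distinct. The loop holds only one row of the dissimilarity matrix in memory at a time, which matters for collections of tens of thousands of accessions, although a production analysis would likely need a faster, chunked computation.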