Hunan Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China.
State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Changsha, 410125, China.
BMC Bioinformatics. 2022 Jan 10;23(1):30. doi: 10.1186/s12859-022-04562-9.
Plant variety identification is one of the most important tasks in agricultural systems. It requires DNA marker profiles of released varieties against which candidate or future varieties can be compared. Strictly speaking, however, most existing variety identification techniques have been used not for "identification" but for "distinction of a limited number of cultivars," and their generalization ability has rarely been well estimated. Many varieties share similar genetic backgrounds, and some are essentially derived varieties (EDVs), which complicates both identification and breeding progress. A fast, accurate variety identification method that also performs well on EDV determination is therefore needed.
In this study, following a "Divide and Conquer" strategy, a variety identification method, Conditional Random Selection (CRS), was developed from whole-genome SNPs of 3024 rice varieties and applied to the identification of essentially derived varieties (EDVs) in rice. CRS is a fast, efficient, and automated variety identification method. In practice, with the optimal identity-score threshold found in this study, the resulting set of 390 SNPs showed optimal performance on EDV and non-EDV identification in two independent testing datasets.
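To make the idea of an identity-score threshold concrete, here is a minimal sketch in Python. The abstract does not define the score, so this assumes it is the fraction of SNP loci with identical genotype calls between two samples, with missing calls skipped. The function names and the default threshold of 0.9 are hypothetical placeholders, not values reported in the study.

```python
from typing import Optional, Sequence


def identity_score(a: Sequence[Optional[str]], b: Sequence[Optional[str]]) -> float:
    """Fraction of loci with identical calls (assumed definition, not the paper's).

    Loci where either sample has a missing call (None) are skipped.
    """
    if len(a) != len(b):
        raise ValueError("genotype vectors must cover the same SNP panel")
    compared = 0
    matched = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue
        compared += 1
        if x == y:
            matched += 1
    if compared == 0:
        raise ValueError("no loci with calls in both samples")
    return matched / compared


def looks_like_edv(candidate, reference, threshold: float = 0.9) -> bool:
    """Flag a candidate as a putative EDV of the reference variety.

    The threshold is a hypothetical placeholder; the study searches for
    its own optimal value.
    """
    return identity_score(candidate, reference) >= threshold
```

For example, two samples that agree at three of four called loci score 0.75 and would not be flagged at the placeholder threshold.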
This approach first selected a minimal set of SNPs to discriminate non-EDVs in the 3000 Rice Genome Project, then united several such simplified SNP sets to improve generalization to EDV and non-EDV identification in the testing datasets. The results suggest that the CRS method outperformed traditional feature selection methods. Furthermore, it provides a new way to screen core SNP loci from the whole genome for DNA fingerprinting of crop varieties and should be useful for crop breeding.
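The two-step idea of selecting a small discriminating SNP set and then uniting several such sets can be illustrated with a toy sketch. This is one possible reading of the abstract, not the authors' CRS algorithm. It assumes SNPs are visited in random order and kept only if they separate at least one pair of varieties that is still indistinguishable. The resulting set is small but not provably minimal. All names and the number of rounds are hypothetical.

```python
import random
from itertools import combinations


def select_discriminating_snps(genotypes, seed=0):
    """Randomly scan SNPs, keeping those that resolve a still-unresolved pair.

    genotypes maps variety name -> sequence of genotype calls (equal lengths).
    Returns (sorted SNP indices, set of pairs that could not be separated).
    """
    rng = random.Random(seed)
    names = sorted(genotypes)
    n_snps = len(genotypes[names[0]])
    unresolved = set(combinations(names, 2))
    order = list(range(n_snps))
    rng.shuffle(order)
    chosen = []
    for snp in order:
        if not unresolved:
            break
        split = {(u, v) for (u, v) in unresolved
                 if genotypes[u][snp] != genotypes[v][snp]}
        if split:  # condition: the SNP must separate a new pair
            chosen.append(snp)
            unresolved -= split
    return sorted(chosen), unresolved


def united_panel(genotypes, n_rounds=5):
    """Union of the SNP sets from several random rounds (assumed 'uniting' step)."""
    panel = set()
    for seed in range(n_rounds):
        chosen, _ = select_discriminating_snps(genotypes, seed)
        panel.update(chosen)
    return sorted(panel)


# Toy data: only SNP 0 and SNP 4 vary among the four varieties.
toy = {"V1": "AAGCT", "V2": "AAGCA", "V3": "TAGCT", "V4": "TAGCA"}
chosen, unresolved = select_discriminating_snps(toy, seed=1)
```

On the toy data, the two variable loci are enough to separate all six variety pairs, so the uninformative loci are never selected. Taking the union over several rounds trades panel size for redundancy, which is the intuition behind improving generalization to unseen varieties.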