Chen Xiaojing, Huang Longyu, Fan Jingchao, Yan Shen, Zhou Guomin, Zhang Jianhua
National Agriculture Science Data Center, Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing, China.
National Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, China.
Front Plant Sci. 2024 Jan 17;14:1293599. doi: 10.3389/fpls.2023.1293599. eCollection 2023.
Kompetitive allele-specific PCR (KASP) marker technology has been used in molecular marker-assisted breeding because of its high efficiency and flexibility, and an intelligent model for evaluating KASP primer typing results is essential for improving the efficiency of large-scale marker development. To this end, this paper proposes a gene population delineation method based on a no-template control (NTC) identification module and a data distribution judgment module to improve the accuracy of K-Means clustering, and introduces a decision tree to construct the KASP-IEva primer typing evaluation model. The model first uses the NTC identification module and the data distribution judgment module to extract four types of data, which are grouped and categorized to make the amplification product signals more distinguishable. Second, the K-Means algorithm aggregates and classifies the data, the five resulting clusters are visualized, and morphology and location eigenvalues are obtained. Finally, evaluation criteria for the typing effect level are constructed, and a logical decision tree applies conditional discrimination to the eigenvalues to predict a score. The performance of the model was tested on the KASP marker typing results of 2519 groups of cotton varieties, with the following conclusions. The model can visualize the aggregation and classification of the amplification products of NTC, homozygous genotypes, heterozygous genotypes, and untyped genotypes, enabling rapid and accurate evaluation of KASP marker typing. When the model's evaluations were compared with expert evaluations, the average accuracy across the four grades was 87%, and the overall results showed an uneven distribution across grades, with significant differences between them. Evaluating the 2519 KASP typing plots took the experts 15 hours and the model only 8 min 27.45 s, so the model was roughly 106 times faster than expert evaluation. The model will further promote the application of KASP markers in molecular marker-assisted breeding and provide technical support for the large-scale screening and identification of superior genotypes.
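The pipeline described in the abstract (NTC identification, K-Means clustering, feature extraction, rule-based grading) can be illustrated with a minimal sketch. This is not the KASP-IEva implementation: the abstract gives no thresholds, features, or grade names, so everything below is an assumption made for illustration. In particular, the FAM/HEX axis convention, the `ntc_radius` cutoff, the angle-based clustering into three genotype groups (the paper reports five clusters, including untyped samples, which this sketch omits), the two features `spread` and `min_gap`, and the grades "A" to "D" are all hypothetical.

```python
import math

def kmeans_1d(values, centers, iters=50):
    """Plain Lloyd's K-Means on scalar values, starting from fixed centers."""
    centers = list(centers)
    labels = []
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                  for v in values]
        # Recompute each center as the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def evaluate_plate(wells, ntc_radius=0.2):
    """wells: list of (fam, hex) normalized signals. Returns (labels, grade)."""
    names = ["hom_FAM", "het", "hom_HEX"]  # hypothetical cluster names
    # Step 1 (assumed rule): low-signal wells near the origin count as NTC.
    is_ntc = [math.hypot(x, y) < ntc_radius for x, y in wells]
    angles = [math.atan2(y, x) for (x, y), n in zip(wells, is_ntc) if not n]
    # Step 2: cluster the remaining wells by angle into three genotype groups.
    clusters, centers = kmeans_1d(angles, [0.0, math.pi / 4, math.pi / 2])
    # Step 3 (assumed features): widest cluster and smallest gap between centers.
    spreads = []
    for k in set(clusters):
        members = [a for a, lab in zip(angles, clusters) if lab == k]
        spreads.append(max(members) - min(members))
    spread = max(spreads) if spreads else 0.0
    occupied = sorted(centers[k] for k in set(clusters))
    gaps = [b - a for a, b in zip(occupied, occupied[1:])]
    min_gap = min(gaps) if gaps else 0.0
    # Step 4 (assumed rules): a hand-written decision tree over the features.
    if len(occupied) < 2:
        grade = "D"      # no separation into genotype groups
    elif len(occupied) == 3 and min_gap > 3 * spread:
        grade = "A"      # three tight, well-separated clusters
    elif min_gap > 1.5 * spread:
        grade = "B"
    else:
        grade = "C"
    # Merge NTC flags and cluster names back into one label per well.
    it = iter(clusters)
    labels = ["NTC" if n else names[next(it)] for n in is_ntc]
    return labels, grade

# Made-up example plate: 2 NTC wells and 3 wells per genotype group.
wells = [(0.05, 0.04), (0.03, 0.06),
         (1.00, 0.10), (0.95, 0.12), (1.05, 0.08),
         (0.70, 0.70), (0.72, 0.68), (0.68, 0.73),
         (0.10, 1.00), (0.12, 0.97), (0.09, 1.03)]
labels, grade = evaluate_plate(wells)
print(labels, grade)
```

Clustering on the angle rather than on the raw coordinates is a simplification chosen to keep the sketch short; a fuller version would cluster in two dimensions and add an explicit class for untyped samples.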