Pan Shouhui, Liu Zhongqiang, Han Yanyun, Zhang Dongfeng, Zhao Xiangyu, Li Jinlong, Wang Kaiyi
Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China.
National Engineering Research Center for Information Technology in Agriculture, Beijing, China.
Front Plant Sci. 2024 Dec 10;15:1480463. doi: 10.3389/fpls.2024.1480463. eCollection 2024.
Evaluating the accuracy of quantitative trait prediction is crucial for choosing the best model among several candidates in plant breeding. Pearson's correlation coefficient (PCC), which quantifies the strength of the linear association between two variables, is widely used to evaluate the accuracy of quantitative trait prediction models and performs well in most circumstances. However, PCC may not always give a comprehensive view of predictive accuracy, especially when machine learning-based methods involve nonlinear relationships or complex dependencies. Many papers on quantitative trait prediction use PCC as the sole metric for evaluating their models, which is insufficient and limited from a formal perspective. This study addresses this issue by presenting a typical example and comparing PCC with nine other evaluation metrics across four traditional methods and four machine learning-based methods, thereby contributing to the practical applicability and reliability of plant quantitative trait prediction models. It is recommended to use PCC in conjunction with other evaluation metrics, selected according to the specific application scenario, to reduce the likelihood of drawing misleading conclusions.
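The abstract's central point, that a high PCC does not by itself guarantee accurate predictions, can be illustrated with a small sketch. The Python code below is a hypothetical example, not the authors' method or data. It reports PCC alongside three common error-based metrics (RMSE, MAE and the coefficient of determination R²). These three are chosen here only for illustration and are not necessarily among the nine metrics compared in the study. The function names (`pearson_r`, `rmse`, `mae`, `r_squared`, `evaluate`) and the trait values are illustrative assumptions. Because PCC is unchanged when the predictions are rescaled by a positive factor or shifted by a constant, a systematically biased model can still achieve a perfect PCC.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson's correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large deviations from the observed values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination 1 - SS_res / SS_tot; unlike PCC it can be negative."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def evaluate(y_true, y_pred):
    """Hypothetical helper: report PCC together with error-based metrics, not PCC alone."""
    return {
        "PCC": pearson_r(y_true, y_pred),
        "RMSE": rmse(y_true, y_pred),
        "MAE": mae(y_true, y_pred),
        "R2": r_squared(y_true, y_pred),
    }


if __name__ == "__main__":
    # Hypothetical observed trait values (e.g. plant height in cm); not data from the study.
    observed = [100.0, 110.0, 120.0, 130.0, 140.0]
    # Model A: predictions close to the observed values, with small errors.
    model_a = [102.0, 108.0, 121.0, 129.0, 141.0]
    # Model B: an exact linear transform of the observed values (scaled and shifted),
    # so it is systematically biased yet perfectly correlated with them.
    model_b = [2 * y - 50 for y in observed]
    for name, pred in [("A", model_a), ("B", model_b)]:
        print(name, {k: round(v, 3) for k, v in evaluate(observed, pred).items()})
```

When run as a script, model B (predictions equal to 2y − 50) obtains a PCC of exactly 1.0, slightly higher than model A. However, its MAE is 70 cm and its R² is −24.5, whereas model A's MAE is 1.4 cm. Ranking the two models by PCC alone would therefore select the far less accurate one.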