Use of the linear regression method to evaluate population accuracy of predictions from non-linear models.

Author information

Yu Haipeng, Fernando Rohan L, Dekkers Jack C M

Affiliations

Department of Animal Sciences, University of Florida, Gainesville, FL, United States.

Department of Animal Science, Iowa State University, Ames, IA, United States.

Publication information

Front Genet. 2024 May 31;15:1380643. doi: 10.3389/fgene.2024.1380643. eCollection 2024.

Abstract

BACKGROUND

To address the limitations of commonly used cross-validation methods, the linear regression (LR) method was proposed to estimate the population accuracy of predictions, based on the implicit assumption that the fitted model is correct. The method also provides two statistics for determining whether the fitted model is adequate. The validity and behavior of the LR method have been demonstrated and studied for linear predictions, but not for non-linear predictions. The objectives of this study were to 1) provide a mathematical proof of the validity of the LR method when predictions are based on conditional means, regardless of whether the predictions are linear or non-linear; 2) investigate the ability of the LR method to detect whether the fitted model is adequate or inadequate; and 3) provide guidelines on how to partition the data into training and validation sets such that the LR method can identify an inadequate model.
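
Why conditional means make the LR statistics work, whether or not the predictor is linear, can be seen from the law of iterated expectations. The sketch below is a standard argument given for orientation, not a reproduction of the paper's proof. It assumes the fitted model is the true model, so that the partial-data prediction û_p = E(u | y_p) and the whole-data prediction û_w = E(u | y_w) are exact conditional means, with the partial data y_p contained in the whole data y_w. The symbols are introduced here for illustration.

```latex
\begin{align*}
\hat{u}_p &= E(u \mid \mathbf{y}_p), \qquad \hat{u}_w = E(u \mid \mathbf{y}_w), \qquad \mathbf{y}_p \subseteq \mathbf{y}_w \\
E(\hat{u}_w \mid \mathbf{y}_p) &= E\left[E(u \mid \mathbf{y}_w) \mid \mathbf{y}_p\right] = E(u \mid \mathbf{y}_p) = \hat{u}_p \\
E(\hat{u}_p - \hat{u}_w) &= E\left[\hat{u}_p - E(\hat{u}_w \mid \mathbf{y}_p)\right] = 0 \\
\operatorname{cov}(\hat{u}_p, \hat{u}_w) &= E\left[\hat{u}_p \, E(\hat{u}_w \mid \mathbf{y}_p)\right] - E(\hat{u}_p)\,E(\hat{u}_w) = \operatorname{var}(\hat{u}_p) = \operatorname{cov}(u, \hat{u}_p) \\
b_{w \mid p} &= \frac{\operatorname{cov}(\hat{u}_p, \hat{u}_w)}{\operatorname{var}(\hat{u}_p)} = 1 \\
\operatorname{cor}(u, \hat{u}_p) &= \frac{\operatorname{cov}(u, \hat{u}_p)}{\sqrt{\operatorname{var}(u)\operatorname{var}(\hat{u}_p)}} = \sqrt{\frac{\operatorname{cov}(\hat{u}_p, \hat{u}_w)}{\operatorname{var}(u)}}
\end{align*}
```

The equality cov(u, û_p) = var(û_p) follows from the same step with u in place of û_w, since E(u | y_p) = û_p. No step uses linearity of û_p or û_w in the phenotypes, and departures from a zero mean difference or a unit slope are one way an inadequate model could show up.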

RESULTS

We present a mathematical proof of the validity of the LR method for estimating population accuracy and for determining whether the fitted model is adequate or inadequate when the predictor is the conditional mean, which may be a non-linear function of the phenotype. Using simulated data partitioned under three scenarios, we show that one of the LR statistics can detect an inadequate model only when the data are partitioned such that the values of relevant predictor variables differ between the training and validation sets. In contrast, the other LR statistic detected an inadequate model in all three scenarios.
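
A small numerical check of the claim that the LR statistics remain valid when the predictor is a non-linear conditional mean. The toy model is an assumption for illustration, not the paper's simulation: an effect u equal to -1 or +1 with equal probability (so var(u) = 1), and two records y1, y2 = u + e with e ~ N(0, sigma2_e). Under this model E(u | y1) = tanh(y1 / sigma2_e) and E(u | y1, y2) = tanh((y1 + y2) / sigma2_e), both clearly non-linear. The helper names `lr_statistics` and `simulate`, and the three statistics computed (mean difference, slope of whole-data on partial-data predictions, and an accuracy estimate), are hypothetical and only loosely follow commonly used LR statistics.

```python
import numpy as np


def lr_statistics(u_p, u_w, var_u):
    """Hypothetical helper: LR-style statistics from partial (u_p) and whole (u_w) predictions.

    Returns (mean difference, slope of u_w on u_p, estimated accuracy of u_p).
    """
    c = np.cov(u_p, u_w)                       # 2x2 sample covariance matrix
    bias = float(np.mean(u_p) - np.mean(u_w))  # expected 0 for exact conditional means
    slope = float(c[0, 1] / c[0, 0])           # expected 1 for exact conditional means
    acc = float(np.sqrt(c[0, 1] / var_u))      # estimates cor(u, u_p)
    return bias, slope, acc


def simulate(n=200_000, sigma2_e=1.0, seed=1):
    """Toy model (an assumption): u is -1 or +1 with equal probability, so var(u) = 1."""
    rng = np.random.default_rng(seed)
    u = rng.choice([-1.0, 1.0], size=n)
    sd_e = np.sqrt(sigma2_e)
    y1 = u + rng.normal(0.0, sd_e, size=n)  # record available in the partial data
    y2 = u + rng.normal(0.0, sd_e, size=n)  # extra record added in the whole data
    u_p = np.tanh(y1 / sigma2_e)            # E(u | y1), non-linear in y1
    u_w = np.tanh((y1 + y2) / sigma2_e)     # E(u | y1, y2), non-linear in y1 + y2
    return u, u_p, u_w


u, u_p, u_w = simulate()
bias, slope, acc_hat = lr_statistics(u_p, u_w, var_u=1.0)
acc_true = float(np.corrcoef(u, u_p)[0, 1])
print(f"bias={bias:.4f} slope={slope:.3f} acc_hat={acc_hat:.3f} acc_true={acc_true:.3f}")
```

The binary effect was chosen only because it gives closed-form conditional means that are visibly non-linear; with a large n the mean difference should sit near 0, the slope near 1, and the estimated accuracy close to the realized correlation, up to sampling error.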

CONCLUSION

The LR method has been proposed to address some limitations of the traditional cross-validation approach in genetic evaluation. In this paper, we showed that the LR method is valid when the model is adequate and the conditional mean is the predictor, even when the conditional mean is a non-linear function of the phenotype. We found that one of the two LR statistics is superior because it detected an inadequate model in all three partitioning scenarios studied (i.e., between animals, by age within animals, and between animals and by age).
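
The three partitioning scenarios named above can be made concrete with a small record-splitting sketch. Everything in it is hypothetical: the `Record` type, the `split_records` function, the focal (validation) animals, and `age_cutoff`. The "between animals and by age" branch encodes one assumed reading, in which validation records come from different animals and from later ages than training records; the paper may define this scenario differently. Age stands in for a "relevant predictor variable", and the printout shows whether the training and validation age ranges overlap under each scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    animal: int  # hypothetical animal identifier
    age: int     # hypothetical age (time point) at which the phenotype was recorded


def split_records(records, scheme, focal_animals, age_cutoff):
    """Split records into (training, validation) under one of three assumed schemes."""
    focal = set(focal_animals)
    if scheme == "between_animals":
        # Validation: every record of the focal animals; training: all other animals.
        train = [r for r in records if r.animal not in focal]
        valid = [r for r in records if r.animal in focal]
    elif scheme == "by_age":
        # Validation: later-age records of every animal; training: earlier-age records.
        train = [r for r in records if r.age <= age_cutoff]
        valid = [r for r in records if r.age > age_cutoff]
    elif scheme == "between_animals_and_by_age":
        # Assumed reading: validation animals AND validation ages both differ from training.
        train = [r for r in records if r.animal not in focal and r.age <= age_cutoff]
        valid = [r for r in records if r.animal in focal and r.age > age_cutoff]
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return train, valid


def age_range(rs):
    """(min age, max age) of a non-empty list of records."""
    ages = [r.age for r in rs]
    return min(ages), max(ages)


# Toy data: 10 animals, each recorded at ages 1..6.
records = [Record(animal=a, age=t) for a in range(10) for t in range(1, 7)]
focal = [0, 1, 2]  # hypothetical validation animals

for scheme in ("between_animals", "by_age", "between_animals_and_by_age"):
    train, valid = split_records(records, scheme, focal, age_cutoff=3)
    print(scheme, "train ages", age_range(train), "valid ages", age_range(valid))
```

In the between-animals split the two sets cover the same ages, whereas the by-age and combined splits put validation records at ages never seen in training, which is the kind of contrast the abstract ties to whether one of the LR statistics can flag an inadequate model.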

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9c2/11185077/c9220e180cba/fgene-15-1380643-g001.jpg
