Statistical model choice including variable selection based on variable importance: A relevant way for biomarkers selection to predict meat tenderness.

Affiliations

Université Clermont Auvergne, INRA, VetAgro Sup, UMR Herbivores, F-63122, Saint-Genès-Champanelle, France.

INRIA Bordeaux Sud-Ouest, CQFD Team, F-33400, Talence, France.

Publication information

Sci Rep. 2019 Jul 10;9(1):10014. doi: 10.1038/s41598-019-46202-y.

Abstract

In this paper, we describe a new computational methodology that selects the best regression model for predicting a numerical variable of interest Y and simultaneously selects the numerical explanatory variables most strongly linked to Y. Three regression models (parametric, semi-parametric and non-parametric) are considered and estimated by multiple linear regression, sliced inverse regression and random forests, respectively. Both variable selection and model choice are computational. A measure of importance based on random perturbations is calculated for each covariate, and the variables whose importance exceeds a threshold are selected. A learning/test sample approach is then used to estimate the Mean Square Error and to determine which model (including variable selection) is the most accurate. The R package modvarsel (MODel and VARiable SELection) implements this computational approach and applies to any regression dataset. After checking the good behavior of the methodology on simulated data, the R package is used to select the proteins predictive of meat tenderness from a pool of 21 candidate proteins assayed in semitendinosus muscle from 71 young bulls. The biomarkers were selected by linear regression (the best regression model) to predict meat tenderness. With these biomarkers, we confirm the predominant role of heat shock proteins and metabolic proteins.
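The perturbation-based importance and the learning/test estimate of the Mean Square Error described in the abstract can be illustrated with a minimal Python sketch. This is not the modvarsel implementation or its API: the perturbation scheme (shuffling one covariate in the test sample), the importance score (ratio of perturbed to baseline test MSE), the threshold value of 1.5, the simulated data, and all function names are assumptions made for illustration, and only the linear regression model is shown.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + row for row in X]
    p = len(Z[0])
    xtx = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    xty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    return solve(xtx, xty)

def mse(beta, X, y):
    """Mean squared prediction error of the fitted linear model on (X, y)."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def perturbation_importance(X_train, y_train, X_test, y_test, rng, n_rep=20):
    """For each covariate, average the ratio (perturbed test MSE / baseline test MSE).

    Assumed scheme: the model is fitted once on the learning sample, then one
    covariate at a time is randomly shuffled in the test sample.
    """
    beta = fit_ols(X_train, y_train)
    base = mse(beta, X_test, y_test)
    scores = []
    for j in range(len(X_test[0])):
        ratios = []
        for _ in range(n_rep):
            col = [row[j] for row in X_test]
            rng.shuffle(col)
            X_pert = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_test, col)]
            ratios.append(mse(beta, X_pert, y_test) / base)
        scores.append(sum(ratios) / n_rep)
    return base, scores

# Simulated data (assumption): Y depends on covariates 0 and 1 only; 2 and 3 are noise.
rng = random.Random(0)
n = 200
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
y = [2.0 * r[0] - 1.5 * r[1] + rng.gauss(0, 0.5) for r in X]

# Learning/test split: first 140 rows for fitting, last 60 for error estimation.
X_train, y_train = X[:140], y[:140]
X_test, y_test = X[140:], y[140:]

base_mse, importance = perturbation_importance(X_train, y_train, X_test, y_test, rng)
threshold = 1.5  # assumed cut-off on the MSE ratio, not a value from the paper
selected = [j for j, s in enumerate(importance) if s > threshold]
print("baseline test MSE:", round(base_mse, 3))
print("importance ratios:", [round(s, 2) for s in importance])
print("selected covariates:", selected)
```

A ratio close to 1 means that perturbing the covariate barely changes the prediction error, so the covariate carries little information about Y. To compare models as the abstract describes, the same learning/test procedure would be repeated for each candidate model and the one with the lowest estimated Mean Square Error retained.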

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/716f/6620333/89619ccfb12e/41598_2019_46202_Fig1_HTML.jpg
