Improved two-stage model averaging for high-dimensional linear regression, with application to Riboflavin data analysis.

Affiliation

Department of Mathematics, Rowan University, Glassboro, NJ, 08028, USA.

Publication information

BMC Bioinformatics. 2021 Mar 25;22(1):155. doi: 10.1186/s12859-021-04053-3.

Abstract

BACKGROUND

Model averaging has attracted increasing attention in recent years for the analysis of high-dimensional data. By suitably weighting several competing statistical models, model averaging aims to achieve stable and improved prediction. In this paper, we develop a two-stage model averaging procedure to enhance prediction accuracy and stability in high-dimensional linear regression. First, we employ a high-dimensional variable selection method such as LASSO to screen out redundant predictors and construct a class of candidate models. Then, we apply jackknife cross-validation to optimize the model weights for averaging.
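The two stages described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a marginal-correlation ranking stands in for LASSO in the screening stage, the candidate models are nested sets of top-ranked predictors, and a Frank-Wolfe iteration stands in for a quadratic programming solver. All function names, the simulated data, and the candidate sizes (2, 4, 6, 8) are hypothetical.

```python
import numpy as np

def rank_predictors(X, y):
    """Stage 1 stand-in (assumption): rank predictors by |marginal correlation| with y.
    The paper uses a selection method such as LASSO here."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(-score)

def loo_predictions(X, y, cols):
    """Jackknife (leave-one-out) OLS predictions for the model on columns `cols`."""
    Z = np.column_stack([np.ones(len(y)), X[:, cols]])  # intercept + selected predictors
    H = Z @ np.linalg.pinv(Z)                           # hat matrix of the low-dimensional fit
    h = np.clip(np.diag(H), 0.0, 1.0 - 1e-8)
    resid = y - H @ y
    return y - resid / (1.0 - h)                        # LOO residual is e_i / (1 - h_ii)

def simplex_weights(P, y, iters=2000):
    """Stage 2: minimise ||y - P w||^2 subject to w >= 0, sum(w) = 1.
    Frank-Wolfe with exact line search (assumption: replaces a QP solver)."""
    m = P.shape[1]
    w = np.zeros(m)
    w[np.argmin(((P - y[:, None]) ** 2).sum(axis=0))] = 1.0  # start at best single model
    for _ in range(iters):
        r = P @ w - y
        j = np.argmin(P.T @ r)          # simplex vertex with the steepest descent
        d = -w.copy()
        d[j] += 1.0                     # direction from w toward vertex e_j
        Pd = P @ d
        denom = Pd @ Pd
        if denom < 1e-15:
            break
        step = np.clip(-(r @ Pd) / denom, 0.0, 1.0)
        if step == 0.0:                 # no feasible descent direction left
            break
        w += step * d
    return w

# Simulated data (hypothetical): many predictors, few observations.
rng = np.random.default_rng(0)
n, p = 40, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ beta + rng.standard_normal(n)

order = rank_predictors(X, y)
candidates = [order[:k] for k in (2, 4, 6, 8)]           # nested candidate models
P = np.column_stack([loo_predictions(X, y, c) for c in candidates])
w = simplex_weights(P, y)

cv_single = ((y[:, None] - P) ** 2).mean(axis=0)         # jackknife CV error per model
cv_avg = np.mean((y - P @ w) ** 2)                       # jackknife CV error of the average
print("weights:", np.round(w, 3))
print("CV error, single models:", np.round(cv_single, 3), "averaged:", round(float(cv_avg), 3))
```

Because the iteration starts at the best single candidate and every step is an exact line search, the averaged jackknife error in this sketch can never exceed that of the best single candidate. Note that the screening here uses all observations, so the reported cross-validation errors are optimistic with respect to the selection step.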

RESULTS

In simulation studies, the proposed technique outperforms commonly used alternative methods in the high-dimensional regression setting in terms of mean squared prediction error. We apply the proposed method to a riboflavin data set; the results show that the method is effective at predicting the riboflavin production rate when there are thousands of genes and only tens of subjects.

CONCLUSIONS

Compared with a recent high-dimensional model averaging procedure (Ando and Li in J Am Stat Assoc 109:254-65, 2014), the proposed approach has three appealing features that lead to better predictive performance: (1) More suitable methods are applied for model construction and weighting. (2) Computational flexibility is retained, since each candidate model and its corresponding weight are determined in a low-dimensional setting and quadratic programming is used in the cross-validation. (3) Model selection and averaging are combined in one procedure, which makes full use of the strengths of both techniques. As a consequence, the proposed method can achieve stable and accurate predictions in high-dimensional linear models, and can help applied researchers analyze genetic data in medical research.
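To illustrate why the weighting step in feature (2) reduces to a small quadratic program, the following assumes the standard jackknife model averaging criterion; the abstract does not spell out the exact form, so the notation is an assumption. Here $y$ is the response vector, $M$ is the number of candidate models, and $\tilde{P}$ is the $n \times M$ matrix whose $m$-th column holds the leave-one-out predictions of candidate model $m$.

```latex
\hat{w}
  = \arg\min_{w \in \mathcal{W}} \bigl\| y - \tilde{P} w \bigr\|^{2}
  = \arg\min_{w \in \mathcal{W}} \; w^{\top} \tilde{P}^{\top} \tilde{P}\, w - 2\, y^{\top} \tilde{P}\, w,
\qquad
\mathcal{W} = \Bigl\{ w \in [0,1]^{M} : \sum_{m=1}^{M} w_{m} = 1 \Bigr\}.
```

Under this assumption the objective is quadratic in only $M$ unknowns with linear constraints, so the cost of the weight optimization depends on the number of candidate models rather than on the number of predictors.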
