Improving the Robustness of Variable Selection and Predictive Performance of Regularized Generalized Linear Models and Cox Proportional Hazard Models.

Authors

Hong Feng, Tian Lu, Devanarayan Viswanath

Affiliations

Takeda Pharmaceuticals, Cambridge, MA 02139, USA.

Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA.

Publication information

Mathematics (Basel). 2023 Feb;11(3). doi: 10.3390/math11030557. Epub 2023 Jan 20.

DOI: 10.3390/math11030557

PMID: 37990696

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10660556/

Abstract

High-dimensional data applications often entail the use of various statistical and machine-learning algorithms to identify an optimal signature based on biomarkers and other patient characteristics that predicts the desired clinical outcome in biomedical research. Both the composition and predictive performance of such biomarker signatures are critical in various biomedical research applications. In the presence of a large number of features, however, a conventional regression analysis approach fails to yield a good prediction model. A widely used remedy is to introduce regularization in fitting the relevant regression model. In particular, an L1 penalty on the regression coefficients is extremely useful, and very efficient numerical algorithms have been developed for fitting such models with different types of responses. This L1-based regularization tends to generate a parsimonious prediction model with promising prediction performance, i.e., feature selection is achieved along with construction of the prediction model. The variable selection, and hence the composition of the signature, as well as the prediction performance of the model depend on the choice of the penalty parameter used in the L1 regularization. The penalty parameter is often chosen by K-fold cross-validation. However, such an algorithm tends to be unstable and may yield very different choices of the penalty parameter across multiple runs on the same dataset. In addition, the predictive performance estimates from the internal cross-validation procedure in this algorithm tend to be inflated. In this paper, we propose a Monte Carlo approach to improve the robustness of regularization parameter selection, along with an additional cross-validation wrapper for objectively evaluating the predictive performance of the final model. We demonstrate the improvements via simulations and illustrate the application via a real dataset.
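The two ideas in the abstract (repeating K-fold cross-validation over many random partitions to stabilize the penalty choice, and wrapping the whole selection in an outer cross-validation loop to get a less optimistic performance estimate) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain Gaussian linear model with squared-error loss rather than a general GLM or Cox model, it aggregates by averaging the cross-validation error curves across repeats (the abstract does not state the aggregation rule), and every function name, the toy data, and the penalty grid are hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=30):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b0, beta = sum(y) / n, [0.0] * p
    resid = [yi - b0 for yi in y]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        shift = sum(resid) / n  # refit the unpenalized intercept
        b0 += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return b0, beta


def mse(X, y, model):
    """Mean squared prediction error of a fitted (intercept, coefficients) pair."""
    b0, beta = model
    preds = [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def split(X, y, idx, test):
    """Return (X_train, y_train, X_test, y_test) for the given test indices."""
    held = set(test)
    tr = [i for i in idx if i not in held]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in test], [y[i] for i in test])


def cv_curve(X, y, lambdas, k, rng):
    """Mean held-out MSE per lambda for ONE random K-fold partition."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    errs = [0.0] * len(lambdas)
    for f in range(k):
        Xtr, ytr, Xte, yte = split(X, y, idx, idx[f::k])
        for m, lam in enumerate(lambdas):
            errs[m] += mse(Xte, yte, lasso_fit(Xtr, ytr, lam)) / k
    return errs


def mc_select_lambda(X, y, lambdas, k=5, n_rep=5, seed=0):
    """Assumed aggregation: average CV curves over n_rep partitions, then minimize."""
    rng = random.Random(seed)
    curves = [cv_curve(X, y, lambdas, k, rng) for _ in range(n_rep)]
    mean_curve = [sum(c[m] for c in curves) / n_rep for m in range(len(lambdas))]
    return lambdas[mean_curve.index(min(mean_curve))]


def outer_cv_mse(X, y, lambdas, k_outer=4, seed=1):
    """Outer wrapper: lambda selection is redone inside every training split."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    scores = []
    for f in range(k_outer):
        Xtr, ytr, Xte, yte = split(X, y, idx, idx[f::k_outer])
        lam = mc_select_lambda(Xtr, ytr, lambdas, n_rep=3, seed=seed + f)
        scores.append(mse(Xte, yte, lasso_fit(Xtr, ytr, lam)))
    return sum(scores) / k_outer


# Hypothetical toy data: only the first two of five features carry signal.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
y = [2.0 * r[0] - 1.5 * r[1] + data_rng.gauss(0, 0.5) for r in X]
lambdas = [0.01, 0.05, 0.1, 0.3, 1.0]

best_lam = mc_select_lambda(X, y, lambdas)   # stabilized penalty choice
b0, beta = lasso_fit(X, y, best_lam)         # final model on all data
honest_mse = outer_cv_mse(X, y, lambdas)     # estimate from the outer wrapper
```

The point of the outer loop is that the held-out fold never influences the penalty choice, so `honest_mse` is not tuned on the data it is scored on, whereas the minimum of the inner cross-validation curve is. Fixing the seeds makes a run reproducible; in practice one would vary `n_rep` to see how quickly the selected penalty settles.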

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/db9f/10660556/40ac0066636b/nihms-1944212-f0001.jpg
