A greedy regression algorithm with coarse weights offers novel advantages.

Affiliations

Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, USA.

Perspectrix, Pittsboro, NC, USA.

Publication information

Sci Rep. 2022 Mar 31;12(1):5440. doi: 10.1038/s41598-022-09415-2.

DOI:10.1038/s41598-022-09415-2

PMID:35361850

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8971398/

Abstract

Regularized regression analysis is a mature analytic approach for identifying weighted sums of variables that predict outcomes. We present a novel Coarse Approximation Linear Function (CALF) that frugally selects important predictors and builds simple but powerful predictive models. CALF is a linear regression strategy applied to normalized data in which every nonzero weight is +1 or -1. The qualitative (linearly invariant) metric to be optimized can be, for a binary response, the Welch (Student) t-test p-value or the area under the receiver operating characteristic curve (AUC), or, for a real-valued response, the Pearson correlation. Predictor weighting is critically important when developing risk prediction models. Counterintuitively, qualitative metrics can favor CALF with ±1 weights over algorithms that produce real-number weights. Moreover, while regression methods may be expected to change most or all weight values after even small changes in the input data (e.g., discarding a single subject out of hundreds), CALF weights generally do not change in this way. Similarly, some regression methods applied to collinear or nearly collinear variables yield weight vectors whose magnitude or direction (in p-space) is unpredictable. In contrast, if some predictors are linearly dependent or nearly so, CALF simply chooses at most one (the most informative, if any) and ignores the others, thus avoiding the inclusion of two or more collinear variables in the model.
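The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that the method is greedy, works on normalized data, and restricts nonzero weights to +1 or -1. The z-score normalization, the forward-selection loop, the stopping rule, and all names below (`zscore`, `pearson`, `auc`, `greedy_coarse_fit`, `min_gain`, `max_terms`) are assumptions made for illustration.

```python
import numpy as np

def zscore(X):
    """Column-wise z-score normalization (an assumed choice of normalization)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pearson(score, y):
    """Pearson correlation between the model score and a real-valued response."""
    return float(np.corrcoef(score, y)[0, 1])

def auc(score, y):
    """ROC AUC for a binary response (1 = case, 0 = control); ties count 1/2."""
    score, y = np.asarray(score, dtype=float), np.asarray(y)
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

def greedy_coarse_fit(X, y, metric=pearson, max_terms=None, min_gain=1e-9):
    """Greedy forward selection with weights restricted to +1 or -1.

    At each step, try every unused predictor with weight +1 and -1 and keep
    the single (predictor, weight) pair that raises the metric the most.
    Stop when no pair improves the metric by more than min_gain (assumed rule).
    Returns ({column index: weight}, final metric value).
    """
    Z = zscore(X)
    y = np.asarray(y)
    n, p = Z.shape
    max_terms = p if max_terms is None else max_terms
    weights = {}
    score = np.zeros(n)
    best = -np.inf
    while len(weights) < max_terms:
        step_best, candidate = best, None
        for j in range(p):
            if j in weights:
                continue
            for w in (1, -1):
                value = metric(score + w * Z[:, j], y)
                if value > step_best + min_gain:
                    step_best, candidate = value, (j, w)
        if candidate is None:  # no remaining predictor improves the metric
            break
        j, w = candidate
        weights[j] = w
        score = score + w * Z[:, j]
        best = step_best
    return weights, best
```

Calling `greedy_coarse_fit(X, y)` maximizes the Pearson correlation for a real-valued response, and passing `metric=auc` switches to a binary response. Both metrics are unchanged by a positive rescaling of the score, so in this sketch an exact duplicate of an already selected column never raises the metric on its own and is not added, which mirrors the collinearity behavior the abstract describes.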

Similar articles

Simple Decision-Analytic Functions of the AUC for Ruling Out a Risk Prediction Model and an Added Predictor.

Med Decis Making. 2018 Feb;38(2):225-234. doi: 10.1177/0272989X17732994. Epub 2017 Oct 12.

Stability selection for LASSO with weights based on AUC.

Sci Rep. 2023 Mar 30;13(1):5207. doi: 10.1038/s41598-023-32517-4.

Can a Novel Scoring System Improve on the Mirels Score in Predicting the Fracture Risk in Patients with Multiple Myeloma?

Clin Orthop Relat Res. 2021 Mar 1;479(3):521-530. doi: 10.1097/CORR.0000000000001303.

Prediction of thoracic injury severity in frontal impacts by selected anatomical morphomic variables through model-averaged logistic regression approach.

Accid Anal Prev. 2013 Nov;60:172-80. doi: 10.1016/j.aap.2013.08.020. Epub 2013 Sep 5.

Revisiting performance metrics for prediction with rare outcomes.

Stat Methods Med Res. 2021 Oct;30(10):2352-2366. doi: 10.1177/09622802211038754. Epub 2021 Sep 1.

Comparison of genomic predictions using genomic relationship matrices built with different weighting factors to account for locus-specific variances.

J Dairy Sci. 2014 Oct;97(10):6547-59. doi: 10.3168/jds.2014-8210. Epub 2014 Aug 14.

Predicting hospitalization following psychiatric crisis care using machine learning.

BMC Med Inform Decis Mak. 2020 Dec 10;20(1):332. doi: 10.1186/s12911-020-01361-1.

Predictors of sinus rhythm after electrical cardioversion of atrial fibrillation: results from a data mining project on the Flec-SL trial data set.

Europace. 2017 Jun 1;19(6):921-928. doi: 10.1093/europace/euw144.

A greedy stacking algorithm for model ensembling and domain weighting.

BMC Res Notes. 2020 Feb 12;13(1):70. doi: 10.1186/s13104-020-4931-7.

Cited by

Predicting spread through air space of lung adenocarcinoma based on deep learning and machine learning models.

J Cardiothorac Surg. 2025 Aug 14;20(1):336. doi: 10.1186/s13019-025-03568-7.

Assessing the prognosis mortality in patients with cutaneous verrucous carcinoma using Lasso-cox regression model: a retrospective study.

Discov Oncol. 2025 Jun 13;16(1):1091. doi: 10.1007/s12672-025-02893-6.

Body fluid biomarkers and psychosis risk in The Accelerating Medicines Partnership® Schizophrenia Program: design considerations.

Schizophrenia (Heidelb). 2025 May 21;11(1):78. doi: 10.1038/s41537-025-00610-4.
