Random generalized linear model: a highly accurate and interpretable ensemble predictor.

Affiliation

Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, USA.

Publication information

BMC Bioinformatics. 2013 Jan 16;14:5. doi: 10.1186/1471-2105-14-5.

DOI:10.1186/1471-2105-14-5
PMID:23323760
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3645958/
Abstract

BACKGROUND

Ensemble predictors such as the random forest are known to have superior accuracy but their black-box predictions are difficult to interpret. In contrast, a generalized linear model (GLM) is very interpretable especially when forward feature selection is used to construct the model. However, forward feature selection tends to overfit the data and leads to low predictive accuracy. Therefore, it remains an important research goal to combine the advantages of ensemble predictors (high accuracy) with the advantages of forward regression modeling (interpretability). To address this goal several articles have explored GLM based ensemble predictors. Since limited evaluations suggested that these ensemble predictors were less accurate than alternative predictors, they have found little attention in the literature.

RESULTS

Comprehensive evaluations involving hundreds of genomic data sets, the UCI machine learning benchmark data, and simulations are used to give GLM based ensemble predictors a new and careful look. A novel bootstrap aggregated (bagged) GLM predictor that incorporates several elements of randomness and instability (random subspace method, optional interaction terms, forward variable selection) often outperforms a host of alternative prediction methods including random forests and penalized regression models (ridge regression, elastic net, lasso). This random generalized linear model (RGLM) predictor provides variable importance measures that can be used to define a "thinned" ensemble predictor (involving few features) that retains excellent predictive accuracy.
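
The ingredients named above (bootstrap aggregation, the random subspace method, forward variable selection) can be illustrated with a minimal Python sketch. It is not the randomGLM package and does not reproduce its API: the function names `forward_select`, `fit_bagged_glm`, `predict`, and `selection_counts` are hypothetical, the GLM is restricted to the Gaussian case fitted by least squares, AIC is assumed as the stopping rule for forward selection, interaction terms are omitted, and selection counts across bags stand in for a variable importance measure.

```python
import numpy as np

def _aic(design, y):
    """AIC (up to a constant) of a least-squares fit, i.e. a Gaussian GLM."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    n = len(y)
    return n * np.log(max(rss, 1e-12) / n) + 2 * design.shape[1]

def forward_select(X, y, max_terms):
    """Greedily add the column that lowers AIC most; stop when none does."""
    n, p = X.shape
    intercept = np.ones((n, 1))
    chosen, best = [], _aic(intercept, y)
    while len(chosen) < min(max_terms, p):
        rest = [j for j in range(p) if j not in chosen]
        scores = [_aic(np.hstack([intercept, X[:, chosen + [j]]]), y) for j in rest]
        k = int(np.argmin(scores))
        if scores[k] >= best:
            break
        chosen.append(rest[k])
        best = scores[k]
    return chosen

def fit_bagged_glm(X, y, n_bags=50, n_candidates=None, max_terms=5, seed=0):
    """Fit one forward-selected model per bootstrap sample and feature subset."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_candidates = n_candidates or p
    models = []
    for _ in range(n_bags):
        rows = rng.integers(0, n, size=n)                        # bootstrap sample
        cols = rng.choice(p, size=n_candidates, replace=False)   # random subspace
        Xb, yb = X[rows][:, cols], y[rows]
        picked = np.asarray(forward_select(Xb, yb, max_terms), dtype=int)
        feats = cols[picked]                                     # original indices
        design = np.hstack([np.ones((n, 1)), X[rows][:, feats]])
        beta, *_ = np.linalg.lstsq(design, yb, rcond=None)
        models.append((feats, beta))
    return models

def predict(models, X):
    """Average the predictions of all bagged models."""
    preds = [beta[0] + X[:, feats] @ beta[1:] for feats, beta in models]
    return np.mean(preds, axis=0)

def selection_counts(models, p):
    """Count how often each feature was selected across bags."""
    counts = np.zeros(p, dtype=int)
    for feats, _ in models:
        counts[feats] += 1
    return counts
```

Under these assumptions, a "thinned" predictor could be obtained by refitting with only the features that have the highest selection counts.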

CONCLUSION

RGLM is a state of the art predictor that shares the advantages of a random forest (excellent predictive accuracy, feature importance measures, out-of-bag estimates of accuracy) with those of a forward selected generalized linear model (interpretability). These methods are implemented in the freely available R software package randomGLM.
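
The out-of-bag estimate of accuracy mentioned above comes from the fact that each bootstrap sample leaves out some observations, which can then serve as test data for the model fitted on that sample. The sketch below illustrates the general idea for bagged least-squares fits; the function name `oob_mse` is hypothetical, and mean squared error is assumed as the accuracy measure rather than taken from the paper.

```python
import numpy as np

def oob_mse(X, y, n_bags=100, seed=0):
    """Out-of-bag mean squared error for bagged least-squares fits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    pred_sum = np.zeros(n)   # running sum of out-of-bag predictions
    pred_cnt = np.zeros(n)   # number of bags in which each row was out of bag
    for _ in range(n_bags):
        rows = rng.integers(0, n, size=n)            # bootstrap sample
        oob = np.setdiff1d(np.arange(n), rows)       # rows not drawn in this bag
        beta, *_ = np.linalg.lstsq(design[rows], y[rows], rcond=None)
        pred_sum[oob] += design[oob] @ beta
        pred_cnt[oob] += 1
    seen = pred_cnt > 0      # rows that were out of bag at least once
    avg = pred_sum[seen] / pred_cnt[seen]
    return float(np.mean((avg - y[seen]) ** 2))
```

Because every prediction is made by models that never saw the corresponding observation, the estimate requires no separate test set.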

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9af5/3645958/4c27d9a2b423/1471-2105-14-5-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9af5/3645958/012ebf919c81/1471-2105-14-5-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9af5/3645958/7d7187768627/1471-2105-14-5-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9af5/3645958/f9da161e749d/1471-2105-14-5-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9af5/3645958/1df78c4d1a59/1471-2105-14-5-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9af5/3645958/c99672b60277/1471-2105-14-5-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9af5/3645958/19242735c452/1471-2105-14-5-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9af5/3645958/a8493dc967ea/1471-2105-14-5-8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9af5/3645958/3ee12362d5a2/1471-2105-14-5-9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9af5/3645958/1cb438a839b3/1471-2105-14-5-10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9af5/3645958/b3796186ea46/1471-2105-14-5-11.jpg

Similar articles

1
Random generalized linear model: a highly accurate and interpretable ensemble predictor.
BMC Bioinformatics. 2013 Jan 16;14:5. doi: 10.1186/1471-2105-14-5.
2
eNetXplorer: an R package for the quantitative exploration of elastic net families for generalized linear models.
BMC Bioinformatics. 2019 Apr 16;20(1):189. doi: 10.1186/s12859-019-2778-5.
3
Predicting hospitalization following psychiatric crisis care using machine learning.
BMC Med Inform Decis Mak. 2020 Dec 10;20(1):332. doi: 10.1186/s12911-020-01361-1.
4
Ensemble Feature Learning of Genomic Data Using Support Vector Machine.
PLoS One. 2016 Jun 15;11(6):e0157330. doi: 10.1371/journal.pone.0157330. eCollection 2016.
5
Predictive modeling of blood pressure during hemodialysis: a comparison of linear model, random forest, support vector regression, XGBoost, LASSO regression and ensemble method.
Comput Methods Programs Biomed. 2020 Oct;195:105536. doi: 10.1016/j.cmpb.2020.105536. Epub 2020 May 22.
6
A two-stage modeling approach for breast cancer survivability prediction.
Int J Med Inform. 2021 May;149:104438. doi: 10.1016/j.ijmedinf.2021.104438. Epub 2021 Mar 11.
7
Comparison and improvement of the predictability and interpretability with ensemble learning models in QSPR applications.
J Cheminform. 2020 Mar 30;12(1):19. doi: 10.1186/s13321-020-0417-9.
8
Identification of clinically relevant features in hypertensive patients using penalized regression: a case study of cardiovascular events.
Med Biol Eng Comput. 2019 Sep;57(9):2011-2026. doi: 10.1007/s11517-019-02007-9. Epub 2019 Jul 25.
9
Ridle for sparse regression with mandatory covariates with application to the genetic assessment of histologic grades of breast cancer.
BMC Med Res Methodol. 2017 Jan 25;17(1):12. doi: 10.1186/s12874-017-0291-y.
10
Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets.
J Chem Inf Model. 2017 Aug 28;57(8):1773-1792. doi: 10.1021/acs.jcim.6b00753. Epub 2017 Aug 2.

Cited by

1
Investigating the metabolic reprogramming mechanisms in diabetic nephropathy: a comprehensive analysis using bioinformatics and machine learning.
Front Cell Dev Biol. 2025 Aug 29;13:1630708. doi: 10.3389/fcell.2025.1630708. eCollection 2025.
2
Artificial Intelligence in cancer epigenomics: a review on advances in pan-cancer detection and precision medicine.
Epigenetics Chromatin. 2025 Jun 14;18(1):35. doi: 10.1186/s13072-025-00595-5.
3
Application of machine learning for the analysis of peripheral blood biomarkers in oral mucosal diseases: a cross-sectional study.
BMC Oral Health. 2025 May 10;25(1):703. doi: 10.1186/s12903-025-06095-y.
4
Identification of novel diagnostic and prognostic microRNAs in sarcoma on TCGA dataset: bioinformatics and machine learning approach.
Sci Rep. 2025 Mar 4;15(1):7521. doi: 10.1038/s41598-025-91007-x.
5
Integrated Network Pharmacology, Machine Learning and Experimental Validation to Identify the Key Targets and Compounds of for the Treatment of Breast Cancer.
Onco Targets Ther. 2025 Jan 16;18:49-71. doi: 10.2147/OTT.S486300. eCollection 2025.
6
Predictive value analysis of the interaction network of Tks4 scaffold protein in colon cancer.
Front Mol Biosci. 2024 Aug 21;11:1414805. doi: 10.3389/fmolb.2024.1414805. eCollection 2024.
7
Cuticle development and the underlying transcriptome-metabolome associations during early seedling establishment.
J Exp Bot. 2024 Oct 30;75(20):6500-6522. doi: 10.1093/jxb/erae311.
8
Investigating the impact of Wnt pathway-related genes on biomarker and diagnostic model development for osteoporosis in postmenopausal females.
Sci Rep. 2024 Feb 4;14(1):2880. doi: 10.1038/s41598-024-52429-1.
9
Machine learning predicts portal vein thrombosis after splenectomy in patients with portal hypertension: Comparative analysis of three practical models.
World J Gastroenterol. 2022 Aug 28;28(32):4681-4697. doi: 10.3748/wjg.v28.i32.4681.
10
Transcriptional Behavior of Regulatory T Cells Predicts IBD Patient Responses to Vedolizumab Therapy.
Inflamm Bowel Dis. 2022 Dec 1;28(12):1800-1812. doi: 10.1093/ibd/izac151.

References

1
Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent.
J Stat Softw. 2011 Mar;39(5):1-13. doi: 10.18637/jss.v039.i05.
2
Random KNN feature selection - a fast and stable alternative to Random Forests.
BMC Bioinformatics. 2011 Nov 18;12:450. doi: 10.1186/1471-2105-12-450.
3
Systematic review of genome-wide expression studies in multiple sclerosis.
BMJ Open. 2011 Jul 18;1(1):e000053. doi: 10.1136/bmjopen-2011-000053.
4
Building multi-marker algorithms for disease prediction-the role of correlations among markers.
Biomark Insights. 2011;6:83-93. doi: 10.4137/BMI.S7513. Epub 2011 Aug 14.
5
Genome-wide expression profiling of five mouse models identifies similarities and differences with human psoriasis.
PLoS One. 2011 Apr 4;6(4):e18266. doi: 10.1371/journal.pone.0018266.
6
Gene expression profiling reveals novel biomarkers in nonsmall cell lung cancer.
Int J Cancer. 2011 Jul 15;129(2):355-64. doi: 10.1002/ijc.25704. Epub 2010 Nov 28.
7
Regularization Paths for Generalized Linear Models via Coordinate Descent.
J Stat Softw. 2010;33(1):1-22.
8
Gene expression profiling in multiple sclerosis: a disease of the central nervous system, but with relapses triggered in the periphery?
Neurobiol Dis. 2010 Mar;37(3):613-21. doi: 10.1016/j.nbd.2009.11.014. Epub 2009 Nov 26.
9
Using random forest for reliable classification and cost-sensitive learning for medical diagnosis.
BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S22. doi: 10.1186/1471-2105-10-S1-S22.
10
Genome-wide scan reveals association of psoriasis with IL-23 and NF-kappaB pathways.
Nat Genet. 2009 Feb;41(2):199-204. doi: 10.1038/ng.311. Epub 2009 Jan 25.