Predictive and interpretable models via the stacked elastic net.

Affiliations

Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 4362 Esch-sur-Alzette, Luxembourg.

Department of Epidemiology and Data Science, Amsterdam UMC, 1081 HV Amsterdam, The Netherlands.

Publication information

Bioinformatics. 2021 Aug 4;37(14):2012-2016. doi: 10.1093/bioinformatics/btaa535.

PMID:32437519

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8336997/

Abstract

MOTIVATION

Machine learning in the biomedical sciences should ideally provide predictive and interpretable models. When predicting outcomes from clinical or molecular features, applied researchers often want to know which features have effects, whether these effects are positive or negative and how strong these effects are. Regression analysis includes this information in the coefficients but typically renders less predictive models than more advanced machine learning techniques.

RESULTS

Here, we propose an interpretable meta-learning approach for high-dimensional regression. The elastic net provides a compromise between estimating weak effects for many features and strong effects for some features. It has a mixing parameter to weight between ridge and lasso regularization. Instead of selecting one weighting by tuning, we combine multiple weightings by stacking. We do this in a way that increases predictivity without sacrificing interpretability.
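The stacking idea in this paragraph can be illustrated with a small, dependency-free sketch. It is not the starnet implementation: the coordinate-descent solver, the single fixed penalty `lam`, the three-value `alphas` grid, the box constraint of [0, 1] on the meta weights, and all function names are assumptions made for illustration. Base models are elastic nets with different mixing parameters, the meta learner weights their cross-validated predictions, and the weighted base coefficients are pooled into one coefficient vector, so the stacked model is still a linear model.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator from the lasso part of the penalty."""
    return z - t if z > t else z + t if z < -t else 0.0

def fit_elastic_net(X, y, alpha, lam, n_iter=100):
    """Coordinate descent for (1/2n)*RSS + lam*((1-alpha)/2*|b|_2^2 + alpha*|b|_1).
    alpha=0 is ridge, alpha=1 is lasso. Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residuals while all coefficients are zero
    beta = [0.0] * p
    col_ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # covariance of feature j with the partial residual (feature j excluded)
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

def predict(model, X):
    b0, beta = model
    return [b0 + sum(b * v for b, v in zip(beta, row)) for row in X]

def cv_predictions(X, y, alpha, lam, n_folds=5):
    """Out-of-fold predictions, so the meta learner never sees fitted values."""
    n = len(X)
    out = [0.0] * n
    for f in range(n_folds):
        test = [i for i in range(n) if i % n_folds == f]
        train = [i for i in range(n) if i % n_folds != f]
        model = fit_elastic_net([X[i] for i in train], [y[i] for i in train], alpha, lam)
        for i, pred in zip(test, predict(model, [X[i] for i in test])):
            out[i] = pred
    return out

def fit_meta_weights(Z, y, n_iter=300):
    """Least squares of y on the columns in Z, weights clipped to [0, 1]
    (projected coordinate descent; the constraint is this sketch's assumption)."""
    n, k = len(y), len(Z)
    w0, w = 0.0, [0.0] * k
    resid = list(y)
    for _ in range(n_iter):
        shift = sum(resid) / n  # unconstrained intercept update
        w0 += shift
        resid = [r - shift for r in resid]
        for j in range(k):
            ss = sum(z * z for z in Z[j])
            if ss == 0.0:
                continue
            target = w[j] + sum(z * r for z, r in zip(Z[j], resid)) / ss
            new = min(1.0, max(0.0, target))
            delta = new - w[j]
            resid = [r - delta * z for r, z in zip(resid, Z[j])]
            w[j] = new
    return w0, w

def pool(models, w0, w):
    """Collapse the weighted base models into one intercept and coefficient vector."""
    p = len(models[0][1])
    intercept = w0 + sum(wk * m[0] for wk, m in zip(w, models))
    beta = [sum(wk * m[1][j] for wk, m in zip(w, models)) for j in range(p)]
    return intercept, beta

# Simulated data (hypothetical): 60 samples, 8 features, 3 of them with effects.
random.seed(1)
n, p = 60, 8
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
true_beta = [2.0, -1.5, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
y = [sum(b * v for b, v in zip(true_beta, row)) + random.gauss(0, 0.5) for row in X]

alphas = [0.0, 0.5, 1.0]  # ridge, an intermediate mix, lasso
lam = 0.1                 # fixed here; in practice tuned per alpha
Z = [cv_predictions(X, y, a, lam) for a in alphas]
w0, w = fit_meta_weights(Z, y)
models = [fit_elastic_net(X, y, a, lam) for a in alphas]
stacked = pool(models, w0, w)
print("meta weights:", w)
print("stacked coefficients:", stacked[1])
```

Because every base model is linear, predicting with `stacked` gives the same values as weighting the base predictions, which is why the combination keeps signed, per-feature coefficients that can be read like those of an ordinary regression.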

AVAILABILITY AND IMPLEMENTATION

The R package starnet is available on GitHub (https://github.com/rauschenberger/starnet) and CRAN (https://CRAN.R-project.org/package=starnet).
