ecpc: an R-package for generic co-data models for high-dimensional prediction.

Affiliations

Epidemiology & Data Science, Amsterdam Public Health research institute, Amsterdam University Medical Centers, Amsterdam, The Netherlands.

Molecular Carcinogenesis, Netherlands Cancer Institute, Amsterdam, The Netherlands.

Publication information

BMC Bioinformatics. 2023 Apr 26;24(1):172. doi: 10.1186/s12859-023-05289-x.

Abstract

BACKGROUND

High-dimensional prediction considers data with more variables than samples. Generic research goals are to find the best predictor or to select variables. Results may be improved by exploiting prior information in the form of co-data, which provide complementary data not on the samples, but on the variables. We consider adaptive ridge penalised generalised linear and Cox models, in which the variable-specific ridge penalties are adapted to the co-data to give a priori more weight to more important variables. The R-package ecpc originally accommodated various and possibly multiple co-data sources, including categorical co-data, i.e. groups of variables, and continuous co-data. Continuous co-data, however, were handled by adaptive discretisation, which may model the co-data inefficiently and lose information. As continuous co-data such as external p values or correlations often arise in practice, more generic co-data models are needed.
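To make the idea of variable-specific ridge penalties concrete, the following is a minimal sketch in Python, assuming a linear model with a squared-error loss. The function name `adaptive_ridge` and the simulated grouping are hypothetical illustrations and are not part of the ecpc package, which is written in R and estimates the penalties from the co-data rather than taking them as given.

```python
import numpy as np


def adaptive_ridge(X, y, penalties):
    """Minimise ||y - X b||^2 + sum_k penalties[k] * b[k]^2 over b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    lam = np.asarray(penalties, dtype=float)
    # Closed-form solution of the penalised normal equations:
    # (X'X + diag(lam)) b = X'y
    return np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 20, 100  # more variables than samples
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:10] = 1.0  # only the first 10 variables carry signal
    y = X @ beta + rng.standard_normal(n)
    # Hypothetical co-data: a grouping that flags the first 10 variables.
    # Flagged variables get a small penalty, the rest a large one.
    penalties = np.where(np.arange(p) < 10, 1.0, 100.0)
    b = adaptive_ridge(X, y, penalties)
    # Compare average absolute coefficient size in the two groups.
    print(np.abs(b[:10]).mean(), np.abs(b[10:]).mean())
```

A smaller penalty lets a coefficient move further from zero, which is how co-data can give a priori more weight to variables believed to be important.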

RESULTS

Here, we present an extension to the method and software for generic co-data models, particularly for continuous co-data. At the basis lies a classical linear regression model, regressing prior variance weights on the co-data. The co-data variable weights are then estimated with empirical Bayes moment estimation. After placing the estimation procedure in the classical regression framework, extension to generalised additive and shape constrained co-data models is straightforward. In addition, we show how ridge penalties may be transformed to elastic net penalties. In simulation studies we first compare various models for continuous co-data from the extension with the original method. Secondly, we compare variable selection performance to other variable selection methods. The extension is faster than the original method and shows improved prediction and variable selection performance for non-linear co-data relations. Moreover, we demonstrate use of the package in several genomics examples throughout the paper.
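The step of regressing prior variance weights on the co-data can be illustrated with a deliberately crude sketch. It assumes a Gaussian prior with variance v_k for each coefficient and a linear co-data model v = Z γ, and it uses squared preliminary coefficient estimates as a stand-in regression target. This is not the empirical Bayes moment estimator used in ecpc, which this sketch does not reproduce; the function names `codata_prior_variances` and `penalties_from_variances` are hypothetical.

```python
import numpy as np


def codata_prior_variances(beta_init, Z, floor=1e-8):
    """Least-squares fit of squared preliminary estimates on co-data Z.

    Returns the fitted prior variances (floored to stay positive) and the
    co-data coefficients gamma. A simplified stand-in, not ecpc's estimator.
    """
    Z = np.asarray(Z, dtype=float)
    target = np.asarray(beta_init, dtype=float) ** 2
    gamma, *_ = np.linalg.lstsq(Z, target, rcond=None)
    v = np.maximum(Z @ gamma, floor)
    return v, gamma


def penalties_from_variances(v, sigma2=1.0):
    """Ridge penalty implied by a N(0, v_k) prior with noise variance sigma2."""
    return sigma2 / np.asarray(v, dtype=float)


if __name__ == "__main__":
    # Hypothetical co-data: indicator columns for two groups of variables.
    Z = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
    beta_init = np.array([1.0, 1.0, 2.0, 2.0])  # assumed preliminary estimates
    v, gamma = codata_prior_variances(beta_init, Z)
    print(gamma, penalties_from_variances(v, sigma2=2.0))
```

With indicator columns in Z the fit reduces to a per-group average, which corresponds to the categorical co-data case. Replacing the columns of Z by a continuous co-data variable, or by a spline basis of it, gives the linear or additive co-data model within the same regression step.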

CONCLUSIONS

The R-package ecpc accommodates linear, generalised additive and shape constrained additive co-data models for the purpose of improved high-dimensional prediction and variable selection. The extended version of the package as presented here (version number 3.1.1 and higher) is available on CRAN (https://cran.r-project.org/web/packages/ecpc/).
