selectBoost: a general algorithm to enhance the performance of variable selection methods.

Affiliations

Institut de Recherche Mathématique Avancée, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, Strasbourg, France.

Université de Technologie de Troyes, ICD, ROSAS, M2S, Troyes, France.

Publication information

Bioinformatics. 2021 May 5;37(5):659-668. doi: 10.1093/bioinformatics/btaa855.

Abstract

MOTIVATION

With the growth of big data, variable selection has become one of the critical challenges in statistics. Although many methods have been proposed in the literature, their performance in terms of recall (sensitivity) and precision (positive predictive value) is limited when the number of variables far exceeds the number of observations or when the variables are highly correlated.
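To make the two criteria concrete: in a simulation where the truly relevant variables are known, recall is TP / (TP + FN), the share of relevant variables a method recovers, and precision is TP / (TP + FP), the share of selected variables that are actually relevant. The short Python sketch below computes both for a single selected set. The function name and the example indices are hypothetical illustrations, not part of the SelectBoost package.

```python
def selection_recall_precision(selected, true_support):
    """Recall (sensitivity) and precision (positive predictive value) of one selection.

    selected: indices of the variables chosen by a selection method (hypothetical input)
    true_support: indices of the truly relevant variables, known in a simulation
    """
    selected = set(selected)
    true_support = set(true_support)
    true_positives = len(selected & true_support)
    # Recall = TP / (TP + FN): fraction of relevant variables that were selected.
    recall = true_positives / len(true_support) if true_support else float("nan")
    # Precision = TP / (TP + FP): fraction of selected variables that are relevant.
    precision = true_positives / len(selected) if selected else float("nan")
    return recall, precision


# Hypothetical example: 5 relevant variables; a method selects 8, of which 3 are relevant.
recall, precision = selection_recall_precision(
    selected=[0, 1, 2, 7, 9, 11, 15, 20],
    true_support=[0, 1, 2, 3, 4],
)
print(recall, precision)  # 0.6 0.375
```

In this toy case the method finds most of the signal (recall 0.6) but over half of its picks are false positives (precision 0.375), which is the kind of precision loss the abstract targets.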

RESULTS

In this article, we propose a general algorithm that improves the precision of any existing variable selection method. The algorithm relies on intensive simulations and takes the correlation structure of the data into account. It can either produce a confidence index for variable selection or be used for experimental design planning. We demonstrate its performance on both simulated and real data, and then apply it in two different ways to improve the reverse-engineering of biological networks.

AVAILABILITY AND IMPLEMENTATION

Code is available as the SelectBoost package on CRAN, https://cran.r-project.org/package=SelectBoost. Some network reverse-engineering functions are available in the Patterns package on CRAN, https://cran.r-project.org/package=Patterns.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
