Optimized application of penalized regression methods to diverse genomic data.

Affiliation

Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA.

Publication information

Bioinformatics. 2011 Dec 15;27(24):3399-406. doi: 10.1093/bioinformatics/btr591.

Abstract

MOTIVATION

Penalized regression methods have been adopted widely for high-dimensional feature selection and prediction in many bioinformatic and biostatistical contexts. While their theoretical properties are well understood, specific methodology for their optimal application to genomic data has not been determined.

RESULTS

Through simulation of contrasting scenarios of correlated high-dimensional survival data, we compared the LASSO, Ridge and Elastic Net penalties for prediction and variable selection. We found that a 2D tuning of the Elastic Net penalties was necessary to avoid mimicking the performance of LASSO or Ridge regression. Furthermore, we found that in a simulated scenario favoring the LASSO penalty, a univariate pre-filter made the Elastic Net behave more like Ridge regression, which was detrimental to prediction performance. We demonstrate the real-life application of these methods to predicting the survival of cancer patients from microarray data, and to classification of obese and lean individuals from metagenomic data. Based on these results, we provide an optimized set of guidelines for the application of penalized regression for reproducible class comparison and prediction with genomic data.
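To illustrate what a 2D tuning of the Elastic Net penalties can look like, the following is a minimal sketch and not the pensim implementation. It assumes a linear model with squared-error loss as a stand-in for the survival models studied in the paper, no intercept, and a small hand-picked penalty grid. The function names (`fit_elastic_net`, `cv_error`, `tune_2d`), the grid values and the synthetic data are all hypothetical. The point it shows is that both penalties are searched jointly: setting `lam2 = 0` reduces the fit to the LASSO and setting `lam1 = 0` reduces it to Ridge, so tuning only one of them can reproduce one of those two extremes.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_elastic_net(X, y, lam1, lam2, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||_2^2.

    Assumption: no intercept, predictors roughly centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X*beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            denom = col_sq[j] + lam2
            if denom == 0.0:
                continue  # all-zero column with no L2 penalty: leave at zero
            # Correlation of column j with the partial residual (excluding j)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam1) / denom
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def cv_error(X, y, lam1, lam2, k=5):
    """Mean squared prediction error from k-fold cross-validation."""
    n = len(X)
    total = 0.0
    for fold in range(k):
        test = set(range(fold, n, k))
        X_train = [X[i] for i in range(n) if i not in test]
        y_train = [y[i] for i in range(n) if i not in test]
        beta = fit_elastic_net(X_train, y_train, lam1, lam2)
        for i in test:
            pred = sum(b * x for b, x in zip(beta, X[i]))
            total += (y[i] - pred) ** 2
    return total / n


def tune_2d(X, y, lam1_grid, lam2_grid):
    """Search every (lam1, lam2) pair jointly instead of fixing one penalty."""
    scores = {(l1, l2): cv_error(X, y, l1, l2)
              for l1 in lam1_grid for l2 in lam2_grid}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic example: only the first two of eight predictors carry signal.
random.seed(0)
n, p = 40, 8
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0, 0.5) for row in X]

best, scores = tune_2d(X, y, [0.01, 0.1, 1.0], [0.0, 0.1, 1.0])
print("best (lam1, lam2):", best)
```

In practice the grid would be much finer and the loss would match the outcome type (for example a Cox partial likelihood for survival data); the exhaustive grid here is chosen only for readability.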

AVAILABILITY AND IMPLEMENTATION

A parallelized implementation of the methods presented for regression and for simulation of synthetic data is provided as the pensim R package, available at http://cran.r-project.org/web/packages/pensim/index.html.

CONTACT

chuttenh@hsph.harvard.edu; juris@ai.utoronto.ca

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
