AUC-RF: a new strategy for genomic profiling with random forest.

Authors

Calle M Luz, Urrea Victor, Boulesteix Anne-Laure, Malats Nuria

Affiliation

Systems Biology Department, University of Vic, Spain. malu.calle@uvic.cat

Publication information

Hum Hered. 2011;72(2):121-32. doi: 10.1159/000330778. Epub 2011 Oct 11.

Abstract

OBJECTIVE

Genomic profiling, the use of genetic variants at multiple loci simultaneously for the prediction of disease risk, requires the selection of a set of genetic variants that best predicts disease status. The goal of this work was to provide a new selection algorithm for genomic profiling.

METHODS

We propose a new algorithm for genomic profiling based on optimizing the area under the receiver operating characteristic curve (AUC) of the random forest (RF). The proposed strategy implements a backward elimination process based on the initial ranking of variables.
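The backward elimination idea can be sketched in a few lines of Python. This is only an illustrative outline, not the AUCRF package's code: the function name `backward_elimination`, the `score_subset` callback, and the 20% `drop_fraction` are assumptions introduced here. In a real analysis, `score_subset` would presumably fit a random forest on the given variables and return an estimate of its AUC; the toy scoring function below merely stands in for that step.

```python
from typing import Callable, List, Sequence, Tuple


def backward_elimination(
    ranked: Sequence[str],
    score_subset: Callable[[Sequence[str]], float],
    drop_fraction: float = 0.2,  # assumed value, not taken from the abstract
    min_features: int = 1,
) -> Tuple[List[str], float]:
    """Shrink a ranked variable list and keep the best-scoring subset.

    `ranked` holds variable names ordered from most to least important
    (the initial ranking, computed once). At each step the lowest-ranked
    variables are dropped and the remaining subset is re-scored.
    """
    current = list(ranked)
    best_subset, best_score = list(current), score_subset(current)
    while len(current) > min_features:
        n_drop = max(1, int(len(current) * drop_fraction))
        n_keep = max(min_features, len(current) - n_drop)
        current = current[:n_keep]  # drop the least important variables
        score = score_subset(current)
        if score >= best_score:  # on ties, prefer the smaller subset
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-in for "AUC of a random forest fitted on this subset":
# two hypothetical informative variables raise the score, noise lowers it.
INFORMATIVE = {"snp1", "snp2"}


def toy_auc(subset: Sequence[str]) -> float:
    hits = sum(name in INFORMATIVE for name in subset)
    noise = len(subset) - hits
    return 0.5 + 0.2 * hits - 0.01 * noise


ranked_variables = [f"snp{i}" for i in range(1, 11)]
selected, selected_auc = backward_elimination(ranked_variables, toy_auc)
print(selected, round(selected_auc, 2))
```

Passing the scoring step in as a callback keeps the elimination loop independent of any particular random forest implementation or AUC estimator.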

RESULTS AND CONCLUSIONS

We demonstrate the advantage of using the AUC instead of the classification error as a measure of predictive accuracy of RF. In particular, we show that the use of the classification error is especially inappropriate when dealing with unbalanced data sets. The new procedure for variable selection and prediction, namely AUC-RF, is illustrated with data from a bladder cancer study and also with simulated data. The algorithm is publicly available as an R package, named AUCRF, at http://cran.r-project.org/.
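A small worked example may help show why the classification error can mislead on unbalanced data. The data below are invented for illustration (5 cases and 95 controls) and are not taken from the bladder cancer study; the helper names `auc` and `error_rate` and the 0.5 decision threshold are likewise assumptions of this sketch.

```python
def auc(scores, labels):
    """AUC as the probability that a random case scores higher than a
    random control, counting ties as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def error_rate(scores, labels, threshold=0.5):
    """Fraction of misclassified samples when scores >= threshold
    are predicted as cases."""
    predictions = [int(s >= threshold) for s in scores]
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)


# Unbalanced toy data: 5 cases (label 1) and 95 controls (label 0).
labels = [1] * 5 + [0] * 95

# Model A gives every sample the same score, so it carries no information.
uninformative = [0.0] * 100

# Model B ranks every case above every control, but all scores stay
# below the 0.5 threshold, so it still predicts "control" for everyone.
well_ranked = [0.4] * 5 + [0.1] * 95

for name, scores in [("A", uninformative), ("B", well_ranked)]:
    print(name, "error:", error_rate(scores, labels), "AUC:", auc(scores, labels))
```

Both models have the same classification error (0.05), because always predicting the majority class is wrong only for the 5 cases. The AUC separates them: 0.5 for the uninformative model and 1.0 for the one that ranks cases above controls.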

