Sequential selection of variables using short permutation procedures and multiple adjustments: An application to genomic data.

Author information

Azevedo Costa Marcelo, de Souza Rodrigues Thiago, da Costa André Gabriel Fc, Natowicz René, Pádua Braga Antônio

Affiliations

1 Department of Industrial Engineering, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

2 Computer Department, Centro Federal de Educação Tecnológica Minas Gerais, Brazil.

Publication information

Stat Methods Med Res. 2017 Apr;26(2):997-1020. doi: 10.1177/0962280214566262. Epub 2015 Jan 9.

Abstract

This work proposes a sequential methodology for selecting variables in classification problems in which the number of predictors is much larger than the sample size. The methodology includes a Monte Carlo permutation procedure that conditionally tests the null hypothesis of no association between the outcomes and the available predictors. To improve computational efficiency, we propose a new parametric distribution, the Truncated and Zero-Inflated Gumbel Distribution. The intended application is finding compact classification models with improved performance for genomic data. Results on real data sets show that the proposed methodology selects compact models with optimized classification performance.
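
The abstract does not spell out the permutation procedure, so the following is only a minimal sketch of a generic, unconditional Monte Carlo permutation test of "no association between the outcome and any predictor". It is not the authors' conditional, sequential method. The test statistic (largest absolute Pearson correlation across predictors), the function names `max_abs_correlation` and `permutation_pvalue`, and the default of 199 permutations are all assumptions made for illustration.

```python
import numpy as np


def max_abs_correlation(X, y):
    """Largest absolute Pearson correlation between any column of X and y.

    Assumed test statistic for this sketch; the paper may use a different one.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns (or constant y) give zero correlation
    return np.abs(Xc.T @ yc / denom).max()


def permutation_pvalue(X, y, n_perm=199, seed=0):
    """Monte Carlo permutation p-value for the global null of no association.

    Shuffling y breaks any link between outcome and predictors while keeping
    the correlation structure among the predictors intact.
    """
    rng = np.random.default_rng(seed)
    observed = max_abs_correlation(X, y)
    exceed = 0
    for _ in range(n_perm):
        if max_abs_correlation(X, rng.permutation(y)) >= observed:
            exceed += 1
    # Counting the observed statistic as one of the n_perm + 1 draws
    # keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Calling `permutation_pvalue(X, y)` with `X` as an n-by-p float array and `y` as a length-n array of 0/1 class labels returns a value in (0, 1]. Taking the maximum over all predictors is one common way to account for testing many variables at once when p is much larger than n. The smallest attainable p-value is 1/(n_perm + 1), which is one reason a parametric approximation to the permutation distribution, such as the one the abstract proposes, can be useful when only a short run of permutations is affordable.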
