A robust rerank approach for feature selection and its application to pooling-based GWA studies.

Affiliation

Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan.

Publication information

Comput Math Methods Med. 2013;2013:860673. doi: 10.1155/2013/860673. Epub 2013 Apr 4.

Abstract

Large-p-small-n datasets are commonly encountered in modern biomedical studies. For detecting differences between two groups, conventional methods break down in this setting: the t-test suffers from unstable variance estimates, and AUC (area under the receiver operating characteristic curve) estimates contain a high proportion of tied values. The significance analysis of microarrays (SAM) may also be unsatisfactory, because its performance is sensitive to a tuning parameter whose selection is not straightforward. In this work, we propose a robust rerank approach to overcome these difficulties. In particular, we obtain a rank-based statistic for each feature based on the concept of "rank-over-variable." The techniques of "random subset" and "rerank" are then applied iteratively to rank the features, and the leading features are selected for further study. The proposed rerank approach is especially applicable to large-p-small-n datasets. Moreover, it is insensitive to the selection of tuning parameters, which is an appealing property for practical implementation. Simulation studies and a real data analysis of pooling-based genome-wide association (GWA) studies demonstrate the usefulness of our method.
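The abstract names three ingredients (a "rank-over-variable" statistic, "random subset," and iterative "rerank") but does not give their definitions. The following is only a minimal illustrative sketch of one possible reading, not the authors' algorithm. It assumes that "rank-over-variable" means ranking features within each sample and comparing mean ranks between the two groups, that "random subset" means computing those ranks inside randomly drawn feature subsets, and that "rerank" means repeatedly discarding low-scoring features and rescoring the rest. The function names and the parameters `n_keep`, `subset_size`, `n_subsets`, and `shrink` are hypothetical.

```python
import numpy as np

def rank_over_variable_stat(X, y):
    """Assumed statistic: rank features within each sample (row), then take
    the absolute difference of mean ranks between group 1 and group 0."""
    # argsort of argsort yields 0-based ranks along each row;
    # ties are broken arbitrarily in this sketch.
    ranks = np.argsort(np.argsort(X, axis=1), axis=1).astype(float)
    return np.abs(ranks[y == 1].mean(axis=0) - ranks[y == 0].mean(axis=0))

def rerank_select(X, y, n_keep=10, subset_size=50, n_subsets=200,
                  shrink=0.5, seed=0):
    """Hypothetical random-subset + rerank loop (requires 0 < shrink < 1).
    Returns the indices of the n_keep features that survive."""
    rng = np.random.default_rng(seed)
    active = np.arange(X.shape[1])          # features still in play
    while active.size > n_keep:
        score_sum = np.zeros(active.size)
        counts = np.zeros(active.size)
        m = min(subset_size, active.size)
        for _ in range(n_subsets):
            # Score a random subset of the active features.
            idx = rng.choice(active.size, size=m, replace=False)
            score_sum[idx] += rank_over_variable_stat(X[:, active[idx]], y)
            counts[idx] += 1
        avg = score_sum / np.maximum(counts, 1)   # mean score per feature
        n_next = max(n_keep, int(active.size * shrink))
        # Rerank: keep the best-scoring fraction and rescore in the next pass.
        active = active[np.argsort(-avg)[:n_next]]
    return active

# Synthetic large-p-small-n example: 40 samples, 200 features,
# with the first 3 features shifted in group 1.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 200))
X[y == 1, :3] += 4.0
selected = rerank_select(X, y, n_keep=10)
```

Because ranks are taken within each sample, the score of one feature depends on which other features share its subset. That makes this toy version most sensible when truly differential features are sparse, as in the synthetic example.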

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2dda/3638651/5da808da347b/CMMM2013-860673.001.jpg

References cited in this article

4. A review of feature selection techniques in bioinformatics. Bioinformatics. 2007 Oct 1;23(19):2507-17. doi: 10.1093/bioinformatics/btm344. Epub 2007 Aug 24.
8. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005 Jan 15;21(2):263-5. doi: 10.1093/bioinformatics/bth457. Epub 2004 Aug 5.
9. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001 Apr 24;98(9):5116-21. doi: 10.1073/pnas.091062498. Epub 2001 Apr 17.
