Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.

Author information

Dazard Jean-Eudes, Rao J Sunil

Affiliation

Division of Bioinformatics, Center for Proteomics and Bioinformatics, Case Western Reserve University. Cleveland, OH 44106, USA.

Publication information

Comput Stat Data Anal. 2012 Jul 1;56(7):2317-2333. doi: 10.1016/j.csda.2012.01.012.

Abstract

The paper addresses a common problem in the analysis of high-dimensional, high-throughput "omics" data, namely parameter estimation across many variables in data sets where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are unreliable and variable-wise test statistics have low power, both because of a lack of degrees of freedom. In addition, the variance in this type of data has been observed to increase as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, and (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that the usual sample variance estimator is improved by a shrinkage estimator that uses information contained in the sample mean. From these joint regularized shrinkage estimators, we derive regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, than regular common-value shrinkage estimators, or than versions that simply ignore the information contained in the sample mean. Finally, we show that these estimators have variance-stabilizing and normalizing properties that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.
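The abstract describes its estimators only in words. As a rough illustration of the general idea (pool variances among variables that look alike in both mean and spread, then build a t-like statistic on the pooled variances), here is a minimal Python sketch. It is not the MVR package or the authors' algorithm: plain k-means on standardized (mean, log SD) pairs stands in for the paper's similarity-statistic clustering, the number of clusters `k` is fixed by hand rather than chosen adaptively, only the variance is pooled (the mean enters solely through the clustering), and the function names `kmeans`, `pooled_variances` and `regularized_t` are hypothetical.

```python
import math
import random
from statistics import fmean, pstdev, variance


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on a list of (x, y) tuples; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(pt, centers[j])) for pt in points]
        new_centers = []
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(
                (fmean(m[0] for m in members), fmean(m[1] for m in members)) if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def column(x, j):
    """Values of variable j across all samples (x is a list of sample rows)."""
    return [row[j] for row in x]


def zscore(values):
    """Center and scale a list so that mean and log SD carry equal weight in the distance."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pooled_variances(x, k):
    """Cluster the p variables jointly on (mean, log SD); give each its cluster's average variance."""
    p = len(x[0])
    means = [fmean(column(x, j)) for j in range(p)]
    variances = [variance(column(x, j)) for j in range(p)]  # sample variance, n - 1 denominator
    log_sds = [0.5 * math.log(v) for v in variances]
    # Assumption: k-means stands in for the paper's similarity-statistic clustering.
    labels = kmeans(list(zip(zscore(means), zscore(log_sds))), k)
    reg_var = [0.0] * p
    for j in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        pooled = fmean(variances[i] for i in idx)  # equal n per variable, so pooling is a plain average
        for i in idx:
            reg_var[i] = pooled
    return reg_var, labels


def regularized_t(x_a, x_b, k=5):
    """Welch-style t-like statistic per variable, with cluster-pooled variances in the denominator."""
    var_a, _ = pooled_variances(x_a, k)
    var_b, _ = pooled_variances(x_b, k)
    n_a, n_b = len(x_a), len(x_b)
    return [
        (fmean(column(x_a, j)) - fmean(column(x_b, j))) / math.sqrt(var_a[j] / n_a + var_b[j] / n_b)
        for j in range(len(x_a[0]))
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    p, n = 300, 4
    mu = [rng.uniform(0, 10) for _ in range(p)]
    sd = [0.2 * m + 0.1 for m in mu]  # SD grows with the mean
    x_a = [[rng.gauss(mu[j], sd[j]) for j in range(p)] for _ in range(n)]
    # Hypothetical signal: the first 20 variables are shifted by 3 SDs in group b.
    x_b = [[rng.gauss(mu[j] + (3 * sd[j] if j < 20 else 0.0), sd[j]) for j in range(p)] for _ in range(n)]
    t = regularized_t(x_a, x_b, k=5)
    print("mean |t|, shifted:", fmean(abs(v) for v in t[:20]))
    print("mean |t|, null:   ", fmean(abs(v) for v in t[20:]))
```

With only four samples per group, each per-variable variance rests on three degrees of freedom; pooling across a cluster borrows strength from similar variables, which is the motivation the abstract gives for regularization. Clustering on the mean as well as the SD is what keeps low- and high-intensity variables from being pooled together when the variance grows with the mean.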

Similar articles

Variance estimation in the analysis of microarray data.
J R Stat Soc Series B Stat Methodol. 2009 Apr 1;71(2):425-445. doi: 10.1111/j.1467-9868.2008.00690.x.