Inference with Transposable Data: Modeling the Effects of Row and Column Correlations.

Authors

Allen Genevera I, Tibshirani Robert

Affiliations

Department of Pediatrics-Neurology, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, & Department of Statistics, Rice University, Houston, TX, 77005.

Departments of Health Research & Policy and Statistics, Stanford University, Stanford, CA, 94305.

Publication information

J R Stat Soc Series B Stat Methodol. 2012 Sep;74(4):721-743. doi: 10.1111/j.1467-9868.2011.01027.x. Epub 2012 Mar 16.

Abstract

We consider the problem of large-scale inference on the row or column variables of data in the form of a matrix. Many of these data matrices are transposable, meaning that neither the row variables nor the column variables can be considered independent instances. An example of this scenario is detecting significant genes in microarrays when the samples may be dependent due to latent variables or unknown batch effects. By modeling this matrix data using the matrix-variate normal distribution, we study and quantify the effects of row and column correlations on procedures for large-scale inference. We then propose a simple solution to the myriad of problems presented by unanticipated correlations: we simultaneously estimate row and column covariances and use these to sphere or de-correlate the noise in the underlying data before conducting inference. This procedure yields data with approximately independent rows and columns so that test statistics more closely follow null distributions and multiple testing procedures correctly control the desired error rates. Results on simulated models and real microarray data demonstrate major advantages of this approach: (1) increased statistical power, (2) less bias in estimating the false discovery rate, and (3) reduced variance of the false discovery rate estimators.
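The sphering step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' procedure: it assumes the row and column covariances are known exactly, whereas the paper estimates both from the data, and it omits that estimation step entirely. The AR(1)-style covariances and the helper names (`inv_sqrt`, `ar1_cov`, `sphere`, `mean_abs_offdiag_corr`) are hypothetical choices made for illustration only.

```python
import numpy as np


def inv_sqrt(cov):
    """Symmetric inverse square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return (vecs / np.sqrt(vals)) @ vecs.T


def ar1_cov(size, rho):
    """Covariance with entries rho**|i-j| (an illustrative assumption)."""
    idx = np.arange(size)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def sphere(X, row_cov, col_cov):
    """De-correlate rows and columns: row_cov^(-1/2) X col_cov^(-1/2)."""
    return inv_sqrt(row_cov) @ X @ inv_sqrt(col_cov)


def mean_abs_offdiag_corr(M):
    """Mean absolute sample correlation between distinct columns of M."""
    C = np.corrcoef(M, rowvar=False)
    off_diagonal = C[~np.eye(C.shape[0], dtype=bool)]
    return np.abs(off_diagonal).mean()


rng = np.random.default_rng(0)
n_rows, n_cols = 200, 30
row_cov = ar1_cov(n_rows, 0.5)  # assumed known here; estimated in the paper
col_cov = ar1_cov(n_cols, 0.8)  # assumed known here; estimated in the paper

# Draw a mean-zero matrix-variate normal sample: X = A Z B^T,
# where A A^T = row_cov and B B^T = col_cov.
A = np.linalg.cholesky(row_cov)
B = np.linalg.cholesky(col_cov)
Z = rng.standard_normal((n_rows, n_cols))
X = A @ Z @ B.T

X_sphered = sphere(X, row_cov, col_cov)

before = mean_abs_offdiag_corr(X)
after = mean_abs_offdiag_corr(X_sphered)
print(f"mean |column correlation| before sphering: {before:.3f}")
print(f"mean |column correlation| after sphering:  {after:.3f}")
```

With the true covariances, the sphered matrix has independent standard normal entries, so the remaining column correlations reflect sampling noise only. With estimated covariances, as in the paper, the independence would hold only approximately.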

Similar articles

1
Inference with Transposable Data: Modeling the Effects of Row and Column Correlations.
J R Stat Soc Series B Stat Methodol. 2012 Sep;74(4):721-743. doi: 10.1111/j.1467-9868.2011.01027.x. Epub 2012 Mar 16.
3
Testing the mean matrix in high-dimensional transposable data.
Biometrics. 2015 Mar;71(1):157-166. doi: 10.1111/biom.12257. Epub 2015 Jan 23.
8
Are a set of microarrays independent of each other?
Ann Appl Stat. 2009 Jan 1;3(3):922-942. doi: 10.1214/09-AOAS236.
