Inference with Transposable Data: Modeling the Effects of Row and Column Correlations.

Authors

Allen Genevera I, Tibshirani Robert

Affiliations

Department of Pediatrics-Neurology, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, & Department of Statistics, Rice University, Houston, TX, 77005.

Departments of Health Research & Policy and Statistics, Stanford University, Stanford, CA, 94305.

Publication information

J R Stat Soc Series B Stat Methodol. 2012 Sep;74(4):721-743. doi: 10.1111/j.1467-9868.2011.01027.x. Epub 2012 Mar 16.

DOI:10.1111/j.1467-9868.2011.01027.x
PMID:34880705
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8649963/
Abstract

We consider the problem of large-scale inference on the row or column variables of data in the form of a matrix. Many of these data matrices are transposable, meaning that neither the row variables nor the column variables can be considered independent instances. An example of this scenario is detecting significant genes in microarrays when the samples may be dependent due to latent variables or unknown batch effects. By modeling this matrix data using the matrix-variate normal distribution, we study and quantify the effects of row and column correlations on procedures for large-scale inference. We then propose a simple solution to the myriad of problems presented by unanticipated correlations: We simultaneously estimate row and column covariances and use these to sphere or de-correlate the noise in the underlying data before conducting inference. This procedure yields data with approximately independent rows and columns so that test statistics more closely follow null distributions and multiple testing procedures correctly control the desired error rates. Results on simulated models and real microarray data demonstrate major advantages of this approach: (1) increased statistical power, (2) less bias in estimating the false discovery rate, and (3) reduced variance of the false discovery rate estimators.
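
The sphering step described in the abstract can be illustrated with a minimal sketch. It assumes the row covariance and column covariance are known, whereas the paper estimates both from the data; that estimation step is not reproduced here. The symbols are illustrative only: if the noise is matrix-variate normal with row covariance Σ and column covariance Δ, then Σ^{-1/2} X Δ^{-1/2} has independent standard normal entries. The function names (`inv_sqrt`, `ar1_cov`, `sample_matrix_normal`, `sphere`) and the AR(1)-style covariances are hypothetical choices for this example, not taken from the paper.

```python
import numpy as np


def inv_sqrt(cov):
    """Symmetric inverse square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    # Scale each eigenvector column by 1/sqrt(eigenvalue), then rotate back.
    return (vecs / np.sqrt(vals)) @ vecs.T


def ar1_cov(dim, rho):
    """Illustrative covariance with entry (i, j) equal to rho ** |i - j|."""
    idx = np.arange(dim)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def sample_matrix_normal(rng, row_cov, col_cov):
    """Draw X = A Z B^T, where A A^T = row_cov and B B^T = col_cov."""
    a = np.linalg.cholesky(row_cov)
    b = np.linalg.cholesky(col_cov)
    z = rng.standard_normal((row_cov.shape[0], col_cov.shape[0]))
    return a @ z @ b.T


def sphere(x, row_cov, col_cov):
    """De-correlate rows and columns: Sigma^{-1/2} X Delta^{-1/2}."""
    return inv_sqrt(row_cov) @ x @ inv_sqrt(col_cov)


rng = np.random.default_rng(0)
row_cov = ar1_cov(50, 0.6)  # assumed known here; e.g. correlation among genes
col_cov = ar1_cov(20, 0.8)  # assumed known here; e.g. correlation among samples

x = sample_matrix_normal(rng, row_cov, col_cov)
x_sphered = sphere(x, row_cov, col_cov)

# Lag-1 correlation between adjacent columns, before and after sphering.
before = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
after = np.corrcoef(x_sphered[:, 0], x_sphered[:, 1])[0, 1]
print(f"adjacent-column correlation before: {before:.2f}, after: {after:.2f}")
```

In practice the covariances are unknown, so the quality of the de-correlation depends on how well they are estimated; the sketch only shows why sphering with the correct covariances restores independence.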
