TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION.

Author information

Allen Genevera I, Tibshirani Robert

Affiliation

Department of Statistics, Stanford University, Stanford, California, 94305, USA.

Publication information

Ann Appl Stat. 2010 Jun;4(2):764-790. doi: 10.1214/09-AOAS314.

Abstract

Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that the rows, the columns, or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.
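The abstract describes the imputation method only at a high level. As a hedged illustration of the multivariate (non-transposable) case, the sketch below runs an EM-type algorithm for rows x_i ~ N(μ, Σ). The E-step fills each missing entry with its conditional mean given the observed entries of that row and accumulates the conditional covariances; the M-step maximises log det(Θ) − tr(SΘ) − λ·tr(Θ) over Θ = Σ⁻¹, where S is the expected scatter matrix. That additive penalty is an assumption made for this sketch because it has the closed-form solution Σ = S + λI, which stays non-singular for any λ > 0; it is not necessarily the penalty used in the paper, and the row-and-column (transposable) model is not implemented. The function names `solve` and `em_impute` and the parameters `lam` and `iters` are hypothetical.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Row = List[Optional[float]]  # None marks a missing entry


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b for square a via Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # work on a copy
    width = len(aug[0])
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, width):
                    aug[r][c] -= f * aug[col][c]
    return [[aug[i][j] / aug[i][i] for j in range(n, width)] for i in range(n)]


def em_impute(data: List[Row], lam: float = 0.1, iters: int = 50) -> Tuple[Matrix, List[float], Matrix]:
    """Hypothetical EM imputation for rows x_i ~ N(mu, Sigma).

    The M-step uses the ASSUMED penalty lam * tr(Sigma^{-1}), giving Sigma = S + lam * I.
    Assumes lam > 0 and that every column has at least one observed entry.
    """
    n, p = len(data), len(data[0])
    mu = []
    for j in range(p):
        vals = [row[j] for row in data if row[j] is not None]
        mu.append(sum(vals) / len(vals))
    sigma = [[float(a == b) for b in range(p)] for a in range(p)]  # start from identity
    filled = [[mu[j] if v is None else float(v) for j, v in enumerate(row)] for row in data]

    for _ in range(iters):
        # E-step: conditional means fill the gaps; conditional covariances are accumulated.
        extra = [[0.0] * p for _ in range(p)]
        for i, row in enumerate(data):
            mis = [j for j in range(p) if row[j] is None]
            obs = [j for j in range(p) if row[j] is not None]
            if not mis:
                continue
            s_oo = [[sigma[a][b] for b in obs] for a in obs]
            s_om = [[sigma[a][b] for b in mis] for a in obs]
            resid = [[row[a] - mu[a]] for a in obs]
            beta = solve(s_oo, resid) if obs else []  # Sigma_oo^{-1} (x_o - mu_o)
            w = solve(s_oo, s_om) if obs else []      # Sigma_oo^{-1} Sigma_om
            for j in mis:
                # E[x_j | x_o] = mu_j + Sigma_jo Sigma_oo^{-1} (x_o - mu_o)
                filled[i][j] = mu[j] + sum(sigma[j][a] * beta[t][0] for t, a in enumerate(obs))
                for l, jj in enumerate(mis):
                    # Cov[x_j, x_jj | x_o] = Sigma_j,jj - Sigma_jo Sigma_oo^{-1} Sigma_o,jj
                    extra[j][jj] += sigma[j][jj] - sum(sigma[j][a] * w[t][l] for t, a in enumerate(obs))
        # M-step: complete-data mean, expected scatter, then the additive ridge term.
        mu = [sum(r[j] for r in filled) / n for j in range(p)]
        for a in range(p):
            for b in range(p):
                s = sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in filled)
                sigma[a][b] = (s + extra[a][b]) / n + (lam if a == b else 0.0)
    return filled, mu, sigma
```

For example, on the toy rows `[[1, 2], [2, 4], [3, 6], [4, 8], [5, None]]` with `lam=0.01`, the missing entry converges to about 9.95 rather than 10, because the ridge term inflates the variance of the first column and so shrinks the fitted slope slightly.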

Similar articles

1
Inference with Transposable Data: Modeling the Effects of Row and Column Correlations.
J R Stat Soc Series B Stat Methodol. 2012 Sep;74(4):721-743. doi: 10.1111/j.1467-9868.2011.01027.x. Epub 2012 Mar 16.
2
Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.
Bioinformatics. 2005 May 15;21(10):2417-23. doi: 10.1093/bioinformatics/bti345. Epub 2005 Feb 24.
3
Covariance-regularized regression and classification for high-dimensional problems.
J R Stat Soc Series B Stat Methodol. 2009 Feb 20;71(3):615-636. doi: 10.1111/j.1467-9868.2009.00699.x.
4
Empirical Bayes Linked Matrix Decomposition.
Mach Learn. 2024 Oct;113(10):7451-7477. doi: 10.1007/s10994-024-06599-8. Epub 2024 Aug 7.
5
Optimal variable clustering for high-dimensional matrix valued data.
Inf Inference. 2025 Mar 12;14(1):iaaf001. doi: 10.1093/imaiai/iaaf001. eCollection 2025 Mar.
6
Performance of penalized maximum likelihood in estimation of genetic covariances matrices.
Genet Sel Evol. 2011 Nov 27;43(1):39. doi: 10.1186/1297-9686-43-39.
7
Multiple imputation with sequential penalized regression.
Stat Methods Med Res. 2019 May;28(5):1311-1327. doi: 10.1177/0962280218755574. Epub 2018 Feb 16.
8
Multi-response Regression for Block-missing Multi-modal Data without Imputation.
Stat Sin. 2024 Apr;34(2):527-546. doi: 10.5705/ss.202021.0170.

Cited by

1
Data integration in Bayesian phylogenetics.
Annu Rev Stat Appl. 2023;10:353-377. doi: 10.1146/annurev-statistics-033021-112532. Epub 2022 Sep 28.
2
UNIFYING AND GENERALIZING METHODS FOR REMOVING UNWANTED VARIATION BASED ON NEGATIVE CONTROLS.
Stat Sin. 2021 Jul;31(3):1145-1166. doi: 10.5705/ss.202018.0345.
3
CO-CLUSTERING OF SPATIALLY RESOLVED TRANSCRIPTOMIC DATA.
Ann Appl Stat. 2023 Jun;17(2):1444-1468. doi: 10.1214/22-aoas1677. Epub 2023 May 1.
4
Inferring Phenotypic Trait Evolution on Large Trees With Many Incomplete Measurements.
J Am Stat Assoc. 2022;117(538):678-692. doi: 10.1080/01621459.2020.1799812. Epub 2020 Sep 16.
5
Inference with Transposable Data: Modeling the Effects of Row and Column Correlations.
J R Stat Soc Series B Stat Methodol. 2012 Sep;74(4):721-743. doi: 10.1111/j.1467-9868.2011.01027.x. Epub 2012 Mar 16.
6
Ensuring the Robustness and Reliability of Data-Driven Knowledge Discovery Models in Production and Manufacturing.
Front Artif Intell. 2021 Jun 14;4:576892. doi: 10.3389/frai.2021.576892. eCollection 2021.
7
Testing for nodal dependence in relational data matrices.
J Am Stat Assoc. 2015;110(511):1037-1046. doi: 10.1080/01621459.2014.965777. Epub 2015 Nov 7.
8
Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining.
Proc IEEE Inst Electr Electron Eng. 2016 Jan;104(1):93-110. doi: 10.1109/JPROC.2015.2494178. Epub 2015 Dec 21.
9
A multiple-phenotype imputation method for genetic studies.
Nat Genet. 2016 Apr;48(4):466-72. doi: 10.1038/ng.3513. Epub 2016 Feb 22.
10
SEPARABLE FACTOR ANALYSIS WITH APPLICATIONS TO MORTALITY DATA.
Ann Appl Stat. 2014;8(1):120-147. doi: 10.1214/13-aoas694.
