Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.

Authors

Lam Clifford, Fan Jianqing

Affiliations

Department of Statistics, London School of Economics and Political Science, London, WC2A 2AE (

Publication information

Ann Stat. 2009;37(6B):4254-4278. doi: 10.1214/09-AOS720.

DOI:10.1214/09-AOS720

PMID:21132082

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2995610/

Abstract

This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the application, sparsity may occur a priori on the covariance matrix, its inverse, or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order $(s_n \log p_n / n)^{1/2}$, where $s_n$ is the number of nonzero elements, $p_n$ is the size of the covariance matrix, and $n$ is the sample size. This explicitly spells out that the contribution of high dimensionality is merely a logarithmic factor. The conditions on the rate at which the tuning parameter $\lambda_n$ goes to 0 are made explicit and compared under different penalties. As a result, for the $L_1$-penalty, to guarantee sparsistency and the optimal rate of convergence, the number of nonzero elements should be small: $s_n' = O(p_n)$ at most, among $O(p_n^2)$ parameters, for estimating a sparse covariance or correlation matrix, a sparse precision or inverse correlation matrix, or a sparse Cholesky factor, where $s_n'$ is the number of nonzero off-diagonal entries. On the other hand, with the SCAD or hard-thresholding penalty functions, there is no such restriction.
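
As a rough illustration of the quantities named in the abstract, the following Python sketch evaluates the order of the Frobenius-norm rate and the three penalty functions it mentions. Nothing here is taken from the paper's own code: the function names are hypothetical, constants in the rate are ignored, the penalties are written in their commonly cited forms, and the default a = 3.7 for SCAD is the conventional choice, assumed here rather than stated in the abstract.

```python
import math


def frobenius_rate(s_n, p_n, n):
    """Order of the rate (s_n * log(p_n) / n)^(1/2); constants are ignored."""
    return math.sqrt(s_n * math.log(p_n) / n)


def l1_penalty(theta, lam):
    """L1 penalty: grows linearly in |theta| without bound."""
    return lam * abs(theta)


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty in its usual piecewise form (assumes a > 2)."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2  # constant for large |theta|


def hard_threshold_penalty(theta, lam):
    """Hard-thresholding penalty: lam^2 - (|theta| - lam)^2 for |theta| < lam."""
    t = abs(theta)
    if t < lam:
        return lam * lam - (t - lam) ** 2
    return lam * lam  # constant for |theta| >= lam


if __name__ == "__main__":
    # Squaring the dimension only doubles log(p_n), so the rate grows by sqrt(2).
    print(frobenius_rate(100, 1_000, 500))
    print(frobenius_rate(100, 1_000_000, 500))
    # For a large entry, SCAD and hard-thresholding are flat while L1 keeps growing.
    for theta in (0.5, 2.0, 10.0):
        print(theta, l1_penalty(theta, 1.0), scad_penalty(theta, 1.0),
              hard_threshold_penalty(theta, 1.0))
```

The first two printed values differ by a factor of the square root of 2 even though the dimension is squared, which is the "merely a logarithmic factor" point in numeric form. The SCAD and hard-thresholding penalties level off for large entries whereas the L1 penalty does not; this flatness is a common intuition for why those penalties avoid the restriction on the number of nonzero elements, offered here as a heuristic and not as the paper's argument.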


Similar articles

Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.

Ann Stat. 2009;37(6B):4254-4278. doi: 10.1214/09-AOS720.

Sparse estimation of a covariance matrix.

Biometrika. 2011 Dec;98(4):807-820. doi: 10.1093/biomet/asr054.

A proximal distance algorithm for likelihood-based sparse covariance estimation.

Biometrika. 2022 Dec;109(4):1047-1066. doi: 10.1093/biomet/asac011. Epub 2022 Feb 16.

Large Covariance Estimation by Thresholding Principal Orthogonal Complements.

J R Stat Soc Series B Stat Methodol. 2013 Sep 1;75(4). doi: 10.1111/rssb.12016.

Optimal Estimation and Rank Detection for Sparse Spiked Covariance Matrices.

Probab Theory Relat Fields. 2015 Apr 1;161(3-4):781-815. doi: 10.1007/s00440-014-0562-z.

A Cholesky-based sparse covariance estimation with an application to genes data.

J Biopharm Stat. 2021 Sep 3;31(5):603-616. doi: 10.1080/10543406.2021.1931270. Epub 2021 May 29.

L0-regularized time-varying sparse inverse covariance estimation for tracking dynamic fMRI brain networks.

Annu Int Conf IEEE Eng Med Biol Soc. 2015 Aug;2015:1496-9. doi: 10.1109/EMBC.2015.7318654.

Performance of penalized maximum likelihood in estimation of genetic covariances matrices.

Genet Sel Evol. 2011 Nov 27;43(1):39. doi: 10.1186/1297-9686-43-39.

Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data.

J Multivar Anal. 2016 Sep;150:55-74. doi: 10.1016/j.jmva.2016.05.002. Epub 2016 May 19.

Joint Estimation of Precision Matrices in Heterogeneous Populations.

Electron J Stat. 2016;10(1):1341-1392. doi: 10.1214/16-EJS1137. Epub 2016 May 31.

Cited by

Large Precision Matrix Estimation with Unknown Group Structure.

J Am Stat Assoc. 2025 Feb 10. doi: 10.1080/01621459.2024.2442092.

Inferring independent sets of Gaussian variables after thresholding correlations.

J Am Stat Assoc. 2025;120(549):370-381. doi: 10.1080/01621459.2024.2337158. Epub 2024 May 20.

A framework for analyzing EEG data using high-dimensional tests.

Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf109.

A proximal distance algorithm for likelihood-based sparse covariance estimation.

Biometrika. 2022 Dec;109(4):1047-1066. doi: 10.1093/biomet/asac011. Epub 2022 Feb 16.

An Expectation-Maximization Algorithm for Combining a Sample of Partially Overlapping Covariance Matrices.

Axioms. 2023 Feb;12(2). doi: 10.3390/axioms12020161. Epub 2023 Feb 4.

Cross-Validated Loss-Based Covariance Matrix Estimator Selection in High Dimensions.

J Comput Graph Stat. 2023;32(2):601-612. doi: 10.1080/10618600.2022.2110883. Epub 2022 Oct 7.

CDPA: Common and Distinctive Pattern Analysis between High-dimensional Datasets.

Electron J Stat. 2022;16(1):2475-2517. doi: 10.1214/22-EJS2008. Epub 2022 Apr 4.

Covariance estimation via fiducial inference.

Stat Theory Relat Fields. 2021;5(4):316-331. doi: 10.1080/24754269.2021.1877950. Epub 2021 Feb 15.

Linear and nonlinear correlation estimators unveil undescribed taxa interactions in microbiome data.

Nat Commun. 2022 Aug 23;13(1):4946. doi: 10.1038/s41467-022-32243-x.

DOUBLY DEBIASED LASSO: HIGH-DIMENSIONAL INFERENCE UNDER HIDDEN CONFOUNDING.

Ann Stat. 2022 Jun;50(3):1320-1347. doi: 10.1214/21-aos2152. Epub 2022 Jun 16.

References

NETWORK EXPLORATION VIA THE ADAPTIVE LASSO AND SCAD PENALTIES.

Ann Appl Stat. 2009 Jun 1;3(2):521-541. doi: 10.1214/08-AOAS215SUPP.

One-step Sparse Estimates in Nonconcave Penalized Likelihood Models.

Ann Stat. 2008 Aug 1;36(4):1509-1533. doi: 10.1214/009053607000000802.

Sparse inverse covariance estimation with the graphical lasso.

Biostatistics. 2008 Jul;9(3):432-41. doi: 10.1093/biostatistics/kxm045. Epub 2007 Dec 12.

Nonparametric estimation of covariance structure in longitudinal data.

Biometrics. 1998 Jun;54(2):401-15.
