Qin Xing, Hu Jianhua, Ma Shuangge, Wu Mengyun
School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai, China.
School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China.
J Multivar Anal. 2024 Jul;202. doi: 10.1016/j.jmva.2024.105298. Epub 2024 Feb 13.
Network estimation is a critical component of high-dimensional data analysis and provides insight into the underlying complex dependence structures. Among existing approaches, Gaussian graphical models (GGMs) have been highly popular. However, they remain limited by their homogeneous distribution assumption and by being applicable only to small-scale data. For example, cancers exhibit various levels of unknown heterogeneity, and biological networks, which include thousands of molecular components, often differ across subgroups while also sharing some commonalities. In this article, we propose a new joint estimation approach for multiple networks with unknown sample heterogeneity, which decomposes the GGM into a collection of sparse regression problems. A reparameterization technique and a composite minimax concave penalty are introduced to effectively accommodate the specific and common information across the networks of multiple subgroups. As a result, the proposed estimator advances significantly beyond existing heterogeneity network analyses that are based directly on the regularized likelihood of the GGM, and it enjoys scale-invariance, tuning-insensitivity, and optimization convexity properties. The proposed analysis can be effectively realized using parallel computing. The estimation and selection consistency properties are rigorously established. The proposed approach allows the theoretical studies to focus on independent network estimation only and has the significant advantage of being both theoretically and computationally applicable to large-scale data. Extensive numerical experiments with simulated data and the TCGA breast cancer data demonstrate the strong performance of the proposed approach in both subgroup and network identification.
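The decomposition of a GGM into regression problems rests on a standard identity: for a Gaussian vector with precision matrix Θ, the population coefficient of X_k in the regression of X_j on all other variables equals -Θ_jk / Θ_jj, so a zero coefficient corresponds to a missing edge. The following is a minimal sketch of that identity only, on a hypothetical 3-node chain graph with exact rational arithmetic. It is not the authors' estimator: it involves no data, no subgroups, no reparameterization, and no composite minimax concave penalty, and the function names (`solve`, `inverse`, `nodewise_coefficients`) are illustrative assumptions.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination using exact Fractions."""
    n = len(A)
    # Augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        # Pick a row with a non-zero pivot (A is assumed invertible).
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def inverse(A):
    """Invert A column by column via solve()."""
    n = len(A)
    cols = [solve(A, [1 if i == j else 0 for i in range(n)]) for j in range(n)]
    # cols[j] is the j-th column of the inverse; transpose into rows.
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def nodewise_coefficients(theta, j):
    """Population coefficients for regressing X_j on the remaining variables.

    Computed from the covariance Sigma = inverse(theta) as
    Sigma[rest, rest]^{-1} Sigma[rest, j].
    """
    p = len(theta)
    sigma = inverse(theta)
    rest = [k for k in range(p) if k != j]
    s_rr = [[sigma[a][b] for b in rest] for a in rest]
    s_rj = [sigma[a][j] for a in rest]
    return dict(zip(rest, solve(s_rr, s_rj)))


# Hypothetical precision matrix of a chain graph 0 - 1 - 2 (no edge 0 - 2).
theta = [[2, -1, 0],
         [-1, 2, -1],
         [0, -1, 2]]

# One regression per node; these are independent and could run in parallel.
coefs = {j: nodewise_coefficients(theta, j) for j in range(len(theta))}

for j, row in coefs.items():
    for k, beta in row.items():
        # Each regression coefficient matches -theta[j][k] / theta[j][j].
        print(j, k, beta, Fraction(-theta[j][k], theta[j][j]))
```

Because each node's regression depends only on that node's own response, the per-node problems can be solved separately, which is the structural feature that makes parallel computation natural for regression-based formulations.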