School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China.
Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China.
Biometrics. 2022 Jun;78(2):524-535. doi: 10.1111/biom.13426. Epub 2021 Feb 5.
Heterogeneity is a hallmark of cancer, diabetes, cardiovascular diseases, and many other complex diseases. This study is partly motivated by unsupervised heterogeneity analysis of complex diseases based on molecular and imaging data. For such data, network-based analysis, which accommodates the interconnections among variables, can be more informative than analysis limited to the mean, variance, and other simple distributional properties. Research on network-based heterogeneity analysis remains very limited, and a common limitation of the existing techniques is that the number of subgroups must be specified a priori or in an ad hoc manner. In this article, we develop a penalized fusion approach for heterogeneity analysis based on the Gaussian graphical model. It applies penalization to the mean and precision matrix parameters to generate regularized and interpretable estimates. More importantly, a fusion penalty is imposed to automatically determine the number of subgroups and to generate more concise, reliable, and interpretable estimates. Consistency properties are rigorously established, and an effective computational algorithm is developed. Two heterogeneity analyses, one of non-small-cell lung cancer based on single-cell gene expression data of the Wnt pathway and one of lung adenocarcinoma based on histopathological imaging data, not only demonstrate the practical applicability of the proposed approach but also lead to interesting new findings.
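The structure described above can be sketched as a penalized Gaussian mixture objective. This is a generic illustration under stated assumptions and not necessarily the exact formulation used in the article: the upper bound $K$ on the number of subgroups, the penalty function $p(\cdot,\lambda)$, the tuning parameters $\lambda_1,\lambda_2,\lambda_3$, and the choice of norm in the fusion term are all assumed here for illustration.

```latex
% Assumed notation: x_i are the n observations, \pi_k mixing proportions,
% \mu_k subgroup means, \Theta_k = (\theta_{k,jl}) subgroup precision matrices,
% \phi the multivariate normal density, K an upper bound on the number of subgroups.
\begin{align*}
\ell(\pi,\mu,\Theta)
  &= \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k\,
     \phi\!\left(x_i;\ \mu_k,\ \Theta_k^{-1}\right), \\
\max_{\pi,\mu,\Theta}\quad
  & \ell(\pi,\mu,\Theta)
    - \sum_{k=1}^{K}\sum_{j} p\!\left(|\mu_{kj}|,\ \lambda_1\right)
    - \sum_{k=1}^{K}\sum_{j\neq l} p\!\left(|\theta_{k,jl}|,\ \lambda_2\right) \\
  & \quad - \sum_{k<k'} p\!\left(
      \left(\|\mu_k-\mu_{k'}\|_2^2 + \|\Theta_k-\Theta_{k'}\|_F^2\right)^{1/2},
      \ \lambda_3\right).
\end{align*}
```

In a formulation of this kind, the first two penalties shrink small mean entries and off-diagonal precision entries to zero, giving sparse networks. The last term is the fusion penalty: when it shrinks the distance between the parameters of subgroups $k$ and $k'$ to zero, the two subgroups merge, so the number of distinct fitted parameter sets serves as the estimated number of subgroups.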