School of Mathematical Sciences, and Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China.
Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China.
Stat Med. 2021 Jul 30;40(17):3915-3936. doi: 10.1002/sim.9006. Epub 2021 Apr 27.
Heterogeneity is a hallmark of many complex diseases. There are multiple ways of defining heterogeneity; among them, heterogeneity in genetic regulations, for example, the regulation of gene expressions (GEs) by copy number variations (CNVs) and methylation, has been suggested but little investigated. Heterogeneity in genetic regulations can be linked with disease severity, progression, and other traits and is biologically important. However, the analysis can be very challenging because both sides of the regulation are high-dimensional and the signals are sparse and weak. In this article, we consider the scenario where subjects form unknown subgroups, and each subgroup has unique genetic regulation relationships. Further, such heterogeneity is "guided" by a known biomarker. We develop a multivariate sparse fusion (MSF) approach, which innovatively applies the penalized fusion technique to simultaneously determine the number and structure of the subgroups and the regulation relationships within each subgroup. An effective computational algorithm is developed, and extensive simulations are conducted. The analysis of heterogeneity in the GE-CNV regulations in melanoma and GE-methylation regulations in stomach cancer using the TCGA data leads to interesting findings.
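To illustrate how a penalized fusion criterion can determine subgroups and within-subgroup regulation at the same time, a generic objective of this type is sketched here. It is not the article's exact formulation. All symbols are assumptions introduced for illustration: $y_i \in \mathbb{R}^q$ holds the GEs of subject $i$, $x_i \in \mathbb{R}^p$ holds the regulators (CNVs or methylation), $B_i \in \mathbb{R}^{p \times q}$ is a subject-specific coefficient matrix, $\rho(\cdot;\lambda)$ is a sparsity-inducing penalty, and subjects $i = 1, \dots, n$ are assumed to be sorted by the value of the guiding biomarker.

```latex
% Generic sparse-fusion objective (illustrative assumption, not the paper's exact criterion).
% Subjects are assumed sorted by the guiding biomarker, so i and i+1 are neighbours.
\[
\min_{B_1,\dots,B_n}\;
\frac{1}{2n}\sum_{i=1}^{n}\bigl\lVert y_i - B_i^{\top}x_i \bigr\rVert_2^{2}
\;+\;\sum_{i=1}^{n}\sum_{j=1}^{p}\sum_{k=1}^{q}\rho\bigl(\lvert b_{i,jk}\rvert;\lambda_1\bigr)
\;+\;\sum_{i=1}^{n-1}\rho\bigl(\lVert B_{i+1}-B_i\rVert_F;\lambda_2\bigr)
\]
% First term:  lack of fit of the multivariate regression of GEs on regulators.
% Second term: element-wise sparsity, selecting the relevant regulation relationships.
% Third term:  fusion of neighbouring subjects; subjects whose estimated B_i coincide
%              form one subgroup, so the number of distinct B_i is the number of subgroups.
```

Under these assumptions, larger values of $\lambda_2$ fuse more neighbouring subjects and so yield fewer subgroups, while $\lambda_1$ controls how sparse the regulation relationships are within each subgroup.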