Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China.
School of Public Health, University of Memphis, Memphis, Tennessee, USA.
J Comput Biol. 2024 Sep;31(9):834-870. doi: 10.1089/cmb.2023.0437. Epub 2024 Aug 12.
Understanding genetic regulation, for example, the regulation of gene expressions (GEs) by copy number variations and methylations, is crucial for uncovering the development and progression of complex diseases. Advancing from early studies that mostly focused on homogeneous groups of patients, some recent studies have shifted their focus toward different patient groups, explored their commonalities and differences, and produced insightful findings. However, the analysis can be very challenging: one GE may be regulated by multiple regulators, and one regulator may regulate the expressions of multiple genes, leading to two distinct types of commonalities/differences in the patterns of genetic regulation. In addition, the high dimensionality on both sides of the regulation poses computational challenges. In this study, we develop a two-way fusion integrative analysis approach, which innovatively applies two fusion penalties to simultaneously identify commonalities/differences in the regulated patterns of GEs and the regulating patterns of regulators, and adopts a Huber loss function to accommodate possible data contamination. Moreover, we develop a simple yet efficient iterative optimization algorithm that introduces no auxiliary variables or extra tuning parameters and is guaranteed to converge to a globally optimal solution. The advantages of the proposed approach are demonstrated in extensive simulations. The analysis of The Cancer Genome Atlas data on melanoma and lung cancer leads to interesting findings and satisfactory prediction performance.
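The abstract relies on a Huber loss to accommodate data contamination. Below is a minimal NumPy sketch of the standard Huber loss. It is not the authors' implementation, and it leaves out their two-way fusion penalties. The function name `huber_loss` and the threshold `delta` are illustrative assumptions, since the abstract does not give the authors' tuning. The sketch shows why the loss is robust: residuals within `delta` are penalized quadratically, and larger residuals are penalized only linearly.

```python
import numpy as np


def huber_loss(residuals, delta=1.345):
    """Elementwise Huber loss (illustrative sketch, not the paper's code).

    Quadratic for |r| <= delta, linear beyond, so a single contaminated
    observation cannot dominate the fit. delta=1.345 is a conventional
    default assumed here; the paper's choice is not stated in the abstract.
    """
    r = np.abs(np.asarray(residuals, dtype=float))
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return np.where(r <= delta, quadratic, linear)


# Compare Huber and squared loss on residuals containing one outlier (10.0).
residuals = np.array([0.5, -1.0, 0.2, 10.0])
print("Huber total:  ", huber_loss(residuals, delta=1.0).sum())
print("Squared total:", (0.5 * residuals ** 2).sum())
```

With `delta=1.0`, the outlier contributes 9.5 to the Huber total. Under the squared loss it contributes 50, and it would dominate the fitted coefficients.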