Gu Zhujie, Uh Hae-Won, Houwing-Duistermaat Jeanine, El Bouhaddani Said
Department of Data Science and Biostatistics, Julius Centre, UMC Utrecht, Utrecht, The Netherlands.
Medical Research Council Biostatistics Unit, University of Cambridge, Cambridge, UK.
J Appl Stat. 2024 Feb 21;51(13):2627-2651. doi: 10.1080/02664763.2024.2313458. eCollection 2024.
In many studies of human diseases, multiple omics datasets are measured. Typically, each omics dataset is analyzed separately with respect to the disease, so the relationships between omics are overlooked. Modeling the joint part of multiple omics and its association with the disease outcome will provide insights into the complex molecular basis of the disease. Several dimension reduction methods that jointly model multiple omics are available, as are two-stage approaches that model the omics and the outcome in separate steps. However, holistic one-stage models for both omics and outcome are lacking. In this article, we propose a novel one-stage method that jointly models an outcome variable with omics. We establish the identifiability of the model and develop EM algorithms to obtain maximum likelihood estimators of the parameters for normally distributed and Bernoulli distributed outcomes. Test statistics are proposed to infer the association between the outcome and the omics, and their asymptotic distributions are derived. Extensive simulation studies are conducted to evaluate the proposed model. The method is illustrated by modeling Down syndrome as the outcome, with methylation and glycomics as the omics datasets. Here we show that our model provides more insight by jointly considering methylation and glycomics.