IEEE/ACM Trans Comput Biol Bioinform. 2020 Sep-Oct;17(5):1671-1681. doi: 10.1109/TCBB.2019.2899568. Epub 2019 Feb 14.
Schizophrenia (SZ) is a complex disease. Single nucleotide polymorphisms (SNPs), brain activity measured by functional magnetic resonance imaging (fMRI), and DNA methylation are all important biomarkers for the study of SZ. To our knowledge, there has been little effort to combine these three data types. In this study, we propose a group sparse joint nonnegative matrix factorization (GSJNMF) model that integrates SNP, fMRI, and DNA methylation data to identify multi-dimensional modules associated with SZ; these modules can be used to study the regulatory mechanisms underlying SZ at multiple levels. The GSJNMF model projects the multiple data types onto a common feature space, in which heterogeneous variables with large coefficients on the same projected basis are grouped into a multi-dimensional module. We also incorporate the group structure information available from each dataset. The genomic factors in such modules show significant correlations or functional associations with several brain activities. Finally, we applied the method to real data collected by the Mind Clinical Imaging Consortium (MCIC) for the study of SZ and identified significant biomarkers. These biomarkers were further used to discover genes and corresponding brain regions, which were confirmed to be significantly associated with SZ.
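The idea of projecting several data types onto a shared basis and reading modules off the large coefficients can be illustrated with a minimal sketch. This is not the authors' GSJNMF formulation: the abstract does not give the objective or the update rules, so the code below assumes a generic joint NMF, X_i ≈ W H_i with a shared W, fitted by multiplicative updates, with a simple L2,1-style group penalty on each H_i and a z-score threshold for module membership. The function names (`joint_nmf`, `select_modules`), the parameters (`k`, `lam`, `z_thresh`), and the random data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def joint_nmf(blocks, groups, k=3, lam=0.1, n_iter=200, seed=0, eps=1e-9):
    """Factor each nonnegative block X_i (samples x features_i) as W @ H_i.

    W (samples x k) is shared across blocks; groups[i] holds one integer
    group label per feature of block i. Assumed, simplified penalty:
    lam * sum over components and groups of the L2 norm of H_i[c, group].
    """
    rng = np.random.default_rng(seed)
    n = blocks[0].shape[0]
    W = rng.random((n, k)) + eps
    Hs = [rng.random((k, X.shape[1])) + eps for X in blocks]
    for _ in range(n_iter):
        # Update the shared basis using all blocks at once.
        num = sum(X @ H.T for X, H in zip(blocks, Hs))
        den = W @ sum(H @ H.T for H in Hs) + eps
        W *= num / den
        # Update each block's coefficients with the group penalty
        # added to the denominator (a common heuristic, assumed here).
        for X, H, g in zip(blocks, Hs, groups):
            norms = np.empty_like(H)
            for label in np.unique(g):
                mask = g == label
                norms[:, mask] = np.linalg.norm(H[:, mask], axis=1, keepdims=True)
            penalty = lam * H / (norms + eps)
            H *= (W.T @ X) / (W.T @ W @ H + penalty + eps)  # in place
    return W, Hs

def select_modules(Hs, z_thresh=2.0):
    """For each component, keep features whose coefficient z-score exceeds z_thresh."""
    modules = []
    for comp in range(Hs[0].shape[0]):
        members = []
        for H in Hs:
            row = H[comp]
            z = (row - row.mean()) / (row.std() + 1e-12)
            members.append(np.flatnonzero(z > z_thresh))
        modules.append(members)
    return modules

# Synthetic stand-ins for SNP, fMRI, and methylation matrices (random data,
# not MCIC data): 20 samples with 30, 40, and 25 features respectively.
rng = np.random.default_rng(1)
blocks = [rng.random((20, p)) for p in (30, 40, 25)]
# Hypothetical group labels: consecutive features in groups of five.
groups = [np.repeat(np.arange(p // 5), 5) for p in (30, 40, 25)]

W, Hs = joint_nmf(blocks, groups, k=3)
modules = select_modules(Hs)
```

Each entry of `modules` holds three index arrays, one per data type, so a module pairs the features from all three blocks that load heavily on the same component. With random input these modules carry no biological meaning; the sketch only shows the mechanics of a shared basis plus thresholded coefficients.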