Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran.
Department of Molecular Medicine and Genetics, Faculty of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran.
J Bioinform Comput Biol. 2021 Jun;19(3):2150007. doi: 10.1142/S0219720021500074. Epub 2021 Apr 30.
Considerable research effort has focused on learning gene regulatory networks (GRNs) from gene expression data in order to understand the functional basis of a living organism. Under the assumption that the expression levels of the genes of interest jointly follow a multivariate normal distribution, such networks can be constructed by identifying the nonzero elements of the inverse covariance matrix, also called the precision matrix or concentration matrix. Because this approach considers only pairwise linear correlations, it may not reflect the true connectivity between genes. To relax this restrictive assumption, we employ the Gaussian process (GP) model, a well-known and computationally efficient non-parametric Bayesian machine learning technique. GPs belong to a class of methods known as kernel machines, which can approximate complex problems by tuning their hyperparameters. In effect, the GP makes it possible to exploit the capacity of different kernels when constructing the precision matrix and the GRN. In this paper, we first choose a GP with an appropriate kernel to learn the GRNs from the observed genetic data, and then estimate the kernel hyperparameters using a rule-of-thumb technique. These hyperparameters also let us control the degree of sparseness in the precision matrix. We then obtain a kernel-based precision matrix, in a manner similar to GLASSO, and use it to construct a kernel-based GRN. We apply the approach to construct high-performing GRNs for different species of fly, rather than relying solely on the multivariate normal assumption, and find that the GP, by exploiting the capacity of kernels, performs much better than the multivariate Gaussian distribution assumption.
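The pipeline described in the abstract (kernel between genes, rule-of-thumb hyperparameter, sparse precision matrix, network) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: an RBF kernel between gene expression profiles, the median pairwise distance as the rule-of-thumb bandwidth, and a ridge-regularized inverse followed by thresholding of partial correlations as a simple stand-in for the L1-penalized GLASSO estimate. The names `rbf_gene_kernel`, `invert`, `kernel_network`, `ridge`, and `threshold` are hypothetical.

```python
import math
from statistics import median


def rbf_gene_kernel(samples, bandwidth=None):
    """Return (K, bandwidth): an RBF kernel matrix between gene profiles.

    samples: list of rows, one row per sample and one column per gene.
    """
    n_genes = len(samples[0])
    profiles = [[row[j] for row in samples] for j in range(n_genes)]
    sq_dist = [[sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
                for j in range(n_genes)] for i in range(n_genes)]
    if bandwidth is None:
        # Assumption: median pairwise distance as the rule-of-thumb bandwidth.
        bandwidth = median(math.sqrt(sq_dist[i][j])
                           for i in range(n_genes)
                           for j in range(i + 1, n_genes))
    K = [[math.exp(-sq_dist[i][j] / (2.0 * bandwidth ** 2))
          for j in range(n_genes)] for i in range(n_genes)]
    return K, bandwidth


def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def kernel_network(samples, ridge=0.1, threshold=0.3):
    """Return (theta, edges) for a kernel-based precision matrix.

    Assumption: ridge + hard threshold replaces the GLASSO L1 penalty.
    """
    K, _ = rbf_gene_kernel(samples)
    n = len(K)
    regularized = [[K[i][j] + (ridge if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
    theta = invert(regularized)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Partial correlation implied by the precision matrix.
            pc = -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
            if abs(pc) > threshold:
                edges.append((i, j))
    return theta, edges


# Toy data: genes 0 and 1 share the same profile, gene 2 differs.
samples = [[0.0, 0.0, 1.0],
           [1.0, 1.0, -1.0],
           [2.0, 2.0, 0.5],
           [3.0, 3.0, -0.5]]
theta, edges = kernel_network(samples)
print(edges)  # [(0, 1)]
```

In this sketch the bandwidth and the threshold together determine how many entries of the precision matrix survive, which loosely mirrors the abstract's remark that the kernel hyperparameters control sparseness. A faithful GLASSO-style estimate would instead solve the L1-penalized likelihood problem on the kernel matrix.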