Istituto di Studi sui Sistemi Intelligenti per l'Automazione, CNR-ISSIA, Via Amendola 122/D-O, I-70126 Bari, Italy.
J Biomed Inform. 2013 Oct;46(5):894-904. doi: 10.1016/j.jbi.2013.07.002. Epub 2013 Jul 20.
The inference, or 'reverse-engineering', of gene regulatory networks from expression data and the description of the complex dependency structures among genes are open issues in modern molecular biology.
In this paper we compared three regularized methods of covariance selection for the inference of gene regulatory networks, developed to circumvent the problems that arise when the number of observations n is smaller than the number of genes p. The examined approaches provide three alternative estimates of the inverse covariance matrix: (a) the 'PINV' method is based on the Moore-Penrose pseudoinverse, (b) the 'RCM' method computes correlations between regression residuals, and (c) the 'ℓ(2C)' method maximizes a properly regularized log-likelihood function. Our extensive simulation studies showed that ℓ(2C) outperformed the other two methods, yielding the most predictive partial correlation estimates and the highest sensitivity in inferring conditional dependencies between genes, even when only a few observations were available. Applying this method to infer gene networks of the isoprenoid biosynthesis pathways in Arabidopsis thaliana revealed a negative partial correlation coefficient between the two hubs of the two isoprenoid pathways and, more importantly, provided evidence of cross-talk between genes in the plastidial and the cytosolic pathways. When applied to gene expression data for a signature of the HRAS oncogene in human cell cultures, the method revealed 9 genes (p-value<0.0005) that interact directly with HRAS and share the same Ras-responsive binding site for the transcription factor RREB1. This result suggests that the transcriptional activation of these genes is mediated by a common transcription factor downstream of Ras signaling.
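As a rough illustration of why some form of regularization is needed when n < p, and of how an estimate of the inverse covariance matrix is turned into partial correlations, the following Python sketch may help. It is not the authors' Matlab code and does not implement PINV, RCM, or ℓ(2C). It substitutes a generic ridge-type estimate (an assumed constant lam added to the diagonal of the sample covariance before inversion), and all function names and the toy data are hypothetical. The only standard ingredient is the identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj), which relates the entries of an inverse covariance matrix to partial correlations.

```python
import math
import random


def covariance(X):
    """Sample covariance (p x p) of an n x p data matrix given as a list of rows."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    S = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[j] - means[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                S[i][j] += d[i] * d[j] / (n - 1)
    return S


def invert(A):
    """Gauss-Jordan inverse with partial pivoting; raises ValueError on a near-zero pivot."""
    p = len(A)
    # Augment A with the identity matrix: [A | I].
    M = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        M[c], M[piv] = M[piv], M[c]
        pivot_value = M[c][c]
        M[c] = [v / pivot_value for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[p:] for row in M]


def partial_correlations(omega):
    """Convert an inverse covariance matrix into a partial correlation matrix."""
    p = len(omega)
    return [
        [1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
         for j in range(p)]
        for i in range(p)
    ]


def ridge_partial_correlations(X, lam):
    """Assumed stand-in estimator: invert S + lam * I, then rescale.

    This is a generic ridge-type regularization chosen for illustration,
    not the PINV, RCM, or l(2C) estimators compared in the paper.
    """
    S = covariance(X)
    p = len(S)
    S_reg = [[S[i][j] + (lam if i == j else 0.0) for j in range(p)] for i in range(p)]
    return partial_correlations(invert(S_reg))


# Toy data (hypothetical): n = 5 observations of p = 8 "genes", so n < p.
random.seed(0)
n, p = 5, 8
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
R = ridge_partial_correlations(X, lam=0.5)
print("partial correlation between gene 0 and gene 1:", round(R[0][1], 3))
```

With n < p the sample covariance has rank at most n - 1 and cannot be inverted directly; adding lam > 0 to the diagonal makes the matrix positive definite, so the inverse exists and every resulting partial correlation lies in [-1, 1]. The value lam = 0.5 is arbitrary here and would need to be chosen from the data in any real analysis.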
Software implementing the methods, in the form of Matlab scripts, is available at: http://users.ba.cnr.it/issia/iesina18/CovSelModelsCodes.zip.