Lopes Fabrício M, Cesar Roberto M, Costa Luciano Da F
Federal University of Technology-Paraná and Institute of Mathematics and Statistics, University of São Paulo, Brazil.
J Comput Biol. 2011 Oct;18(10):1353-67. doi: 10.1089/cmb.2010.0118. Epub 2011 May 6.
Thanks to recent advances in molecular biology, combined with an ever-increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies, and drug design, as well as for planning new high-throughput experiments. Methods have been developed for gene network modeling and identification from expression profiles. However, an important open problem is how to validate such approaches and their results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Network (AGN) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdős-Rényi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabási-Albert (BA), and geographical networks (GG).
The experimental results indicate that the inference method was sensitive to average degree
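The three aspects described in the abstract (generate an AGN, identify a network from simulated data, compare with the original) can be illustrated with a minimal sketch. It is not the authors' implementation: the Boolean dynamics, the exhaustive subset search with a majority-vote error criterion, and all function names (random_network, random_rules, identify, compare) are assumptions made for illustration. Only an ER-style generator is shown; a WS, BA, or geographical generator would take the place of random_network.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations, product

def random_network(n, avg_degree, rng):
    """ER-style directed network: maps each target gene to its predictor set."""
    p = avg_degree / (n - 1)
    return {t: frozenset(s for s in range(n) if s != t and rng.random() < p)
            for t in range(n)}

def random_rules(net, rng):
    """Assumption: each gene follows a random Boolean truth table over its predictors."""
    return {t: {bits: rng.randint(0, 1)
                for bits in product((0, 1), repeat=len(preds))}
            for t, preds in net.items()}

def simulate(net, rules, steps, rng):
    """Run the network from a random initial state; return the list of states."""
    n = len(net)
    state = tuple(rng.randint(0, 1) for _ in range(n))
    series = [state]
    for _ in range(steps):
        state = tuple(rules[t][tuple(state[s] for s in sorted(net[t]))]
                      for t in range(n))
        series.append(state)
    return series

def transitions(net, rules, runs, steps, rng):
    """Collect (state at t, state at t+1) pairs from several short runs."""
    pairs = []
    for _ in range(runs):
        series = simulate(net, rules, steps, rng)
        pairs.extend(zip(series, series[1:]))
    return pairs

def errors(pairs, target, subset):
    """Samples mispredicted when each predictor pattern votes for its majority output."""
    counts = defaultdict(Counter)
    for before, after in pairs:
        counts[tuple(before[s] for s in subset)][after[target]] += 1
    return sum(sum(c.values()) - max(c.values()) for c in counts.values())

def identify(pairs, n, max_preds=2):
    """Fix each target gene and pick the smallest predictor subset with the lowest error."""
    inferred = {}
    for t in range(n):
        candidates = [s for s in range(n) if s != t]
        best, best_err = (), errors(pairs, t, ())
        for k in range(1, max_preds + 1):
            for subset in combinations(candidates, k):
                err = errors(pairs, t, subset)
                if err < best_err:  # strict: ties keep the smaller subset
                    best, best_err = subset, err
        inferred[t] = frozenset(best)
    return inferred

def compare(true_net, inferred_net):
    """Edge-level precision and recall of the inferred network."""
    true_edges = {(s, t) for t, preds in true_net.items() for s in preds}
    found = {(s, t) for t, preds in inferred_net.items() for s in preds}
    tp = len(true_edges & found)
    precision = tp / len(found) if found else 1.0
    recall = tp / len(true_edges) if true_edges else 1.0
    return precision, recall

rng = random.Random(0)
net = random_network(8, avg_degree=2, rng=rng)
rules = random_rules(net, rng)
pairs = transitions(net, rules, runs=20, steps=5, rng=rng)
inferred = identify(pairs, n=8)
precision, recall = compare(net, inferred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Varying avg_degree, the number of runs, or the generator in this sketch gives a rough feel for how such a comparison can be set up; the numbers it prints say nothing about the results reported in the paper.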