Vinh Nguyen Xuan, Chetty Madhu, Coppel Ross, Wangikar Pramod P
Gippsland School of Information Technology, Monash University, Australia.
Biochim Biophys Acta. 2012 Dec;1824(12):1434-41. doi: 10.1016/j.bbapap.2012.05.017. Epub 2012 Jun 6.
Genetic network reverse engineering has been an area of intensive research within the systems biology community during the last decade. With many techniques currently available, validating them and choosing the best one for a given problem is a complex task. Current practice is to validate an approach on in silico synthetic data sets and, wherever possible, on real data sets with known ground truth. In this study, we highlight a major issue: validating reverse engineering algorithms on small benchmark networks very often yields networks that are not statistically better than a randomly picked network. A second important issue is that, with short time series, a small variation in the pre-processing procedure can yield large differences in the inferred networks. To demonstrate these issues, we have selected as our case study the IRMA in vivo synthetic yeast network recently published in Cell. Using Fisher's exact test, we show that many results reported in the literature on reverse engineering this network are not significantly better than random. The discussion is further extended to some other networks commonly used for validation purposes in the literature. The results presented in this study emphasize that studies carried out using small genetic networks are likely to be trivial, making it imperative that larger real networks be used for validation and benchmarking. If smaller networks are considered, the results should be interpreted carefully to avoid overconfidence. This article is part of a Special Issue entitled: Computational Methods for Protein Interaction and Structural Prediction.
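The kind of significance check the abstract describes can be illustrated with a minimal sketch of a one-sided Fisher's exact test on edge predictions. It is not the authors' code. The function name `fisher_exact_greater` and all numbers below (a 5-gene network, 8 assumed true edges, 8 predicted edges, 5 of them correct) are hypothetical values chosen for illustration and are not taken from the paper. Self-loops are assumed to be excluded, so a network of n genes has n(n-1) possible directed edges.

```python
from math import comb


def fisher_exact_greater(tp, n_predicted, n_true, n_possible):
    """One-sided Fisher's exact test p-value.

    Probability that a random choice of n_predicted edges out of
    n_possible contains at least tp of the n_true real edges
    (upper tail of the hypergeometric distribution).
    """
    max_tp = min(n_predicted, n_true)
    total = comb(n_possible, n_predicted)
    tail = sum(
        comb(n_true, k) * comb(n_possible - n_true, n_predicted - k)
        for k in range(tp, max_tp + 1)
    )
    return tail / total


# Hypothetical example values, not taken from the paper.
n_genes = 5
n_possible = n_genes * (n_genes - 1)  # directed edges, self-loops excluded
n_true = 8        # assumed number of edges in the ground-truth network
n_predicted = 8   # edges reported by some inference algorithm
tp = 5            # predicted edges that are also true edges

p_value = fisher_exact_greater(tp, n_predicted, n_true, n_possible)
print(f"precision = {tp / n_predicted:.2f}, p-value = {p_value:.4f}")
```

With these assumed numbers, a prediction in which 5 of 8 edges are correct has a p-value of about 0.11, so it would not be judged significantly better than a randomly picked network at the 0.05 level. This shows how little statistical room a network this small leaves.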