Alkan Ferhat, Erten Cesim
Center for Non-coding RNA in Technology and Health.
Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Grønnegardsvej 3, Frederiksberg, DK1870, Denmark.
Bioinformatics. 2017 Feb 15;33(4):537-544. doi: 10.1093/bioinformatics/btw655.
Analysis of protein-protein interaction (PPI) networks provides invaluable insight into several systems biology problems. High-throughput experimental techniques together with computational methods provide large-scale PPI networks. However, a major issue with these networks is that they are error-prone: they contain false-positive interactions and usually many more false-negatives. Recently, several computational methods have been proposed for topology-based network reconstruction, where, given an input PPI network, the goal is to reconstruct the network by identifying false positives and false negatives as accurately as possible.
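To make the problem statement concrete, the following is a minimal sketch of one generic topology-based heuristic. It is not RedNemo and not any specific method from the literature. It assumes that the Jaccard similarity of two proteins' neighbourhoods is a usable signal: existing edges with a low score are treated as candidate false positives, and missing edges with a high score as candidate false negatives. The function names and the toy network are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Jaccard similarity of the neighbourhoods of u and v, excluding u and v."""
    nu = adj[u] - {v}
    nv = adj[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def score_pairs(edges):
    """Score every node pair; return (scores of present edges, scores of absent edges)."""
    adj = build_adjacency(edges)
    edge_set = {frozenset(e) for e in edges}
    present, absent = {}, {}
    for u, v in combinations(sorted(adj), 2):
        s = jaccard(adj, u, v)
        if frozenset((u, v)) in edge_set:
            present[(u, v)] = s   # low score: candidate false positive
        else:
            absent[(u, v)] = s    # high score: candidate false negative
    return present, absent


# Toy network (hypothetical data, for illustration only).
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("C", "E")]
present, absent = score_pairs(toy_edges)
```

Scoring all node pairs is quadratic in the number of proteins, so this sketch is only suitable for small inputs.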
We observe that existing topology-based network reconstruction algorithms suffer from several shortcomings. An important one is how poorly their computational requirements, execution time in particular, scale with network size. They have so far been tested only on small-scale networks; when applied to the large-scale networks of popular PPI databases, they require unreasonable amounts of time and, in some instances, crash without producing any output even after several months of execution. We present RedNemo, an algorithm for the topology-based network reconstruction problem. Under most metrics based on gene ontology annotations, it produces networks of higher biological quality than the alternatives. RedNemo also recovers a high-confidence network that has been modified by random edge removals and rewirings better than the alternatives under most of the removal/rewiring ratios tested. Furthermore, extensive tests on databases of varying sizes show that RedNemo achieves these results with much better running times.
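The recovery experiment described above can be sketched as follows. This is an illustrative setup under stated assumptions, not the authors' protocol: the abstract does not define "rewiring", so here it is assumed to mean replacing one endpoint of an edge with a randomly chosen node, and recovery is summarised as edge-level precision and recall. The names `perturb` and `recovery` and the ratio parameters are hypothetical.

```python
import random


def normalize(edges):
    """Represent undirected edges as a set of sorted tuples."""
    return {tuple(sorted(e)) for e in edges}


def perturb(edges, removal_ratio, rewiring_ratio, seed=0):
    """Return a noisy copy of the network.

    A fraction `removal_ratio` of edges is deleted (simulated false negatives).
    A fraction `rewiring_ratio` is rewired: the edge is dropped and one endpoint
    is reconnected to a random node (assumed definition of rewiring).
    """
    rng = random.Random(seed)
    original = sorted(normalize(edges))
    nodes = sorted({n for e in original for n in e})
    rng.shuffle(original)
    n_remove = int(removal_ratio * len(original))
    n_rewire = int(rewiring_ratio * len(original))
    noisy = set(original[n_remove + n_rewire:])   # untouched true edges
    used = set(original)                          # never re-create a true edge
    for u, _ in original[n_remove:n_remove + n_rewire]:
        candidates = [w for w in nodes
                      if w != u and tuple(sorted((u, w))) not in used]
        if candidates:
            new_edge = tuple(sorted((u, rng.choice(candidates))))
            noisy.add(new_edge)                   # simulated false positive
            used.add(new_edge)
    return noisy


def recovery(original, reconstructed):
    """Edge-level precision and recall of a reconstruction against the original."""
    truth, pred = normalize(original), normalize(reconstructed)
    tp = len(truth & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall
```

In use, a reconstruction algorithm would be run on the output of `perturb`, and `recovery` would compare its result with the original high-confidence network for each removal/rewiring ratio.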
Supplementary material including source code, useful scripts, experimental data and the results are available at http://webprs.khas.edu.tr/~cesim/RedNemo.tar.gz.
Supplementary data are available at Bioinformatics online.