Memisevic Vesna, Milenkovic Tijana, Przulj Natasa
Department of Computer Science, University of California, Irvine, CA 92697-3435, USA.
J Integr Bioinform. 2010 Mar 25;7(3):466. doi: 10.2390/biecoll-jib-2010-135.
Traditional approaches to homology detection rely on finding sufficient similarity between protein sequences. Studies have shown that non-sequence-based sources of biological information, such as secondary or tertiary molecular structure, can yield certain types of biological knowledge when sequence-based approaches fail. Motivated by these studies, we hypothesize that protein-protein interaction (PPI) network topology and protein sequence might give insights into different slices of biological information. Since proteins aggregate to perform a function rather than acting in isolation, analyzing the complex wiring around a protein in a PPI network could give deeper insight into the protein's role in the inner workings of the cell than analyzing sequences of individual genes. Hence, we believe that one could lose much information by focusing on sequence information alone. We examine whether, and to what extent, the information about homologous proteins captured by PPI network topology differs from the information captured by their sequences. We measure how similar the topology around homologous proteins in a PPI network is and show that such proteins have statistically significantly higher network similarity than nonhomologous proteins. We compare these network similarity trends of homologous proteins with the trends in their sequence identity and find that network similarities uncover almost as much homology as sequence identities. Although neither of the two methods, network topology or sequence identity, seems to capture homology information in its entirety, we demonstrate that the two might give insights into somewhat different types of biological information, as the overlap of the homology information that they uncover is relatively low.
Therefore, we conclude that similarities of proteins' topological neighborhoods in a PPI network could be used as a complementary method to sequence-based approaches for identifying homologs, as well as for analyzing evolutionary distance and functional divergence of homologous proteins.
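The comparison described in the abstract (score the topological similarity of protein pairs, then test whether homologous pairs score higher than nonhomologous pairs) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the network similarity measure or the statistical test, so the three-number local signature, the function names, and the permutation test below are simplified stand-ins, and the toy network is not real PPI data.

```python
import random
from itertools import combinations


def local_signature(adj, v):
    """Toy topological signature of node v (a stand-in, not the paper's measure).

    Returns (degree, triangles through v, two-step paths starting at v).
    """
    nbrs = adj[v]
    degree = len(nbrs)
    # A triangle exists for every pair of neighbors that are themselves linked.
    triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    # Paths v-u-w with w != v.
    two_paths = sum(len(adj[u] - {v}) for u in nbrs)
    return (degree, triangles, two_paths)


def signature_similarity(s, t):
    """Similarity in [0, 1]; 1.0 means the two signatures are identical."""
    dists = [
        abs(a - b) / max(a, b) if max(a, b) > 0 else 0.0
        for a, b in zip(s, t)
    ]
    return 1.0 - sum(dists) / len(dists)


def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """One-sided permutation test: is mean(group_a) larger than mean(group_b)?"""
    rng = random.Random(seed)
    k = len(group_a)
    observed = sum(group_a) / k - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        # Small tolerance guards against floating-point reordering effects.
        if diff >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy network: a triangle a-b-c with a pendant node d on c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

sim_ab = signature_similarity(local_signature(adj, "a"), local_signature(adj, "b"))
sim_ad = signature_similarity(local_signature(adj, "a"), local_signature(adj, "d"))
```

In a real analysis, `group_a` would hold the similarity scores of known homologous pairs and `group_b` the scores of nonhomologous pairs, both computed on an actual PPI network with a richer neighborhood measure than this three-number signature.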