Davis Darren, Yaveroğlu Ömer Nebil, Malod-Dognin Noël, Stojmirovic Aleksandar, Pržulj Nataša
California Institute of Telecommunications and Technology (Calit2), University of California Irvine, Irvine, CA, USA, Department of Computing, Imperial College London, London, UK, National Center for Biotechnology Information (NCBI), Bethesda, MD, USA and Janssen Research and Development, LLC, Spring House, PA, USA.
Bioinformatics. 2015 May 15;31(10):1632-9. doi: 10.1093/bioinformatics/btv026. Epub 2015 Jan 20.
Proteins underlie the functioning of a cell, and the wiring of proteins in a protein-protein interaction network (PIN) relates to their biological functions. Proteins with similar wiring in the PIN (that is, similar topology around them) have been shown to have similar functions. This property has been successfully exploited for predicting protein functions. Topological similarity is also used to guide network alignment algorithms that find similarly wired proteins between the PINs of different species; these similarities are used to transfer annotation across PINs, e.g. from model organisms to human. To refine these functional predictions and annotation transfers, we need to gain insight into the variability of the topology-function relationships. For example, one function may be significantly associated with specific topologies, while another may be weakly associated with several different topologies. The topology-function relationships may also differ between species.
To improve our understanding of topology-function relationships and of their conservation among species, we develop a statistical framework built upon canonical correlation analysis. Using graphlet degrees to represent the wiring around proteins in PINs and gene ontology (GO) annotations to describe their functions, our framework: (i) characterizes statistically significant topology-function relationships in a given species, and (ii) uncovers the functions that have conserved topology in PINs of different species, which we term topologically orthologous functions. We apply our framework to the PINs of yeast and human, identifying seven biological process and two cellular component GO terms as topologically orthologous between the two organisms.
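To make the notion of graphlet degrees concrete, the following minimal sketch counts, for each protein in a toy network, how often it touches each of the four orbits of the 2- and 3-node graphlets: the endpoint of an edge, the end and the middle of an induced 3-node path, and a triangle. It is an illustration only. The abstract does not say which graphlets the framework uses, and the function name `graphlet_degrees_3`, the edge-list input and the toy network are all hypothetical choices made here.

```python
from itertools import combinations


def graphlet_degrees_3(edges):
    """Return {node: (o0, o1, o2, o3)} for the 2- and 3-node graphlet orbits.

    o0: edges touching the node (its degree)
    o1: induced 3-node paths in which the node is an end
    o2: induced 3-node paths in which the node is the middle
    o3: triangles containing the node
    Hypothetical helper written for illustration, not the authors' code.
    """
    # Build an undirected adjacency map, ignoring self-interactions.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    result = {}
    for v, nbrs in adj.items():
        deg = len(nbrs)
        # Triangles: pairs of neighbours that are themselves connected.
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # Path middle: pairs of neighbours that are NOT connected.
        mid = deg * (deg - 1) // 2 - tri
        # Path end: walks v-u-w with w != v, minus those closed into a
        # triangle (each triangle is reached once through each neighbour).
        end = sum(len(adj[u]) - 1 for u in nbrs) - 2 * tri
        result[v] = (deg, end, mid, tri)
    return result


# Toy network (assumed data): triangle A-B-C with a pendant protein D on C.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
signatures = graphlet_degrees_3(toy_edges)
for protein in sorted(signatures):
    print(protein, signatures[protein])
```

In this toy network A and B receive identical signatures, which is the sense in which two proteins are "similarly wired". In a setting like the one described above, one such vector per protein would form a row of the topology matrix that canonical correlation analysis relates to a matrix of GO annotations.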