Bonchev Danail
Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284-2030, USA.
Chem Biodivers. 2004 Feb;1(2):312-26. doi: 10.1002/cbdv.200490028.
Topological and compositional complexity of protein-protein networks is assessed in a variety of ways making use of graph theory and information theory. The methodology used is borrowed from mathematical chemistry and includes complexity descriptors such as substructure count, overall connectivity, walk count, and information on various vertex distributions. The approach is applied to the (incomplete) proteome of Saccharomyces cerevisiae, containing 232 protein complexes comprising a total of 1,440 proteins. The proteome network and each of its nine functional subsets of protein complexes are disconnected graphs, containing a number of noninteracting species and a major component. A weighted edge between two vertices in these graphs stands for the number of proteins shared by the respective complexes. The major component is a highly connected, 'small-world' network, in which the average vertex distance between protein complexes does not exceed 2.2 (2.4 for the entire proteome), whereas the maximum distance does not exceed 4 (or 5 for the proteome). The vertex degree distribution in the major proteome component with 199 complexes follows the power law P(k) ~ k^(-γ), with γ ≈ 1.7. The analysis of the functional organization of the yeast proteome has shown that, for any pair of biological functions, there always exist many proteins that can perform both functions. The potential applications of the quantitative proteome descriptors discussed include quantitative relationships between the structure and biological action of dynamic protein complexes in a changing environment, identification of targets for markers/drugs, as well as system analysis and comparative studies of proteomes.
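The graph construction described above can be illustrated with a minimal Python sketch. It is not the author's implementation: the complex and protein names are invented toy data, the function names are hypothetical, and distances are assumed to be counted in edges (ignoring edge weights), which the abstract does not state explicitly.

```python
from collections import deque
from itertools import combinations


def build_complex_graph(complexes):
    """Vertices are complexes; edge weight = number of shared proteins."""
    graph = {name: {} for name in complexes}
    for a, b in combinations(complexes, 2):
        shared = len(complexes[a] & complexes[b])
        if shared:
            graph[a][b] = shared
            graph[b][a] = shared
    return graph


def connected_components(graph):
    """Return the connected components as a list of vertex sets."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        result.append(component)
    return result


def distance_stats(graph, nodes):
    """Average and maximum edge-count distance inside one component."""
    total, pairs, longest = 0, 0, 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from each vertex
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        for target, d in dist.items():
            if target != source:
                total += d
                pairs += 1
                longest = max(longest, d)
    average = total / pairs if pairs else 0.0
    return average, longest


# Toy data (assumed for illustration only): complex -> set of member proteins.
complexes = {
    "C1": {"p1", "p2", "p3"},
    "C2": {"p2", "p3", "p4"},
    "C3": {"p4", "p5"},
    "C4": {"p5", "p6"},
    "C5": {"p7"},  # shares no protein with any other complex
}

graph = build_complex_graph(complexes)
major = max(connected_components(graph), key=len)
average, longest = distance_stats(graph, major)
degrees = {name: len(graph[name]) for name in major}

print(sorted(major), round(average, 2), longest)
```

In this toy case the complex C5 is an isolated vertex, mirroring the noninteracting species mentioned in the abstract, while the remaining four complexes form the major component. The `degrees` dictionary is the starting point for a degree distribution P(k); estimating an exponent such as γ would require a real data set of adequate size.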