Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America.
PLoS One. 2012;7(6):e38913. doi: 10.1371/journal.pone.0038913. Epub 2012 Jun 13.
"Protein quaternary structure universe" refers to the ensemble of all protein-protein complexes across all organisms in nature. The number of quaternary folds thus corresponds to the number of ways proteins physically interact with other proteins. This study focuses on answering two basic questions: Whether the number of protein-protein interactions is limited and, if yes, how many different quaternary folds exist in nature. By all-to-all sequence and structure comparisons, we grouped the protein complexes in the protein data bank (PDB) into 3,629 families and 1,761 folds. A statistical model was introduced to obtain the quantitative relation between the numbers of quaternary families and quaternary folds in nature. The total number of possible protein-protein interactions was estimated around 4,000, which indicates that the current protein repository contains only 42% of quaternary folds in nature and a full coverage needs approximately a quarter century of experimental effort. The results have important implications to the protein complex structural modeling and the structure genomics of protein-protein interactions.