Michael Levitt
Department of Structural Biology, Stanford University, Stanford, CA 94305-5126, USA.
Proc Natl Acad Sci U S A. 2009 Jul 7;106(27):11079-84. doi: 10.1073/pnas.0905029106. Epub 2009 Jun 18.
The protein universe is the set of all proteins of all organisms. Here, all currently known sequences are analyzed in terms of families with single-domain or multidomain architectures and in terms of whether those families have a known three-dimensional structure. Growth of new single-domain families is very slow: almost all growth comes from new multidomain architectures that are combinations of domains characterized by approximately 15,000 sequence profiles. Single-domain families are mostly shared by the major groups of organisms, whereas multidomain architectures are specific and account for species diversity. There are known structures for a quarter of the single-domain families, and >70% of all sequences can be partially modeled thanks to their membership in these families.
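The bookkeeping described in the abstract (grouping sequences into single-domain and multidomain architectures, then asking what fraction can be partially modeled) can be illustrated with a minimal Python sketch. This is not the paper's pipeline: the toy sequence IDs, the domain profile names, the choice to treat an architecture as an ordered tuple of domains, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy input: each sequence ID maps to the ordered tuple of
# domain profiles matched along it (all names are made up).
sequences = {
    "seq1": ("A",),
    "seq2": ("A",),
    "seq3": ("A", "B"),
    "seq4": ("B", "A"),
    "seq5": ("C",),
    "seq6": ("C", "C", "B"),
}

# Hypothetical set of domain profiles with a known 3D structure.
profiles_with_structure = {"A"}


def group_by_architecture(seqs):
    """Group sequence IDs by their ordered domain architecture."""
    families = defaultdict(list)
    for seq_id, architecture in seqs.items():
        families[architecture].append(seq_id)
    return dict(families)


def split_single_multi(families):
    """Separate single-domain families from multidomain architectures."""
    single = {a: m for a, m in families.items() if len(a) == 1}
    multi = {a: m for a, m in families.items() if len(a) > 1}
    return single, multi


def partially_modelable_fraction(seqs, structured):
    """Fraction of sequences containing at least one structured domain."""
    if not seqs:
        return 0.0
    hits = sum(
        1 for arch in seqs.values() if any(d in structured for d in arch)
    )
    return hits / len(seqs)


families = group_by_architecture(sequences)
single, multi = split_single_multi(families)
fraction = partially_modelable_fraction(sequences, profiles_with_structure)
print(len(single), "single-domain families;", len(multi), "multidomain architectures")
print(f"partially modelable: {fraction:.0%}")
```

In this toy data, the same domains recombine into several distinct multidomain architectures, which mirrors the abstract's point that growth comes mainly from new combinations rather than from new single-domain families.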