Strong Michael, Sawaya Michael R, Wang Shuishu, Phillips Martin, Cascio Duilio, Eisenberg David
Howard Hughes Medical Institute, UCLA-Department of Energy Institute of Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA.
Proc Natl Acad Sci U S A. 2006 May 23;103(21):8060-5. doi: 10.1073/pnas.0602606103. Epub 2006 May 11.
The developing science called structural genomics has focused to date mainly on high-throughput expression of individual proteins, followed by their purification and structure determination. In contrast, the term structural biology is used to denote the determination of structures, often complexes of several macromolecules, that illuminate aspects of biological function. Here we bridge structural genomics to structural biology with a procedure for determining protein complexes of previously unknown function from any organism with a sequenced genome. From computational genomic analysis, we identify functionally linked proteins and verify their interaction in vitro by coexpression/copurification. We illustrate this procedure by the structural determination of a previously unknown complex between a PE protein and a PPE protein from the Mycobacterium tuberculosis genome, members of protein families that constitute approximately 10% of the coding capacity of this genome. The predicted complex was readily expressed, purified, and crystallized, although we had previously failed to express individual PE and PPE proteins on their own. The reason for the failure is clear from the structure, which shows that the PE and PPE proteins mate along an extended apolar interface to form a four-alpha-helical bundle, where two of the alpha-helices are contributed by the PE protein and two by the PPE protein. Our entire procedure for the identification, characterization, and structural determination of protein complexes can be scaled to a genome-wide level.