Mizianty Marcin J, Fan Xiao, Yan Jing, Chalmers Eric, Woloschuk Christopher, Joachimiak Andrzej, Kurgan Lukasz
Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada.
Midwest Center for Structural Genomics, Argonne National Laboratory, Argonne, IL 60439, USA.
Acta Crystallogr D Biol Crystallogr. 2014 Nov;70(Pt 11):2781-93. doi: 10.1107/S1399004714019427. Epub 2014 Oct 23.
Structural genomics programs have developed and applied structure-determination pipelines to a wide range of protein targets, facilitating the visualization of macromolecular interactions and the understanding of their molecular and biochemical functions. The fundamental question of whether three-dimensional structures of all proteins and all functional annotations can be determined using X-ray crystallography is investigated. A first-of-its-kind large-scale analysis of crystallization propensity for all proteins encoded in 1953 fully sequenced genomes was performed. It is shown that current X-ray crystallographic know-how combined with homology modeling can provide structures for 25% of modeling families (protein clusters for which structural models can be obtained through homology modeling), with at least one structural model produced for each Gene Ontology functional annotation. The coverage varies between superkingdoms, with 19% for eukaryotes, 35% for bacteria and 49% for archaea, and with the coverage for viruses following that of their hosts. It is shown that the crystallization propensities of proteomes from the taxonomic superkingdoms are distinct. The use of knowledge-based target selection is shown to substantially increase the ability to produce X-ray structures. It is demonstrated that the human proteome has one of the highest attainable coverage values among eukaryotes, and GPCR membrane proteins suitable for X-ray structure determination were identified.