Zhang Yang, Devries Mark E, Skolnick Jeffrey
Center of Excellence in Bioinformatics, University at Buffalo, Buffalo, New York, USA.
PLoS Comput Biol. 2006 Feb;2(2):e13. doi: 10.1371/journal.pcbi.0020013. Epub 2006 Feb 17.
G protein-coupled receptors (GPCRs), encoded by about 5% of human genes, comprise the largest family of integral membrane proteins and act as cell surface receptors responsible for transducing endogenous signals into a cellular response. Although tertiary structural information is crucial for function annotation and drug design, there are few experimentally determined GPCR structures. To address this issue, we employ the recently developed threading assembly refinement (TASSER) method to generate structure predictions for all 907 putative GPCRs in the human genome. Unlike traditional homology modeling approaches, TASSER modeling does not require solved homologous template structures; moreover, it often refines the structures closer to native. These features are essential for the comprehensive modeling of all human GPCRs when close homologous templates are absent. Based on a benchmarked confidence score, approximately 820 predicted models should have the correct folds. The majority of GPCR models share the characteristic seven-transmembrane helix topology, but 45 ORFs are predicted to have different structures. These exceptions arise from GPCR fragments that are predominantly from extracellular or intracellular domains, as well as from database annotation errors. Our preliminary validation includes the automated modeling of bovine rhodopsin, the only solved GPCR in the Protein Data Bank. With homologous templates excluded, the final model built by TASSER has a global Cα root-mean-squared deviation (RMSD) from native of 4.6 Å, with an RMSD in the transmembrane helix region of 2.1 Å. Models of several representative GPCRs are compared with mutagenesis and affinity labeling data, and consistent agreement is demonstrated. Structure clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse. These results demonstrate the usefulness and robustness of the in silico models for GPCR functional analysis. All predicted GPCR models are freely available for noncommercial users on our Web site (http://www.bioinformatics.buffalo.edu/GPCR).
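A Cα root-mean-squared deviation such as those quoted above is conventionally computed after optimal rigid-body superposition of the model onto the native structure. The following is a minimal illustrative sketch of that calculation using the Kabsch algorithm in NumPy. It is not the TASSER code; the function name `kabsch_rmsd` is hypothetical, and the inputs are assumed to be two equal-length arrays of matched Cα coordinates in Å.

```python
import numpy as np


def kabsch_rmsd(model, native):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Hypothetical helper for illustration; assumes row i of `model` and
    row i of `native` refer to the same residue's C-alpha atom.
    """
    P = np.asarray(model, dtype=float)
    Q = np.asarray(native, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("inputs must both have shape (N, 3)")

    # Remove translation by centering both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate the model onto the native frame and measure the deviation.
    P_rot = P @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))
```

A region-specific value, such as one restricted to the transmembrane helices, would follow from passing only the rows for those residues, for example `kabsch_rmsd(model[tm_idx], native[tm_idx])`, where `tm_idx` is an assumed index array of transmembrane residues.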