GPCR-CA: A cellular automaton image approach for predicting G-protein-coupled receptor functional classes.

Author information

Xiao Xuan, Wang Pu, Chou Kuo-Chen

Affiliation

Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 33300, China.

Publication information

J Comput Chem. 2009 Jul 15;30(9):1414-23. doi: 10.1002/jcc.21163.

PMID:19037861

Abstract

Given an uncharacterized protein sequence, how can we identify whether it is a G-protein-coupled receptor (GPCR) or not? If it is, which functional family class does it belong to? These questions are important because GPCRs are among the most frequent targets of therapeutic drugs, and the answers are very useful for "comparative and evolutionary pharmacology," a technique often used in drug development. Here, we present a web-server predictor called "GPCR-CA," where "CA" stands for "Cellular Automaton" (Wolfram, S. Nature 1984, 311, 419), indicating that CA images are used to reveal the pattern features hidden in large numbers of long and complicated protein sequences. In addition, the gray-level co-occurrence matrix factors extracted from the CA images are used to represent the protein samples through their pseudo amino acid composition (Chou, K.C. Proteins 2001, 43, 246). GPCR-CA is a two-layer predictor: the first-layer prediction engine identifies a query protein as GPCR or non-GPCR; if it is a GPCR, the process automatically continues with the second-layer prediction engine, which further identifies its type among the following six functional classes: (a) rhodopsin-like, (b) secretin-like, (c) metabotropic/glutamate/pheromone, (d) fungal pheromone, (e) cAMP receptor, and (f) frizzled/smoothened family. The overall success rates of the predictor for the first and second layers are over 91% and 83%, respectively. These rates were obtained through rigorous jackknife cross-validation tests on a newly constructed, stringent benchmark dataset in which no protein has ≥40% pairwise sequence identity to any other protein in the same subset. GPCR-CA is freely accessible at http://218.65.61.89:8080/bioinfo/GPCR-CA, where one can obtain the two-layer results for a query protein sequence within about 20 seconds.
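The abstract does not say how a sequence is turned into a CA image or which co-occurrence factors are kept, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the GPCR-CA implementation. Assumptions: each residue is mapped to a hypothetical 5-bit code (its index in a 20-letter alphabet); that bit string seeds an elementary cellular automaton in Wolfram's rule numbering (rule 30 is an arbitrary default, not the paper's choice); each generation becomes one row of a binary image (so the "gray levels" are just 0 and 1); and four standard GLCM texture descriptors (contrast, energy, homogeneity, entropy) are appended to the 20 amino-acid frequencies to form a pseudo-amino-acid-composition-style vector. All function names are hypothetical.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_sequence(seq: str, bits: int = 5) -> List[int]:
    """Hypothetical binary code: each residue becomes its index in
    AMINO_ACIDS written with `bits` binary digits (not the paper's code)."""
    row: List[int] = []
    for aa in seq.upper():
        idx = AMINO_ACIDS.index(aa)  # ValueError for non-standard residues
        row.extend(int(b) for b in format(idx, f"0{bits}b"))
    return row


def evolve_ca(initial: List[int], rule: int, steps: int) -> List[List[int]]:
    """Evolve an elementary (two-state, radius-1) CA with Wolfram rule
    numbering and periodic boundaries; each generation is one image row."""
    n = len(initial)
    image = [list(initial)]
    for _ in range(steps - 1):
        prev = image[-1]
        nxt = []
        for i in range(n):
            # Neighborhood (left, center, right) read as a 3-bit number 0..7.
            pattern = (prev[(i - 1) % n] << 2) | (prev[i] << 1) | prev[(i + 1) % n]
            nxt.append((rule >> pattern) & 1)  # look up that bit of the rule
        image.append(nxt)
    return image


def glcm(image: List[List[int]], dx: int = 1, dy: int = 0,
         levels: int = 2) -> List[List[float]]:
    """Normalized gray-level co-occurrence matrix for displacement (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                counts[image[y][x]][image[y2][x2]] += 1
                total += 1
    if total == 0:
        raise ValueError("displacement leaves no pixel pairs inside the image")
    return [[c / total for c in row] for row in counts]


def glcm_features(p: List[List[float]]) -> Dict[str, float]:
    """Four standard Haralick-style texture descriptors of a normalized GLCM."""
    idx = [(i, j) for i in range(len(p)) for j in range(len(p))]
    return {
        "contrast": sum((i - j) ** 2 * p[i][j] for i, j in idx),
        "energy": sum(p[i][j] ** 2 for i, j in idx),
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i, j in idx),
        "entropy": -sum(p[i][j] * math.log2(p[i][j]) for i, j in idx if p[i][j] > 0),
    }


def feature_vector(seq: str, rule: int = 30, steps: int = 100) -> List[float]:
    """Hypothetical PseAAC-style vector: 20 amino-acid frequencies followed
    by the 4 GLCM texture features of the sequence's CA image."""
    seq = seq.upper()
    composition = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    image = evolve_ca(encode_sequence(seq), rule, steps)
    texture = glcm_features(glcm(image))
    return composition + list(texture.values())
```

For example, `feature_vector("MKTAYIAKQRQISFVKSHFSRQ")` returns a 24-element list. In a two-layer scheme like the one described, such vectors would go first to a GPCR/non-GPCR classifier and, only for predicted GPCRs, to a six-class classifier; the choice of classifier is outside this sketch.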
