Jain Preti, Wadhwa Puneet, Aygun Ramazan, Podila Gopi
Department of Biological Sciences, University of Alabama in Huntsville, Huntsville, AL 35899, USA.
In Silico Biol. 2008;8(2):141-55.
Heterotrimeric G proteins interact with G protein-coupled receptors in response to stimulation by hormones, neurotransmitters, chemokines, and sensory signals to initiate intracellular signaling cascades. Recent studies indicate that G protein subunits play a significant role in a range of eukaryotic diseases and processes, including inflammation, neurological diseases, cardiovascular diseases, and endocrine disorders, as well as plant pathogen response and the infectious hyphal growth, differentiation, and virulence of pathogenic fungi. A study of their functions, signaling pathways, and protein interactions may therefore lead to the development of various preventive approaches. The diversity of the alpha, beta, and gamma subunits of G proteins necessitates a prediction algorithm that helps identify new proteins such as Gbeta, where WD-40 repeats are not well characterized. The currently available techniques for finding G proteins are homology-based search analyses and wet lab experiments, which are not very effective in finding new classes of proteins. We present here a robust computational method for finding new G proteins and their homologs using a support vector machine (SVM)-based pattern recognition algorithm. Several physicochemical and compositional properties, including dipeptide, tripeptide, and hydrophobicity composition, are used for generating the SVM classifiers. The method achieves sensitivities of 96.17%, 95.38%, and 97.6% and specificities of 99.45%, 100%, and 100% on test sets for G protein alpha, beta, and gamma subunits, respectively. The algorithm correctly predicts the known alpha, beta, and gamma subunits reported in the literature. One important contribution of this algorithm is that it improves the genome annotation of several proteins as G proteins and serves as a useful tool for comparative genomic analysis of G proteins. Using this method, novel G protein subunits are predicted in 31 genomes spanning the plant, fungal, and animal kingdoms.
The software is available at http://biomine.cs.uah.edu/bioinformatics/svm_prog/scripts/GProteins/vectorg.html. Supplementary files are available at http://www.bioinfo.de/isb/2008/08/0013/supplementary_material/.
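To make the compositional features and the reported metrics concrete, the following is a minimal sketch of how a dipeptide composition vector and the sensitivity and specificity of a binary classifier could be computed. It is not the authors' implementation: the abstract does not specify the exact encoding, so the 20-letter amino acid alphabet, the normalization by the number of valid dipeptides, and the function names `dipeptide_composition` and `sensitivity_specificity` are all assumptions made for illustration.

```python
from itertools import product

# Assumption: the standard 20-letter amino acid alphabet; the abstract
# does not state how non-standard residues are handled.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element vector of dipeptide frequencies.

    Hypothetical helper: each entry is the fraction of overlapping
    residue pairs in the sequence that match one dipeptide. Pairs with
    a non-standard residue are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels 1 and 0.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy sequence and toy labels, for illustration only.
    features = dipeptide_composition("MGCTLSAEDKAAVERSK")
    print(len(features))
    print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))
```

Vectors of this kind, one per protein, could then be passed to any SVM implementation as training input. Tripeptide and hydrophobicity composition would follow the same counting pattern over a different set of features.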