Computer Science Institute, Technological University of the Mixteca Region, 69000 Huajuapan, Oaxaca, Mexico.
Laboratory of Molecular Neuropharmacology and Bioinformatics, Institut de Neurociències and Unitat de Bioestadística, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain.
Molecules. 2018 Mar 19;23(3):690. doi: 10.3390/molecules23030690.
G protein-coupled receptors (GPCRs) are integral cell membrane proteins of relevance for pharmacology. The complete tertiary structure, including both extracellular and transmembrane domains, has not been determined for any member of class C GPCRs. An alternative to working on GPCR structural models is to investigate their functionality through the analysis of their primary structure. For this, sequence representation is a key factor in GPCR classification, where feature engineering is usually carried out. In this paper, we propose the use of representation learning to acquire the features that best represent the class C GPCR sequences and, at the same time, to obtain a classification model automatically. Deep learning methods in conjunction with amino acid physicochemical property indices are used for this purpose. Experimental results, assessed by classification accuracy, Matthews' correlation coefficient and the balanced error rate, show that using a hydrophobicity index and a restricted Boltzmann machine (RBM) can achieve performance (accuracy of 92.9%) similar to that reported in the literature. As a second proposal, we combine two or more physicochemical property indices, instead of only one, as the input for a deep architecture in order to add information from the sequences. Experimental results show that combining three hydrophobicity-related indices improves the classification performance of an RBM (accuracy of 94.1%) beyond that reported in the literature for class C GPCRs without using feature selection methods.
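To make the sequence representation and the three evaluation measures concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not say which hydrophobicity index was used, so the Kyte-Doolittle scale is assumed here. The function names, the zero padding to a fixed length, and the concatenation of several indices into one input vector are also illustrative assumptions. The balanced error rate is taken as the mean per-class error rate, and the Matthews correlation coefficient uses the standard multi-class generalization.

```python
from collections import Counter
from math import sqrt

# Kyte-Doolittle hydropathy scale (an assumption: the abstract does not name its index).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, length, indices=(KYTE_DOOLITTLE,), pad=0.0):
    """Map a sequence to a fixed-length vector: one block of `length` values per index.

    Hypothetical helper: truncation, padding and concatenation are illustrative choices.
    """
    residues = sequence.upper()[:length]
    vector = []
    for index in indices:
        values = [index.get(res, pad) for res in residues]
        vector.extend(values + [pad] * (length - len(values)))
    return vector


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_error_rate(y_true, y_pred):
    """Mean of the per-class error rates, so small classes weigh as much as large ones."""
    totals = Counter(y_true)
    errors = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return sum(errors[c] / totals[c] for c in totals) / len(totals)


def matthews_corrcoef(y_true, y_pred):
    """Multi-class Matthews correlation coefficient computed from label counts."""
    s = len(y_true)                                      # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))      # correctly predicted samples
    t_counts, p_counts = Counter(y_true), Counter(y_pred)
    classes = set(t_counts) | set(p_counts)
    cov = c * s - sum(t_counts[k] * p_counts[k] for k in classes)
    denom = sqrt((s * s - sum(v * v for v in p_counts.values()))
                 * (s * s - sum(v * v for v in t_counts.values())))
    return cov / denom if denom else 0.0


# Toy usage with made-up labels (not data from the paper).
features = encode("AR", length=4)
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
scores = (accuracy(y_true, y_pred),
          balanced_error_rate(y_true, y_pred),
          matthews_corrcoef(y_true, y_pred))
```

Vectors produced this way could serve as the visible-layer input of an RBM after suitable scaling. Passing several index dictionaries in `indices` mimics, in a simplified way, the idea of combining physicochemical properties to enrich the input.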