Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen (CCGS) Center and Pharmacometrics System Pharmacology Program, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA.
NIH National Center of Excellence for Computational Drug Abuse Research (CDAR), University of Pittsburgh, Pittsburgh, PA 15261, USA.
Biomolecules. 2021 Jun 11;11(6):870. doi: 10.3390/biom11060870.
G-protein-coupled receptors (GPCRs) are the largest and most diverse group of cell surface receptors and respond to a wide variety of extracellular signals. In recent years, allosteric modulation of GPCRs has emerged as a promising approach for developing target-selective therapies. Moreover, the discovery of new GPCR allosteric modulators can greatly advance our understanding of GPCR cell signaling mechanisms. Distinguishing accurately and efficiently between modulators of different GPCR groups is critical but challenging. In this study, we focus on an 11-class classification task with 10 GPCR subtype classes and one class of random compounds. We used a dataset of 34,434 compounds comprising allosteric modulators from the classical GPCR families A, B, and C, together with random drug-like compounds. Six types of machine learning models (support vector machine, naïve Bayes, decision tree, random forest, logistic regression, and multilayer perceptron) were trained on different combinations of features, including molecular descriptors, Atom-pair fingerprints, MACCS fingerprints, and ECFP6 fingerprints. We examined and discussed in detail the performance of the trained models with each feature combination. To the best of our knowledge, this is the first work on the multi-class classification of GPCR allosteric modulators. We believe that the classification models developed in this study can serve as simple and accurate tools for the discovery and development of GPCR allosteric modulators.
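The abstract does not describe implementation details, so the following is only an illustrative sketch, not the authors' method. It shows one of the six listed model families, using a Bernoulli naïve Bayes variant (the choice of variant is an assumption), trained on concatenated binary fingerprint blocks for a multi-class label set. The helper `concat_features`, the 4-bit toy "fingerprints", and the class labels `A`, `C`, and `random` are all hypothetical. Real Atom-pair, MACCS, or ECFP6 bit vectors would come from a cheminformatics toolkit, and continuous molecular descriptors would need a different likelihood (for example Gaussian) or prior binarization.

```python
import math
from collections import defaultdict


def concat_features(*blocks):
    """Hypothetical helper: join several binary feature blocks
    (e.g. MACCS-like bits + ECFP-like bits) into one feature vector."""
    out = []
    for block in blocks:
        out.extend(block)
    return out


class BernoulliNB:
    """Minimal multi-class Bernoulli naive Bayes with Laplace smoothing.
    Assumes every feature is 0/1, as with fingerprint bits."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        n_features = len(X[0])
        on_counts = defaultdict(lambda: [0] * n_features)  # per-class bit-on counts
        class_totals = defaultdict(int)                     # per-class sample counts
        for row, label in zip(X, y):
            class_totals[label] += 1
            counts = on_counts[label]
            for j, v in enumerate(row):
                counts[j] += v
        n = len(y)
        self.classes_ = sorted(class_totals)
        self.log_prior_ = {c: math.log(class_totals[c] / n) for c in self.classes_}
        # Smoothed estimate of P(bit j = 1 | class c); always strictly in (0, 1).
        self.p_on_ = {
            c: [(on_counts[c][j] + self.alpha) / (class_totals[c] + 2 * self.alpha)
                for j in range(n_features)]
            for c in self.classes_
        }
        return self

    def predict_one(self, row):
        # Pick the class with the highest log-posterior (up to a constant).
        best, best_score = None, -math.inf
        for c in self.classes_:
            score = self.log_prior_[c]
            for v, p in zip(row, self.p_on_[c]):
                score += math.log(p if v else 1.0 - p)
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, X):
        return [self.predict_one(row) for row in X]


# Toy training set: two 4-bit blocks per compound, three hypothetical classes.
train = [
    (concat_features([1, 1, 0, 0], [1, 0, 0, 0]), "A"),
    (concat_features([1, 1, 0, 0], [1, 1, 0, 0]), "A"),
    (concat_features([0, 0, 1, 1], [0, 0, 1, 0]), "C"),
    (concat_features([0, 0, 1, 1], [0, 0, 1, 1]), "C"),
    (concat_features([0, 0, 0, 0], [0, 0, 0, 1]), "random"),
    (concat_features([0, 1, 0, 0], [0, 0, 0, 1]), "random"),
]
X_train = [x for x, _ in train]
y_train = [label for _, label in train]
model = BernoulliNB().fit(X_train, y_train)
```

Comparing feature combinations, as the study describes, would amount to rebuilding `X_train` from a different set of blocks passed to `concat_features` and re-fitting each model type on it.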