Yadav Prakarsh, Mollaei Parisa, Cao Zhonglin, Wang Yuyang, Barati Farimani Amir
Department of Mechanical Engineering, Carnegie Mellon University, USA.
Department of Biomedical Engineering, Carnegie Mellon University, USA.
Comput Struct Biotechnol J. 2022 May 18;20:2564-2573. doi: 10.1016/j.csbj.2022.05.016. eCollection 2022.
G protein-coupled receptors (GPCRs) are the targets of one-third of FDA-approved drugs; however, the development of new drug molecules targeting GPCRs is limited by the lack of a mechanistic understanding of the GPCR structure-activity-function relationship. To modulate GPCR activity with highly specific drugs that have minimal side effects, it is necessary to quantitatively describe the important structural features of GPCRs and correlate them with the GPCR activation state. In this study, we developed three machine learning (ML) approaches to predict the conformational state of GPCR proteins. Additionally, we predict the activity level of GPCRs from their structure. We leverage the unique advantages of each of the three ML approaches: the interpretability of XGBoost, the minimal feature engineering required by a 3D convolutional neural network, and the graph representation of protein structure used by a graph neural network. Using these ML approaches, we predict the activation state of GPCRs with high accuracy (91%-95%) and their activity level with low error (mean absolute error, MAE, of 7.15-10.58). Furthermore, interpreting the ML models allows us to determine the importance of each feature in distinguishing between GPCR conformations.
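The abstract does not say how a protein structure is turned into a graph for the graph neural network. The sketch below shows one common choice, which is an assumption here and not necessarily the authors' method. Each residue is a node, and two residues share an edge when their C-alpha atoms lie within a distance cutoff. The 8 Å default cutoff and the helper names `contact_graph`, `edge_list`, and `mean_aggregate` are hypothetical. `mean_aggregate` is a generic single message-passing step, included only to show how a GNN layer can mix information along those edges. It does not represent the network used in the study.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_graph(ca_coords: Sequence[Coord], cutoff: float = 8.0) -> Dict[int, List[int]]:
    """Build an undirected residue contact graph (hypothetical helper).

    Nodes are residue indices. Residues i and j (i != j) are joined when the
    Euclidean distance between their C-alpha atoms is <= cutoff, in the same
    units as the coordinates (typically angstroms). The 8.0 default is an
    assumed value, not one taken from the paper.
    """
    n = len(ca_coords)
    adjacency: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                adjacency[i].append(j)
                adjacency[j].append(i)
    return adjacency


def edge_list(adjacency: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """Return each undirected edge once as a sorted (i, j) pair with i < j."""
    return sorted((i, j) for i, nbrs in adjacency.items() for j in nbrs if i < j)


def mean_aggregate(adjacency: Dict[int, List[int]], features: Sequence[float]) -> List[float]:
    """One generic message-passing step on scalar node features.

    Each node's new value is the mean of its own value and its neighbours'
    values. This illustrates neighbourhood aggregation in general only.
    """
    out: List[float] = []
    for i in range(len(features)):
        group = [features[i]] + [features[j] for j in adjacency[i]]
        out.append(sum(group) / len(group))
    return out


if __name__ == "__main__":
    # Toy C-alpha coordinates for four residues along the x-axis.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
    adj = contact_graph(coords)
    print(edge_list(adj))                            # [(0, 1), (0, 2), (1, 2)]
    print(mean_aggregate(adj, [1.0, 2.0, 3.0, 4.0]))  # [2.0, 2.0, 2.0, 4.0]
```

In this toy input the fourth residue lies more than 8 Å from every other residue, so it stays isolated and its feature is unchanged after aggregation. A real pipeline would also attach per-residue features, such as residue type, to each node before training.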