Gupta Shivangi, Baudry Jerome, Menon Vineetha
Department of Computer Science, The University of Alabama in Huntsville, Huntsville, AL, United States.
Department of Biological Sciences, The University of Alabama in Huntsville, Huntsville, AL, United States.
Front Mol Biosci. 2023 Jan 12;9:953984. doi: 10.3389/fmolb.2022.953984. eCollection 2022.
This research introduces new machine learning and deep learning approaches, collectively referred to as Big Data analytics techniques, designed specifically to address the protein conformational selection mechanism in protein:ligand complexes. The Big Data analytics techniques presented in this work enable efficient processing of a large number of protein:ligand complexes and provide better identification of the specific protein properties responsible for a high probability of correctly predicting protein:ligand binding. The GPCR proteins ADORA2A (Adenosine A2a Receptor), ADRB2 (Adrenoceptor Beta 2), OPRD1 (Opioid Receptor Delta 1) and OPRK1 (Opioid Receptor Kappa 1) are examined in this study. For these proteins, the techniques efficiently process a huge ensemble of protein conformations and predict binding protein conformations (i.e., the protein conformations that will be selected by the ligands for binding) about 10 to 38 times better than random selection of protein conformations. In addition to providing a Big Data approach to the conformational selection mechanism, this work opens the door to the systematic identification of such "binding conformations" for proteins. The physico-chemical features that are useful in predicting the "binding conformations" are largely, but not entirely, shared among the test proteins, indicating that, for the protein properties used in this work, the biophysical properties that drive the conformational selection mechanism may be protein-specific to an extent.
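The "times better than random selection" comparison can be read as an enrichment factor: the rate of binding conformations among the top-ranked predictions divided by their rate in the whole ensemble. The abstract does not state the exact metric or cutoff used, so the sketch below is only one plausible reading. The function name `enrichment_factor`, its parameters, and the `top_fraction` cutoff are hypothetical and chosen for illustration.

```python
def enrichment_factor(labels, scores, top_fraction):
    """Binder rate among the top-scored conformations divided by the overall binder rate.

    labels: 1 for a binding conformation, 0 otherwise (assumed encoding).
    scores: model scores, higher meaning "more likely to bind" (assumed convention).
    top_fraction: fraction of the ensemble kept after ranking (hypothetical cutoff).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")

    n_total = len(labels)
    n_binding = sum(labels)
    if n_binding == 0:
        raise ValueError("at least one binding conformation is required")

    # Keep at least one conformation, even for very small fractions.
    n_top = max(1, round(n_total * top_fraction))

    # Rank conformations from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    # (hits_top / n_top) / (n_binding / n_total), written as one division.
    return (hits_top * n_total) / (n_top * n_binding)
```

Under this definition, random selection has an expected enrichment of 1. As a worked case, if 5 of 100 conformations are binding and 4 of them fall in the top 10 ranked, the enrichment is (4/10) / (5/100) = 8.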