Protein Structural Analysis and Design Lab, Department of Biochemistry and Molecular Biology, Michigan State University, 603 Wilson Road, East Lansing, MI 48824-1319, USA.
Department of Statistics, University of Wisconsin-Madison, Medical Science Center, 1300 University Avenue, Madison, WI 53706, USA.
Biomolecules. 2020 Mar 14;10(3):454. doi: 10.3390/biom10030454.
We show that machine learning can pinpoint features that distinguish inactive from active states in proteins, in particular the key ligand binding site flexibility transitions in G protein-coupled receptors (GPCRs) that are triggered by biologically active ligands. Our analysis covered the helical segments and loops of 18 inactive and 9 active class A GPCRs, whose three-dimensional (3D) structures were determined in complex with ligands. Even with the ligand removed, classifying each helix and loop segment as flexible or rigid by graph-theoretic ProFlex rigidity analysis, followed by feature selection and k-nearest neighbor classification, was sufficient to identify four segments surrounding the ligand binding site whose flexibility or rigidity accurately predicts whether a GPCR is in an active or inactive state. GPCRs bound to inhibitors were similar in their pattern of flexible versus rigid regions, whereas agonist-bound GPCRs were more flexible and diverse. This new ligand-proximal flexibility signature of GPCR activity was identified without knowledge of the ligand binding mode or of previously defined switch regions, yet it lies adjacent to the known transmission switch. Following this proof of concept, ProFlex flexibility analysis coupled with pattern recognition and activity classification may be useful for predicting whether newly designed ligands behave as activators or inhibitors in protein families in general, based on the pattern of flexibility they induce in the protein.
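The pipeline summarized above (per-segment flexible/rigid states, then feature selection, then k-nearest neighbor classification) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy matrix is invented and is not data from the study, the function names are hypothetical, the selection step is a simple filter (difference in flexible fraction between classes) standing in for whatever feature selection the authors used, and the distance is a plain Hamming distance over the selected segments.

```python
from collections import Counter

# Hypothetical toy data, NOT taken from the paper. Each row is one receptor
# structure, each column one helix/loop segment: 1 = flexible, 0 = rigid.
X = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 1, 1],
]
y = ["inactive"] * 6 + ["active"] * 4


def flexible_fraction(rows, col):
    """Fraction of structures in which segment `col` is flexible."""
    return sum(r[col] for r in rows) / len(rows)


def select_segments(X, y, n_keep):
    """Filter-style selection (an assumed stand-in): keep the segments whose
    flexible fraction differs most between active and inactive structures."""
    inactive = [r for r, lab in zip(X, y) if lab == "inactive"]
    active = [r for r, lab in zip(X, y) if lab == "active"]
    scores = [
        abs(flexible_fraction(active, j) - flexible_fraction(inactive, j))
        for j in range(len(X[0]))
    ]
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return sorted(ranked[:n_keep])


def hamming(a, b, cols):
    """Number of selected segments whose flexible/rigid state differs."""
    return sum(a[j] != b[j] for j in cols)


def knn_predict(train_X, train_y, query, cols, k=3):
    """Majority vote among the k training structures closest to `query`."""
    order = sorted(range(len(train_X)),
                   key=lambda i: hamming(train_X[i], query, cols))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, cols, k=3):
    """Hold out each structure in turn and classify it from the rest."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        correct += knn_predict(rest_X, rest_y, X[i], cols, k) == y[i]
    return correct / len(X)


selected = select_segments(X, y, n_keep=2)
accuracy = leave_one_out_accuracy(X, y, selected)
print("selected segment columns:", selected)
print("leave-one-out accuracy:", accuracy)
```

In this sketch the segments are selected once on the full toy set for brevity. In a real analysis the selection step would normally be repeated inside each cross-validation fold, since selecting features on data that includes the held-out structure gives an optimistic accuracy estimate.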